At NeurIPS 2018 last week in Montréal, two of the four Best Paper Awards went to Canadian research teams. In an exclusive interview with Synced at NeurIPS, members of the University of Toronto and Vector Institute team led by Assistant Professor David Duvenaud discussed their winning submission Neural Ordinary Differential Equations — a math-based approach to designing deep learning models that is stimulating discussion across the machine learning community.
The paper parameterizes the continuous dynamics of hidden units using an ordinary differential equation (ODE) specified by a neural network and develops a new family of deep neural network models for time-series modeling, supervised learning, and density estimation.
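The core idea can be illustrated with a minimal sketch (not the authors' implementation): treat the hidden state as evolving according to dh/dt = f(h, t, θ), where f is a neural network, and compute the output state by handing f to an ODE solver. Here f is just a toy linear layer and the solver a simple fixed-step Runge–Kutta integrator:

```python
import numpy as np

def f(h, t, W):
    # "Network" dynamics: a single linear layer, purely for illustration.
    return W @ h

def odeint_rk4(f, h0, t0, t1, steps, W):
    """Fixed-step RK4 solver: integrates dh/dt = f(h, t, W) from t0 to t1."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t, W)
        k2 = f(h + dt / 2 * k1, t + dt / 2, W)
        k3 = f(h + dt / 2 * k2, t + dt / 2, W)
        k4 = f(h + dt * k3, t + dt, W)
        h = h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return h

# Rotation dynamics: dh/dt = [[0,-1],[1,0]] h rotates h counter-clockwise,
# so the state at "depth" t = pi/2 carries (1, 0) to roughly (0, 1).
W = np.array([[0.0, -1.0], [1.0, 0.0]])
h0 = np.array([1.0, 0.0])
h1 = odeint_rk4(f, h0, 0.0, np.pi / 2, 100, W)
```

Instead of a fixed stack of layers, the "depth" is the integration time, and a solver can evaluate the dynamics at whatever points it needs.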
“Math is Forever”
Did you ever expect a best paper award?
David Duvenaud: I think I mentioned before we submitted it, ‘I’m really excited about this paper…’ but you always love your own work more than other people do, so you can’t really trust your own judgement. So, I liked it, but the big surprise is that other people like it, too. I knew it was going to be at least new, and would be accepted. As for the level of acceptance — poster, or spotlight, or oral — it’s hard to tell, and it’s always kind of a surprise.
Jesse Bettencourt: Yeah, I don’t think we really expected it. I knew (Duvenaud) was excited about it like, ‘guys we might win with this paper…’ but that was kind of like a joke. We were just really excited about the work, and we also had the feeling that this was an idea whose time had come in a big way, and if we didn’t do this work, somebody else would. It’s a good time for the work and we were expecting maybe other people doing related work might see this and be excited about it. I think one thing with ‘best paper’ awards is it’s almost like the conference is saying ‘we all should be interested in this work…’
DD: Yes, I don’t think the papers that win best paper are generally much better than the other good papers. Usually best papers have a bit of something for everybody — something that’s new and exciting, some well-respected method from the past, and also maybe some new math. There’s a lot of really good empirical work that happens in the deep learning world; people are proposing new architectures and training methods. But it’s not always clear which ones will stand the test of time. Reviewers really like to see some math in a paper, because math is forever.
Motivation and Future Applications
What was your original motivation for the work?
DD: There were three main motivations. One was better density models. The second was potentially to build neural networks that require less computation once they have been trained. However, in the paper that we wrote, the models are actually taking more computation right now. But we suspect we’ll be able to regularize our neural networks to require less computation. It’s an open question how to do this, and this is what I’m working on with Jesse right now. So there is potential to have faster neural networks.
The third area I’m excited about is time series models — in particular for data that comes at irregular times, like patient health records data. The project started when Yulia, the second author, was working with me on a cancer genomics problem. Data was coming in based on when measurements were made on patients. So sometimes there would be a month where there was a bunch of measurements, but then you don’t see some of them for a year… The powerful time series models people currently use are recurrent neural networks that require you to choose fixed intervals to input your data. So maybe every day or every month you have to have a data point — and if you don’t, it’s still possible to handle it… but it’s a little awkward.
I thought we must be able to define a continuous time model. I also knew that the probabilistic programming library from Columbia University had actually implemented training of ODE [ordinary differential equation] models, so I knew it was possible in principle. When we looked into it, we found that they were using training methods that didn’t scale as well as the standard training method in deep learning, which is reverse-mode automatic differentiation. So the remaining technical work to get these models running was to go back to some of the work already done, and translate that into the modern tools that we have for building large models.
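The scalable training method the paper builds on is the adjoint sensitivity method: rather than backpropagating through every individual solver step, one solves a second ODE backwards in time for the adjoint state a(t) = dL/dh(t). Below is a minimal sketch of that idea under toy assumptions (linear dynamics, a fixed-step solver, and gradients only with respect to the initial state — none of this is the paper's actual code), checked against finite differences:

```python
import numpy as np

def rk4(f, y0, t0, t1, steps):
    """Fixed-step RK4 integrator for dy/dt = f(y, t); works for t1 < t0 too."""
    y, t = y0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(y, t)
        k2 = f(y + dt / 2 * k1, t + dt / 2)
        k3 = f(y + dt / 2 * k2, t + dt / 2)
        k4 = f(y + dt * k3, t + dt)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y

W = np.array([[0.0, -1.0], [1.0, 0.0]])  # toy linear "network" dynamics
h0 = np.array([1.0, 0.5])

# Forward pass: h(1) = ODESolve(h0, 0, 1).
h1 = rk4(lambda h, t: W @ h, h0, 0.0, 1.0, 200)

# Loss L = sum(h1), so dL/dh(1) = ones.
# The adjoint obeys da/dt = -(df/dh)^T a = -W.T @ a, solved backwards in
# time from t=1 to t=0; a(0) is then the gradient dL/dh0.
a1 = np.ones(2)
a0 = rk4(lambda a, t: -W.T @ a, a1, 1.0, 0.0, 200)

# Sanity check: central finite differences on the forward solve.
eps = 1e-6
fd = np.array([
    (rk4(lambda h, t: W @ h, h0 + eps * e, 0.0, 1.0, 200).sum()
     - rk4(lambda h, t: W @ h, h0 - eps * e, 0.0, 1.0, 200).sum()) / (2 * eps)
    for e in np.eye(2)])
```

The appeal of this formulation is memory: the backward solve does not need the intermediate states of the forward solver stored, which is what lets these models scale.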
How do you see your paper’s potential impact?
DD: I have to say it’s just a proof of concept, and almost all of the experiments in this paper were toy experiments, just to show that we could use these methods and to show off all the new stuff that you can do with this different view of computation. There has been one follow-up work that we submitted to ICLR, FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models… to build a new class of generative density models that have some computational advantages over older approaches.
Duvenaud and his students Ricky (Tianqi) Chen, Yulia Rubanova and Jesse Bettencourt co-authored Neural Ordinary Differential Equations, which is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen