Recently Kaggle hosted a competition on the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60k 32x32 colour images in 10 classes. This dataset was collected by Alex
Krizhevsky, Vinod Nair, and Geoffrey Hinton.
Many contestants used convolutional nets to tackle this competition. Some resulted in scores that beat human performance on this classification task. In this blog series we will interview three contestants and also a founding father of convolutional nets: Yann LeCun.
Yann Lecun is currently Director of AI Research at Facebook and a professor at NYU.
Certainly, Kunihiko Fukushima‘s work on the Neo-Cognitron was an inspiration. Although the early forms of convnets owed little to the Neo-Cognitron, the version we settled on (with pooling layers) did.
Not really. It was the logical thing to do. I had been playing with multi-layer nets with local connections since 1982 or so (not having the right learning algorithm though. Backprop didn‘t exist). I started experimenting with shared weight nets while I was a postdoc in Toronto in 1988.
The reason for not trying earlier is simply that I didn‘t have the software nor the data. Once I arrived at Bell Labs, I had access to a large dataset and fast computers (for the time). So I could try full-size convnets, and it worked amazingly well (though it required 2 weeks of training).
Yes. I knew it had to happen. It was a matter of time until the datasets become large enough and the computers powerful enough for deep learning algorithms to become better than human engineers at designing vision systems.
There was a symposium entitled "frontiers in computer vision" at MIT in August 2011. The title of my talk was “5 years from now, everyone will learn their features (you might as well start now)”. David Lowe (the inventor ofSIFT) said the same thing.
Still, I was surprised by how fast the revolution happened and how much better convnets are, compared to other approaches. I would have expected the transition to be more gradual. Also, I would have expected unsupervised learning to play a greater role.
We had to implement our own program language and write our own compiler to build this. Leon Bottou and I had written a neural net simulator called SN, back in 1987/1988. It was a Lisp interpreter with a numerical library (multidimensional arrays, neural net graphs…). We used this at Bell Labs to develop the first convnets.
Then in the early 90’s, we wanted to use our code in products. Initially, we hired a team of developers to convert our Lisp code to C/C++. But the resulting system could not be improved easily (it wasn’t a good platform for R&D). So Leon, Patrice Simard and I wrote a compiler for SN, which we used to develop the next generation OCR engine.
That system integrated a segmenter, a convnet, and a graphical model on top. The whole thing was trained end to end.
The graphical model was called a “graph transformer network”. It was conceptually similar to what we now call a conditional random field, or a structured perceptron (which it predates), but it allowed for non-linear scoring function (CRF and structured perceptrons can only have linear scoring functions).
The whole infrastructure was written in SN and compiled. This is the system that was deployed in ATM machines and check reading machines in 1996 and was reading 10 to 20% of all the checks in the US by the late 90’s.
In my experience, the best large-scale learning systems always take 2 or 3 weeks to train, regardless of the task, the method, the hardware, or the data.
I don’t know if convnets are “pretty slow”. Compared to what? They may be slow to train, but the alternative to “slow learning” is months of engineering efforts which doesn’t work as well in the end. Also, convnets are actually pretty fast to run (after training).
In a real application, no one really cares how long it takes to train. But people care a lot about how long it takes to run.
There are lots and lots of ideas surrounding convnets and deep learning that have lived in relative obscurity for the last 20 years or so. No ones cared about it, and getting papers published was always a struggle. So, lots of ideas were never properly tried, never published, or were tried and published but soundly ignored and quickly forgotten. Who remembers that the first learning-based face detector that actually worked was a convolutional net (back in 1993, eight years before Viola-Jones)?
Today, it’s really amazing to see so many young and bright people devoting so much creative energy to the topic and coming up with new ideas and new applications. The hardware / software infrastructure is getting better, and it’s becoming possible to train large networks in a few hours or a few days. So people can explore many more ideas that in the past.
One thing I’m excited about is the idea of “spectral convolutional net”. This was a paper at ICLR 2014 by folks from my NYU lab about a generalization of convolutional nets that can be applied to any graphs (regular convnets can be applied to 1D, 2D or 3D arrays that can be seen as regular grids in terms of graph). There are practical issues, but it opens the door to many more applications of convnets to unstructured data.
I’m very excited about the application of convnets (and recurrent nets) to natural language understanding (following the seminal work of Collobert and Weston).
It’s a solved problem in the same sense as MNIST is a solved problem. But frankly, people are more interested inImageNet than in CIFAR-10 nowadays. In that sense, CIFAR-10 is not a “real” problem. But it’s not a bad benchmark for a new algorithm.
What are you talking about? Convnets are absolutely everywhere now (or about to be everywhere) in industry:Facebook, Google, Microsoft, IBM, Baidu, NEC, Twitter, Yahoo!….
That said, it’s true that all of these companies have significant R&D resources and that training convnets can still be challenging for smaller companies or companies that are less technically advanced.
It still requires quite of bit of experience and time investment to train a convnet if you don’t have prior training. Soon however, there will be several simple to use open source packages with efficient back-ends for that.
I don’t think it’s a good test. ImageNet is a much better test.
Yes, deep nets need to be deep. Try to train a shallow net to emulate a deep convnet trained on ImageNet. Come back when you have done that. In theory, a deep net can be approximated by a shallow one. But on complex tasks, the shallow net will have to be be ridiculously large.
Hey, I’ve been in academia since 2003, and I’m still a part-time professor at NYU. I do theory when it helps me understand things. Theory often help us understand what’s possible and what’s not possible. It helps suggest proper ways to do things.
But sometimes theory restricts our thinking. Some people will not work with some models because the theory about them is too difficult. But often, a technique works well before the reasons for it working well are fully understood theoretically.
By restricting yourself to work on stuff you fully understand theoretically, you are condemned to using conceptually simple methods.
Also, sometimes theory blinds us. For example, some people were dazzled by kernel methods because of the cute math that goes with it. But, as I’ve said in the past, in the end, kernel machines are shallow networks that perform “glorified template matching”. There is nothing wrong with that (SVM is a great method), but it has dire limitations that we should all be aware of.
I don’t think their is a choice to make between performance and theory. If there is performance, there will be theory to explain it.
Also, what kind of theory are we talking about? Is it a generalization bound? Convnets have a finite VC dimension, hence they are consistent and admit the classical VC bounds. What more do you want? Do you want a tighter bound, like what you get for SVMs? No theoretical bound that I know of is tight enough to be useful in practice. So I really don’t understand the point. Sure, generic VC bounds are atrociously non tight, but non-generic bounds (like for SVMs) are only slightly less atrociously non tight.
If what you desire are convergence proofs (or guarantees), that’s a little more complicated. The loss function of multi-layer nets is non convex, so the easy proofs that assume convexity are out the window. But we all know that in practice, a convnet will almost always converge to the same level of performance, regardless of the starting point (if the initialization is done properly). There is theoretical evidence that there are lots and lots of equivalent local minima and a very small number of “bad” local minima. Hence convergence is rarely a problem.
AI hype is extremely dangerous. It killed various approaches to AI at least 4 times in the past. I keep calling out hype whenever I see it, whether it’s from the press, from startups looking for investors, from large companies looking for PR, or from academics looking for grants.
There is certainly quite a bit of hype around deep learning at the moment. I don’t see a particularly high level of hype around convnets specifically. There is more hype around “cortical this”, “spiking that”, and “neuromorphic blah”. Unlike many of these things, convnets actually yield good results on useful tasks and are widely deployed in industrial applications.
DeepFace: a convnet for face recognition. There are also convnets for image tagging. They are big.
I’m a 3, with a bit of 1 and 4.
I’m impressed by how much creativity and engineering knack went into this. It’s nice that people have pushed the technique as far as it will go on this dataset.
But it’s going to get easier and easier for independent researchers and hobbyist to play with these things and apply them to larger datasets. I think the successor to CIFAR-10 should be ImageNet-1K-128x128. This would be a version of the 1000 category ImageNet classification task where the images have been normalized to 128x128. I see several advantages:
There are tasks like video understanding and natural language understanding where we are going to have to use unsupervised learning. But these modalities have a temporal dimension that changes how we can approach the problem.
Clearly, we need to devise algorithms that can learn the structure of the perceptual world without being told the name of everything. Many of us have been working on this for years (if not decades), but none of us has a perfect solution.
There are two answers to this question:
A lot of (1) is at NYU and a lot of (2) is at Facebook.
The general areas are:
unsupervised learning that discovers “invariant” features, the marriage of deep learning and structured prediction, the unification of supervised and unsupervised learning, solving the problem of learning long-term dependencies, building learning systems with short-term/scratchpad memory, learning plans and sequences of actions, different ways to optimize functions than to follow the gradient, the integration of representation learning with reasoning (read Leon Bottou’s excellent position paper “from machine learning to machine reasoning”), the use of learning to perform inference efficiently, and many other topics.
from: http://blog.kaggle.com/2014/12/22/convolutional-nets-and-cifar-10-an-interview-with-yan-lecun/
卷积神经网络和CIFAR-10:Yann LeCun专访 Convolutional Nets and CIFAR-10: An Interview with Yann LeCun
原文:http://www.cnblogs.com/GarfieldEr007/p/5338332.html