Training an algorithm to recognize images (or to do anything, for that matter) is difficult and occasionally overwhelming. But a recent piece from the MIT Technology Review suggests we're making it harder than it has to be, using neural networks that are ten or even a hundred times larger than necessary, and that therefore demand far more time and computational power than the task requires.
The team started with two assumptions: first, that the larger a network is (the more layers and nodes it has), the greater the likelihood that some subset of it will be trainable; and second, that this is why networks start off larger than they need to be. Starting this way does sound efficient on its face: train a large network, then prune it later.
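To make that train-then-prune step concrete, here is a minimal sketch of magnitude pruning in plain Python with NumPy. This is illustrative only, not the researchers' actual code: the stand-in layer and the 90% pruning fraction are assumptions chosen for the example.

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Return a 0/1 mask that keeps only the largest-magnitude weights.

    `fraction` is the share of weights to remove (0.9 drops 90% of them).
    """
    k = int(weights.size * fraction)                 # how many weights to drop
    cutoff = np.sort(np.abs(weights), axis=None)[k]  # k-th smallest magnitude
    return (np.abs(weights) >= cutoff).astype(weights.dtype)

# Illustrative only: prune a stand-in "trained" 100x100 layer to ~10% density.
trained = np.random.randn(100, 100)
mask = magnitude_prune(trained, fraction=0.9)
pruned = trained * mask                              # zero out the small weights
print(f"kept {mask.mean():.0%} of the connections")
```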
This team wanted to try pruning earlier in the process, though. From that idea came the lottery ticket hypothesis:
When you randomly initialize a neural network’s connection strengths, it’s almost like buying a bag of lottery tickets. Within your bag, you hope, is a winning ticket—i.e., an initial configuration that will be easy to train and result in a successful model.
Under this theory, starting with a very large network is really just buying more lottery tickets. With repeated training and pruning, the researchers were able to shrink networks to between 10% and 20% of their original size. In follow-up work, other researchers discovered that a winning configuration, even before any training at all, already performed significantly better than the untrained oversized network it came from.
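As a rough illustration of how a winning ticket is found, here is a sketch of an iterative magnitude-pruning loop with rewinding, in the spirit of the hypothesis. It is a simplification under stated assumptions, not the authors' implementation: `train_fn` is a hypothetical stand-in for a full training loop, and the round count and per-round pruning rate are placeholder values.

```python
import numpy as np

def find_winning_ticket(init_weights, train_fn, rounds=5, prune_rate=0.37):
    """Iterative magnitude pruning with rewinding: train, prune the smallest
    surviving weights, reset the survivors to their ORIGINAL initial values,
    and repeat.

    `train_fn(weights, mask)` is a hypothetical stand-in for your training
    loop; it should train the masked network and return the trained weights.
    """
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask, mask)  # train the sparse subnetwork
        survivors = np.abs(trained[mask == 1])         # magnitudes of live weights
        k = int(survivors.size * prune_rate)           # how many to drop this round
        cutoff = np.sort(survivors)[k]
        mask *= np.abs(trained) >= cutoff              # prune below the cutoff
    # The "winning ticket": the surviving connections rewound to their init.
    return init_weights * mask, mask
```

The rewind step is the crux: the survivors are reset to their original random initialization rather than to fresh random values, which is what makes the small subnetwork a "winning ticket" rather than just a smaller network.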
What does this mean? Small networks could be trained on laptops or even mobile phones, rather than requiring massive computing power in the cloud. That improves speed as well as security, and the security gains in particular could transform industries from government to healthcare, where sensitive data could stay on the device. We look forward to the day when we can do this kind of powerful analysis from a device in the palm of our hands!