Alternatives to CNN (Convolutional Neural Network)

We have all seen the boom of convolutional neural networks, but we seldom stop to ask whether there is something inherently wrong with them. The claim is not easy to justify in a sentence, so this article walks through what exactly the problem is and what the probable solutions are.

There are two major alternatives to CNN (Convolutional Neural Network), namely graph neural networks and capsule networks.

Convolutional Neural Networks, or CNNs, are one of the concepts that accelerated development in the field of deep learning. Ever since computer vision was formulated as a problem, a significant amount of work has gone into the field, and into image classification in particular.

Before we go into the alternatives to CNN, let us first see what the problem with them is.

One of the most prominent figures in deep learning research, Geoffrey Hinton, said that CNNs are doomed. The statement may seem untenable to a naïve audience, or even to moderately experienced practitioners, but it was backed by solid research. Hinton made two primary points:

First, CNNs cannot extrapolate their understanding of geometric relationships to radically new viewpoints. In other words, the models work well when the relationships in a new image follow the structure already seen during training. If the relationships are new and unexpected, for example because the object is seen from a very different angle, the model cannot really generalize to them, so the architecture only works for specific cases.

Second, and this was the relatively weaker argument, Hinton pointed to the problem of retaining the spatial relationships between the abstracted parts of an image. For example, the ears sit on either side of the nose horizontally, while the nose and mouth are arranged vertically. Convolutional architectures do not preserve these spatial relationships well once the feature maps are sub-sampled, although this can be handled by overlapping the sub-sampling pools.
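
The effect of sub-sampling on position can be seen in a tiny NumPy sketch. It is only an illustration under simple assumptions: the signal is one-dimensional, and `max_pool_1d` is a hypothetical helper written for this example, not a function from any library.

```python
import numpy as np

def max_pool_1d(x, size, stride):
    """Max-pool a 1-D signal with the given window size and stride."""
    starts = range(0, len(x) - size + 1, stride)
    return np.array([x[s:s + size].max() for s in starts])

# Two signals whose single active feature sits at neighbouring positions.
a = np.array([0, 0, 1, 0, 0, 0, 0, 0])
b = np.array([0, 0, 0, 1, 0, 0, 0, 0])

# Non-overlapping pools (window 2, stride 2): both signals collapse to the
# same output, so the exact position of the feature is lost.
non_overlap_a = max_pool_1d(a, size=2, stride=2)
non_overlap_b = max_pool_1d(b, size=2, stride=2)

# Overlapping pools (window 3, stride 2): neighbouring windows share a cell,
# and the two signals now produce different outputs.
overlap_a = max_pool_1d(a, size=3, stride=2)
overlap_b = max_pool_1d(b, size=3, stride=2)

print(non_overlap_a, non_overlap_b)
print(overlap_a, overlap_b)
```

With non-overlapping windows the two inputs become indistinguishable after pooling, while the overlapping windows still tell them apart. Overlap reduces the loss of position in this toy case; it does not remove it in general.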

Now, let us understand how we can make this better. First, it is important to understand that model architectures can be improved at the core level. Although this topic is not heavily researched, there are some ideas that might help mitigate these problems and also increase the adaptability of convolutional architectures.

First way

One possible solution is to use a hierarchical graph structure. Recent developments in the field of graph neural networks (GNNs) show promise in this area. One of the most appealing properties of a graph neural network is that it can be invariant to rotation and translation. Because a GNN is built on vertices and edges rather than on a fixed pixel grid, the graph itself has no notion of orientation, as long as the node and edge features do not encode absolute position. This directly addresses the problem of extrapolating to different orientations.
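
The sketch below illustrates the idea under stated assumptions. It is not a graph neural network: it only shows that a graph description based on connectivity and relative distances does not change when the underlying points are rotated or shifted. The node coordinates, the edge list and the helper names `edge_lengths` and `rotate` are all hypothetical and chosen for this example.

```python
import numpy as np

def edge_lengths(points, edges):
    """Return the length of every edge (i, j) for the given node coordinates."""
    return np.array([np.linalg.norm(points[i] - points[j]) for i, j in edges])

def rotate(points, degrees):
    """Rotate 2-D points counter-clockwise around the origin."""
    t = np.deg2rad(degrees)
    r = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return points @ r.T

# Hypothetical face parts as nodes: left ear, right ear, nose, mouth.
points = np.array([[-2.0, 1.0], [2.0, 1.0], [0.0, 0.0], [0.0, -1.5]])
# Edges: ear-nose, ear-nose, nose-mouth.
edges = [(0, 2), (1, 2), (2, 3)]

original = edge_lengths(points, edges)
rotated = edge_lengths(rotate(points, 90), edges)
shifted = edge_lengths(points + np.array([5.0, -3.0]), edges)

print(np.allclose(original, rotated))  # edge features survive rotation
print(np.allclose(original, shifted))  # and translation
```

The raw coordinates change under both transformations, but the edge list and edge lengths stay the same, so a model that consumes only those features receives identical input for every orientation.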

Second way

The second possible solution is the one proposed by Hinton and his team of researchers, who introduced the concept of capsules. Capsules were first put forward as an explanation for several open problems in theories of how neurons in the brain work. The concept was formalized much earlier, but this was the first time anyone applied it to CNN-based architectures. Hinton argued that CNNs are doomed not only because of the lack of generalization mentioned above, but, more importantly, because they are ‘misguided’.

Convolutional architectures aim for viewpoint invariance in the activities of their neurons. However, they use a single scalar output to summarize the activity of a pool of local feature detectors that is repeated over and over. Hinton's team recommended capsules instead. A capsule performs rather complicated internal computations on its inputs and then encapsulates the results in a small vector of informative outputs.
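
To make the difference between a scalar and a vector output concrete, here is a minimal NumPy sketch of the squashing non-linearity described in the capsule paper "Dynamic Routing Between Capsules" (Sabour, Frosst and Hinton, 2017). It covers only that one step: the routing between capsule layers is left out, and the input vectors are made-up values for illustration.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink a capsule's raw vector so its length lies in [0, 1).

    The direction of the vector is kept; only its length is rescaled by
    |s|^2 / (1 + |s|^2). eps avoids division by zero for an all-zero input.
    """
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

# Hypothetical raw capsule outputs: one weak, one strong.
weak = squash(np.array([0.1, 0.0, 0.0, 0.0]))
strong = squash(np.array([3.0, 4.0, 0.0, 0.0]))

weak_len = np.linalg.norm(weak)      # close to 0: entity probably absent
strong_len = np.linalg.norm(strong)  # close to 1: entity probably present

print(weak_len, strong_len)
print(strong / strong_len)           # direction still encodes the "pose"
```

In that formulation the length of the output vector is read as the probability that the entity is present, while its direction carries properties such as pose. A single scalar activation cannot hold both pieces of information at once.
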

So, these are the two approaches that can, in theory, serve as alternatives to CNN. In practice, both of them try to improve on convolutional architectures by mitigating their limitations.

A simple alternative could be to use any machine learning model after a feature extraction phase.

It is possible to obtain features from pixel densities or from rough saliency maps, but features extracted this way may or may not be as accurate as those learned by CNN architectures. If the features are well curated, however, machine learning models such as support vector machines may reach accuracy comparable to convolutional architectures.
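
A minimal sketch of this pipeline, assuming scikit-learn is installed, could look as follows. It uses the small digits dataset bundled with scikit-learn, and the `block_densities` helper is a hypothetical stand-in for a hand-made "pixel density" feature extractor; real images would need far more careful features.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def block_densities(images, block=2):
    """Average pixel intensity over non-overlapping block x block patches."""
    n, h, w = images.shape
    patches = images.reshape(n, h // block, block, w // block, block)
    return patches.mean(axis=(2, 4)).reshape(n, -1)

digits = load_digits()                      # 8x8 grayscale digit images
features = block_densities(digits.images)   # 16 hand-made features per image

x_train, x_test, y_train, y_test = train_test_split(
    features, digits.target, test_size=0.25, random_state=0
)

# Support vector machine trained on the extracted features only.
clf = SVC(kernel="rbf", gamma="scale").fit(x_train, y_train)
accuracy = clf.score(x_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The classifier never sees raw pixels, only the 16 averaged densities, so the result depends entirely on how well those features are chosen. That is the trade-off against a CNN, which learns its features from the data.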

Whatever the case, CNNs have been very successful. For further research, however, it is better to look for a change at the core than to keep increasing the depth of the networks. Hinton’s idea of using capsules may help with this, but we will only be sure once the theories are backed by results.