Zero-shot learning: An approach that can change machine learning


Reading time: 30 minutes

In this article, we discuss a distinctive approach to machine learning: the model learns to classify a set of labels, builds a semantic understanding of them, and uses that understanding to recognize examples that belong to none of those labels, that is, examples the machine has never seen before.

This approach is especially helpful in computer vision and NLP, where there are thousands of categories, and collecting and training on thousands of examples for each of them, just to label objects and text, is too expensive in computation and cost.

What is Zero-Shot?

  • Machine learning works on data. A model learns patterns and features of the training classes, and when we give it a test sample, it assigns the class whose learned features the sample most resembles.

  • For training and testing, we need data. The number of samples of a class that the data must contain for the machine to learn that class is the number of shots for that class.

  • In zero-shot learning, the machine can say which class an unlabeled sample belongs to even when the sample does not fall into any of the classes it was trained on, i.e. there were zero shots for that class.

Idea behind Zero-Shot Learning

[Figure: zsl_idea]

As the figure shows, Dog and Cat are not among the training samples, but a semantic embedding space exists, and it helps the model describe what the dog picture most closely relates to.

  • We humans have the advantage of a knowledge base that we build by learning about words. If I ask you about an elephant, what comes to mind? A big grey animal with a small tail and a trunk. A giraffe? A yellow, long-necked animal with brown patches.

  • No matter the angle or kind of giraffe photo I show you, even if you have never seen a giraffe before, you will be able to say it is a giraffe, because you can easily tell that it is an animal, that it is yellow with brown patches, and that it has a long neck.

[Figure: okapi]

We can identify an okapi, an animal with zebra-like stripes and a face like that of a deer, if we know what a zebra and a deer look like and have this description of the okapi.

  • These features form a knowledge base and cross-relate to form many other objects in the world. We humans can identify more than 30,000 objects without having to sit and look at images of every one of them.

  • To identify animals, we give the model word embeddings instead of training labels. The model learns to map an image to a word embedding, so for our unknown giraffe image it outputs an embedding, and the closest word embedding in our dictionary turns out to be giraffe.

  • Zero-shot learning works on this idea and tries to mimic the human way of learning as closely as possible.
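The lookup described in the bullets above can be sketched in a few lines. This is only a minimal illustration, not a full model: the attribute names, the hand-written class vectors, and the `image_embedding` values are all hypothetical stand-ins for what real word embeddings and a trained image model would provide.

```python
import numpy as np

# Hypothetical semantic vectors: [is_grey, has_trunk, long_neck, is_yellow].
# In practice these would be word embeddings or annotated attributes.
CLASS_EMBEDDINGS = {
    "elephant": np.array([1.0, 1.0, 0.0, 0.0]),
    "mouse":    np.array([1.0, 0.0, 0.0, 0.0]),
    "giraffe":  np.array([0.0, 0.0, 1.0, 1.0]),  # no training images needed
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (small epsilon avoids 0/0)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def predict_class(embedding, class_embeddings):
    """Return the class whose semantic vector is closest to the embedding."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(embedding, class_embeddings[name]),
    )

# Assumed output of an image model for a giraffe photo (made-up numbers).
image_embedding = np.array([0.1, 0.0, 0.9, 0.7])
print(predict_class(image_embedding, CLASS_EMBEDDINGS))  # giraffe
```

Adding a new class here means adding one more entry to the dictionary; nothing is retrained, which is the point of predicting in the semantic space rather than over a fixed label set.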

Various approaches with ZSL

The paper "An embarrassingly simple approach to zero-shot learning" addresses the problem of automatic classification and describes ZSL as a two-stage process of training and inference: knowledge about attributes is captured in the training stage, and that knowledge is used in the inference stage to categorize instances among a new set of classes.

The inference stage employs 1-nearest neighbor, probabilistic frameworks, or modified versions of these.

[Figure: ezsl]
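The two stages can be sketched with NumPy. The closed-form training step below follows the formula as I recall it from the paper, V = (XXᵀ + γI)⁻¹ X Y Sᵀ (SSᵀ + λI)⁻¹, so please check it against the original before relying on it. The function names, the default regularizers, and the toy data are assumptions made for illustration.

```python
import numpy as np

def fit_eszsl(X, Y, S, gamma=1.0, lam=1.0):
    """Training stage (closed form, as recalled from the paper).

    X: (d, m) features of m training samples
    Y: (m, z) matrix with +1 for the true seen class and -1 otherwise
    S: (a, z) attribute signatures of the z seen classes
    Returns V: (d, a), which links features to attributes.
    """
    d, a = X.shape[0], S.shape[0]
    feature_term = np.linalg.inv(X @ X.T + gamma * np.eye(d))
    attribute_term = np.linalg.inv(S @ S.T + lam * np.eye(a))
    return feature_term @ X @ Y @ S.T @ attribute_term

def predict_eszsl(V, x, S_new):
    """Inference stage: score x (d,) against unseen signatures S_new (a, z_new)."""
    scores = x @ V @ S_new
    return int(np.argmax(scores))

# Hypothetical toy data: 3 features, 3 attributes, 3 seen classes,
# one training sample per class.
X = np.eye(3)
S = np.eye(3)
Y = 2 * np.eye(3) - 1          # +1 on the diagonal, -1 elsewhere
V = fit_eszsl(X, Y, S)

# Two unseen classes described only by attributes (one column per class):
# class 0 has attributes 1 and 2, class 1 has attributes 2 and 3.
S_new = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0]])

print(predict_eszsl(V, np.array([1.0, 1.0, 0.0]), S_new))  # 0
print(predict_eszsl(V, np.array([0.0, 1.0, 1.0]), S_new))  # 1
```

Training never sees the two classes in `S_new`; they enter only at inference time through their attribute signatures.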

  • The article "On zero-shot recognition of generic objects" discusses flaws in the standard generic-object ZSL benchmark and proposes a new benchmark to address them.

  • Previous ZSL models can therefore be re-evaluated, which could help restore sound ideas that were discarded.

  • Important work is being done to build datasets for ZSL, and it affects how semantic knowledge is built.

ZSL also has major potential to work with generative models, as seen in "Zero-Shot Learning via Simultaneous Generating and Learning". The authors train their model iteratively to generate samples of unseen classes and use them as training datapoints to gradually update the model parameters.

  • Their method learns about the conditional distribution of seen and unseen classes.

  • They treat missing datapoints as variables and optimize them like model parameters.
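To give a feel for the "generate unseen samples, then learn from them" idea, here is a deliberately simplified stand-in. It is not the method of the paper: instead of an iteratively trained generative model, it fits a ridge regression from class attributes to class mean features on the seen classes, samples Gaussian features around the predicted means of the unseen classes, and classifies with the nearest centroid of those synthetic samples. All names and numbers are hypothetical.

```python
import numpy as np

def fit_attribute_to_mean(seen_attrs, seen_means, ridge=1e-3):
    """Ridge regression from attribute vectors (z, a) to class means (z, d)."""
    a = seen_attrs.shape[1]
    gram = seen_attrs.T @ seen_attrs + ridge * np.eye(a)
    return np.linalg.solve(gram, seen_attrs.T @ seen_means)  # shape (a, d)

def generate_features(attr, W, n_samples, rng, noise=0.1):
    """Sample synthetic features around the mean predicted for one class."""
    mean = attr @ W
    return mean + noise * rng.standard_normal((n_samples, mean.shape[0]))

def nearest_centroid(x, centroids):
    """Index of the centroid closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# Toy data: 2 attributes, 2 feature dimensions, 3 seen classes.
seen_attrs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
seen_means = np.array([[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]])
# Two unseen classes, known only through their attributes.
unseen_attrs = np.array([[2.0, 1.0], [1.0, 2.0]])

rng = np.random.default_rng(0)
W = fit_attribute_to_mean(seen_attrs, seen_means)
synthetic = [generate_features(a, W, 50, rng) for a in unseen_attrs]
centroids = np.stack([samples.mean(axis=0) for samples in synthetic])

# A test feature vector that lies near the first unseen class.
prediction = nearest_centroid(np.array([3.8, 3.1]), centroids)
print(prediction)  # 0
```

The synthetic samples play the role of the missing training data for the unseen classes. In the paper's setting they would be refined jointly with the model parameters over several iterations rather than drawn once.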

How ZSL deals with the problems of conventional methods

  • The first problem with the classical approach is that when a new set of categories appears after training, we have to add it to the model and train again.

  • Secondly, we always depend on labelled data. Most of the time, experts are needed to add multiple annotations to a single image, because telling two categories apart can require their knowledge. For instance, with Royal Bengal Tiger, White Peacock, or American Crow, there would be too many such combinations to annotate.

  • Thirdly, supervised learning has no notion of accumulated knowledge. We humans, by contrast, can easily identify a new object from its description or from its similarities to previously learned objects, without needing any data for the new object.

With this, you have an overview of zero-shot learning, a revolutionary idea in machine learning. Enjoy 🎉