Undistillable class in Deep Learning


In this article at OpenGenus.org, we explore the concept of the Undistillable class in Deep Learning, which is one of the biggest challenges in Knowledge Distillation.

Table of contents:

Introduction to Distillation in DL
What is Undistillable class?
Advantages and Disadvantages of Undistillable classes

Introduction to Distillation in DL

Distillation or Knowledge Distillation in Deep Learning is the process of using a complex pre-trained model (known as the teacher model) to train a new, simpler model (known as the student model) such that the simpler model is able to perform a similar task.

Knowledge Distillation is used to compress models and leverage pre-trained models.
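
To make the idea concrete, here is a minimal sketch of a commonly used distillation loss: a weighted sum of the usual cross-entropy on the true label and a KL divergence term between temperature-softened teacher and student outputs. The function names, the temperature of 4.0, the weight alpha and the example logits are assumptions chosen for illustration; they do not come from this article.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-label KL term.

    temperature and alpha are illustrative defaults, not values from the article.
    """
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label] + 1e-12)
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term on a scale comparable to the hard term.
    soft_term = (temperature ** 2) * kl_divergence(teacher_soft, student_soft)
    return alpha * cross_entropy + (1.0 - alpha) * soft_term

# Hypothetical logits for a 3-class problem where the true class is 0.
teacher_logits = [6.0, 2.0, 1.0]
student_logits = [3.0, 1.5, 0.5]
loss = distillation_loss(student_logits, teacher_logits, label=0)
print(f"distillation loss: {loss:.4f}")
```

In a real training loop the same quantity would be computed on batches of tensors in a Deep Learning framework; plain Python lists are used here only to keep the sketch self-contained.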

One of the biggest challenges in Knowledge Distillation is the Undistillable class.

What is Undistillable class?

An Undistillable class is a class that the teacher model has learnt but whose knowledge cannot be distilled, that is, transferred, to a student model.

One approach to handle this is to identify the undistillable classes and discard them before training the student model using knowledge distillation. This idea has been explored in the NeurIPS paper titled "Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation" by Yichen Zhu, Ning Liu, Zhiyuan Xu, Xin Liu, Weibin Meng, Louis Wang, Zhicai Ou and Jian Tang.
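
As a rough illustration of the "identify and discard" idea, the sketch below flags a class as undistillable when a distilled student is less accurate on it than a student trained without distillation, and then skips the teacher's soft labels for that class. This criterion, the helper names and the toy predictions are assumptions made for this example; the paper's actual procedure is not reproduced here.

```python
def per_class_accuracy(predictions, labels, num_classes):
    """Return a list with the accuracy obtained on each class."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return [c / t if t else 0.0 for c, t in zip(correct, total)]

def find_undistillable_classes(baseline_acc, distilled_acc, tolerance=0.0):
    """Flag classes where distillation made the student worse (assumed criterion)."""
    return {
        cls
        for cls, (base, dist) in enumerate(zip(baseline_acc, distilled_acc))
        if dist < base - tolerance
    }

def use_soft_labels(label, undistillable):
    """Decide whether a sample of this class should use the teacher's soft labels."""
    return label not in undistillable

# Hypothetical validation labels and predictions for a 3-class problem.
labels = [0, 0, 1, 1, 2, 2]
baseline_preds = [0, 0, 1, 0, 2, 2]   # student trained without a teacher
distilled_preds = [0, 0, 1, 1, 2, 0]  # student trained with distillation

baseline_acc = per_class_accuracy(baseline_preds, labels, num_classes=3)
distilled_acc = per_class_accuracy(distilled_preds, labels, num_classes=3)
undistillable = find_undistillable_classes(baseline_acc, distilled_acc)
print("undistillable classes:", undistillable)
```

During the next round of training, samples for which use_soft_labels returns False would be trained on their hard labels only.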

Following are the reasons why Undistillable classes exist:

Comparing with real life:
- A professor does advanced research and also teaches classes to prepare students to enter the field and take part in research. A competitive student can absorb most of this knowledge, but there remain areas where the professor stays ahead, because some domains can be mastered only through self-study and independent research.
- There is a difference between a good student and a good researcher.
In Deep Learning:
- The exact reason is not well understood and is still under research.
- The widely held belief is that learning a specific piece of knowledge or relation in the data requires a certain minimum number of parameters in a given Deep Learning model.
- As a student model has fewer parameters than the teacher model, it cannot learn these classes. This is known as Capacity Mismatch.
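
To give a feel for the size of the gap behind Capacity Mismatch, the following sketch counts the parameters of two fully connected networks. The layer sizes are hypothetical and chosen only for illustration; the article does not specify any architecture.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector of one layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Hypothetical architectures: input of size 784, 10 output classes.
teacher_layers = [784, 1024, 1024, 10]
student_layers = [784, 64, 10]

teacher_params = count_dense_parameters(teacher_layers)
student_params = count_dense_parameters(student_layers)
compression_ratio = teacher_params / student_params

print(f"teacher parameters: {teacher_params}")
print(f"student parameters: {student_params}")
print(f"teacher is about {compression_ratio:.1f}x larger")
```

A parameter count is only a crude proxy for capacity, but it shows how much less room the student has to represent class-specific knowledge.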

Advantages and Disadvantages of Undistillable classes

Following are the disadvantages of Undistillable class in Deep Learning:

- It makes it difficult to create a true student model that can fully replace a teacher model. So, a student model remains weaker than its teacher model.
- This limits the usefulness of training optimizations like Knowledge Distillation.
- Model Bias: The student model will be more biased towards classes that are easier to learn, that is, the distillable classes.
- From a dataset point of view, this has an effect similar to class imbalance in the dataset. Not all classes or features are equal.
- To handle Undistillable classes, the complexity of the student model tends to increase.
- If such Undistillable classes remain undetected, this leads to loss of information in the student model.

Following are the advantages of Undistillable class in Deep Learning:

- Security: A class can be made undistillable intentionally during training of the teacher model. The direct advantage is that, given such a teacher model, one cannot easily create a student model from it. This prevents others from copying a trained model to create their own model.

This application of intentionally making a model undistillable has been explored in the research paper titled "Undistillable: Making A Nasty Teacher That CANNOT teach students" by Haoyu Ma (University of California, Irvine), Tianlong Chen, Ting-Kuei Hu (Texas A&M University) and Chenyu You (Yale University).
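
One way such a teacher could be trained is sketched below: the teacher keeps a low cross-entropy on the true label while its softened output distribution is pushed away from that of a normally trained network, so that its soft labels become misleading. This is a simplified sketch written for this article's discussion, not the authors' implementation; the function names, the weight omega, the temperature and the example logits are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def nasty_teacher_loss(nasty_logits, normal_logits, label,
                       temperature=4.0, omega=0.04):
    """Cross-entropy minus a weighted KL term (illustrative objective).

    Minimizing this keeps the prediction correct while maximizing the
    divergence from a normally trained network's softened outputs.
    omega and temperature are arbitrary illustrative values.
    """
    cross_entropy = -math.log(softmax(nasty_logits)[label] + 1e-12)
    divergence = kl_divergence(
        softmax(normal_logits, temperature),
        softmax(nasty_logits, temperature),
    )
    return cross_entropy - omega * (temperature ** 2) * divergence

# Hypothetical logits for one sample whose true class is 0.
normal_logits = [5.0, 2.0, 0.0]
clone_logits = [5.0, 2.0, 0.0]      # same soft labels as the normal network
shuffled_logits = [5.0, 0.0, 2.0]   # same top class, misleading soft labels

clone_loss = nasty_teacher_loss(clone_logits, normal_logits, label=0)
nasty_loss = nasty_teacher_loss(shuffled_logits, normal_logits, label=0)
print(f"clone loss: {clone_loss:.4f}, misleading-teacher loss: {nasty_loss:.4f}")
```

Both candidate teachers predict class 0 with the same confidence, but the objective prefers the one whose secondary probabilities differ from the normal network, which is exactly the information a student would try to learn from.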

With this article at OpenGenus.org, you should now have a clear idea of the Undistillable class in Deep Learning.
