Throughput is a measurement used in Machine Learning to compare the performance of different models for a specific application. Throughput is the number of data units processed in one unit of time.
For Image Classification, the unit of Throughput is images/second, where an image is the data unit and a second is the time unit.
In terms of Image Classification:
Throughput is the number of images processed in one second for batch size > 1.
Batch size is the number of images processed together in one pass.
Example of Throughput: GoogleNet for Image Classification classifies 297 images in one second on an Intel Cascade Lake system. This improves to 904 images per second if we use the INT8 version of GoogleNet on the same system for the same application.
Hence, the Throughput is 297 images/ second or 904 images/ second.
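Measuring throughput in practice follows directly from the definition: run the model on a batch several times and divide the number of images processed by the elapsed time. Here is a minimal sketch, where `classify_batch` is a hypothetical stand-in for a real model's inference call:

```python
import time

def classify_batch(batch):
    # Hypothetical stand-in for a real model's forward pass
    # (e.g. a GoogleNet inference call on the batch).
    return [0] * len(batch)

def measure_throughput(batch, num_batches=10):
    """Return images processed per second, averaged over num_batches runs."""
    start = time.perf_counter()
    for _ in range(num_batches):
        classify_batch(batch)
    elapsed = time.perf_counter() - start
    return (len(batch) * num_batches) / elapsed

batch = [None] * 32  # a batch of 32 placeholder images
images_per_second = measure_throughput(batch)
```

With a real model, the reported number is what you would quote as the model's throughput on that system, e.g. "297 images/second".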
Significance of Throughput
Throughput is important as it is directly tied to the performance of a server for non-real-time workloads like pre-processing tasks.
Higher Throughput is better.
Throughput is the number of images processed in one second. If more images are processed in one second, the server can handle more work in a given time, which increases the server's effective capacity.
Improving Throughput is not trivial and requires deep insights into the Machine Learning model at hand and the concerned application. It depends on the Machine Learning framework and the system as well.
One common approach to improve Throughput is to parallelize the workload equally across all system resources (like NUMA nodes).
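The idea of splitting the workload evenly across resources can be sketched as follows. This is a simplified illustration using worker processes (one per NUMA node, say); `classify_batch` is a hypothetical stand-in for real inference, and real deployments would additionally pin each worker to its node (e.g. with numactl):

```python
from multiprocessing import Pool

def classify_batch(batch):
    # Hypothetical stand-in for batched model inference.
    return [x * 2 for x in batch]

def split(data, n):
    """Split data into n roughly equal chunks, one per worker."""
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

if __name__ == "__main__":
    images = list(range(1000))
    num_workers = 4  # e.g. one worker per NUMA node
    with Pool(num_workers) as pool:
        # Each worker processes its own equal share of the data in parallel.
        results = pool.map(classify_batch, split(images, num_workers))
    outputs = [y for chunk in results for y in chunk]
    assert len(outputs) == len(images)
```

Because each chunk is processed independently, total throughput scales with the number of workers until some shared resource (memory bandwidth, for instance) saturates.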
Throughput vs Latency
Throughput and Latency are related metrics but cannot be used interchangeably.
- Throughput is used for server applications that pre-compute a specific task or process input from multiple users together, while Latency is mainly used for applications that serve customers directly in real time.
- Throughput is for batch size greater than 1 while Latency is for batch size 1.
- For a fixed batch size, Throughput is inversely proportional to Latency: Throughput = batch size / Latency per batch.
- Usually, the performance of Machine Learning models tends to improve when batch size is greater than 1 (typically a power of 2; the best value depends on the system).
- The strategy to improve Throughput differs from the strategy to improve Latency.
- Running workload in parallel gives more improvement for throughput than for latency.
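The relationship between the two metrics in the points above can be made concrete with a small sketch (the latency values used here are illustrative, not measurements):

```python
def throughput(batch_size, batch_latency_s):
    """Images per second, given the latency (in seconds) to process one batch."""
    return batch_size / batch_latency_s

# At batch size 1, throughput is exactly the inverse of latency:
single = throughput(1, 0.25)       # 4 images/second

# A larger batch usually raises per-batch latency, yet throughput
# still improves because many images share the fixed overhead:
batched = throughput(32, 0.5)      # 64 images/second

assert batched > single
```

This is why a throughput-oriented deployment picks the largest batch the system handles well, while a latency-oriented one sticks to batch size 1.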
Use of Throughput
Throughput is mainly used for applications that use batch size > 1. Consider the following applications:
Detecting someone in Surveillance videos: Processing video is an intensive task and is usually done as a server task (not in real time). An example is processing the last 30 days of surveillance videos of a city to identify a person. Being an intensive task, running it with batch size > 1 is the preferred approach, which is where throughput comes into play.
High traffic web services: A high traffic web service can collect requests from multiple users for a specific task (like Image Compression) and run them with batch size > 1. This improves system performance and utilization, at a slight cost in real-time responsiveness, as output is only generated after the whole batch is processed.
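The web-service pattern above is often called micro-batching: queue incoming requests and drain them in fixed-size batches. A minimal sketch of the idea, with `classify_batch` as a hypothetical stand-in for the batched task (here, image compression):

```python
from queue import Queue

def classify_batch(batch):
    # Hypothetical stand-in for the batched task, e.g. image compression.
    return [f"compressed:{img}" for img in batch]

def serve(requests, batch_size=4):
    """Drain queued requests in batches of batch_size for better throughput.
    Each user's result is available only once its whole batch finishes,
    which is the slight real-time cost mentioned above."""
    queue = Queue()
    for r in requests:
        queue.put(r)
    results = []
    while not queue.empty():
        batch = []
        while len(batch) < batch_size and not queue.empty():
            batch.append(queue.get())
        results.extend(classify_batch(batch))
    return results

outputs = serve([f"img{i}" for i in range(10)])
```

Choosing `batch_size` is the throughput/latency trade-off in miniature: bigger batches raise utilization but make individual users wait longer.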
With this article at OpenGenus, you must have the complete idea of Throughput in Machine Learning. Enjoy.