Types of Data Formats in Machine Learning

Reading time: 20 minutes

A data format describes how the input data is laid out in memory. This matters because each machine learning framework performs well with one particular data format and worse with others. Converting between data formats and choosing the correct one is a major optimization technique. For example, TensorFlow is built around the NHWC format while MKLDNN is built around the NCHW format.

There are four commonly used data formats:

  • NHWC
  • NCHW
  • NCDHW
  • NDHWC

Note that other orderings of these dimensions are possible as well, but the four above are the ones commonly used.

General guidance

Each letter in a format name denotes one dimension of the data:

  • N: Batch size: the number of images passed together as a group for inference
  • C: Channel: the number of components that make up one data point of the input. It is 3 for RGB images and 4 for RGBA images (RGB plus a transparency channel).
  • H: Height: the size of the input data along the y axis
  • W: Width: the size of the input data along the x axis
  • D: Depth: the depth of the input data

Points to consider

  • Operations like convolutions and gradients work channel-wise, so they benefit when the pixels within a channel are contiguous in memory.

  • Python libraries like NumPy use a row-major memory layout by default. This means that the 'correct' memory layout for an image is CHW.

  • For batches, it makes sense to keep the memory of each image contiguous, which results in NCHW.

  • Column-major is the default layout in Fortran and MATLAB, in which case the ordering should be HWC, which is what most image loading software uses.
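The contiguity argument in the points above can be checked with NumPy strides. The following is a minimal sketch, assuming a hypothetical image of 3 channels, 4 rows and 5 columns stored as float32; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical image: 3 channels, 4 rows (H), 5 columns (W), 4-byte floats.
# NumPy allocates arrays in row-major (C) order by default.
chw = np.zeros((3, 4, 5), dtype=np.float32)
hwc = np.zeros((4, 5, 3), dtype=np.float32)

# Strides give the number of bytes to skip to advance one step along each axis.
chw_strides = chw.strides  # (80, 20, 4): a whole channel plane is 80 bytes
hwc_strides = hwc.strides  # (60, 12, 4): channels of one pixel are 4 bytes apart

# In CHW, one channel plane is a single contiguous block of memory.
chw_plane_contiguous = bool(chw[0].flags["C_CONTIGUOUS"])        # True

# In HWC, the values of one channel are interleaved with the other channels.
hwc_plane_contiguous = bool(hwc[:, :, 0].flags["C_CONTIGUOUS"])  # False

print(chw_strides, hwc_strides)
print(chw_plane_contiguous, hwc_plane_contiguous)
```

A channel-wise operation on the CHW array reads one unbroken block per channel, while the same operation on the HWC array has to skip over the other channels at every pixel.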

NHWC

NHWC denotes (Batch size, Height, Width, Channel). This means there is a 4D array where the first dimension represents the batch size, the second the height, and so on. This 4D array is laid out in memory in row-major order.

Hence, you can visualize the memory layout to see which operations will access consecutive memory (fast) and which will access memory separated by other data (slow).
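One way to visualize this is to compute where an element lands in the flat memory buffer. The sketch below assumes row-major order as described above; the helper name nhwc_offset and the sizes are hypothetical, chosen only for illustration.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Flat element index of (n, h, w, c) in a row-major NHWC array."""
    return ((n * H + h) * W + w) * C + c


# Hypothetical sizes: 2 rows, 3 columns, 4 channels per image.
H, W, C = 2, 3, 4

base = nhwc_offset(0, 0, 0, 0, H, W, C)

# Distance (in elements) between neighbours along each dimension.
step_channel = nhwc_offset(0, 0, 0, 1, H, W, C) - base  # 1: adjacent in memory
step_width = nhwc_offset(0, 0, 1, 0, H, W, C) - base    # C = 4
step_height = nhwc_offset(0, 1, 0, 0, H, W, C) - base   # W * C = 12
step_batch = nhwc_offset(1, 0, 0, 0, H, W, C) - base    # H * W * C = 24

print(step_channel, step_width, step_height, step_batch)
```

In NHWC, all channels of one pixel sit next to each other, so per-pixel operations across channels touch consecutive memory, while walking along one channel skips C elements at every step.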

Commonly used data: Images

Software: TensorFlow

NCHW

NCHW denotes (Batch size, Channel, Height, Width). This means there is a 4D array where the first dimension represents the batch size, the second the channel, and so on. This 4D array is laid out in memory in row-major order.
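Converting between NHWC and NCHW is a permutation of the axes. The following sketch uses NumPy's transpose and ascontiguousarray on a small hypothetical batch; it illustrates the idea and is not how any particular framework performs the conversion internally.

```python
import numpy as np

# Hypothetical NHWC batch: N=2 images, H=2, W=3, C=4, filled with 0..47.
nhwc = np.arange(2 * 2 * 3 * 4).reshape(2, 2, 3, 4)

# transpose only permutes the axes of a view; ascontiguousarray then copies
# the data so that it is physically stored in row-major NCHW order.
nchw = np.ascontiguousarray(nhwc.transpose(0, 3, 1, 2))

# The same logical element is now addressed as (n, c, h, w).
same_element = nchw[1, 2, 0, 1] == nhwc[1, 0, 1, 2]

# Permuting the axes back recovers the original NHWC view.
back = nchw.transpose(0, 2, 3, 1)

print(nchw.shape)                   # (2, 4, 2, 3)
print(bool(same_element))           # True
print(np.array_equal(back, nhwc))   # True
```

The copy made by ascontiguousarray is the actual cost of a format conversion, which is why choosing the right format up front avoids repeated conversions.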

Commonly used data: Images

Software: MKLDNN

NCDHW

NCDHW denotes (Batch size, Channel, Depth, Height, Width). This means there is a 5D array where the first dimension represents the batch size, the second the channel, and so on. This 5D array is laid out in memory in row-major order.

Commonly used data: Video

Software: MKLDNN

NDHWC

NDHWC denotes (Batch size, Depth, Height, Width, Channel). This means there is a 5D array where the first dimension represents the batch size, the second the depth, and so on. This 5D array is laid out in memory in row-major order.

Commonly used data: Video

Software: TensorFlow
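The two 5D formats can be compared in the same way by computing their row-major strides. This is a small sketch in plain Python; the helper row_major_strides and the sizes are hypothetical, and strides are counted in elements rather than bytes.

```python
def row_major_strides(shape):
    """Element strides of a row-major array with the given shape."""
    strides = [1] * len(shape)
    # Walk from the second-to-last axis back to the first.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


# Hypothetical video batch: 2 clips, 3 channels, 4 frames of 5 x 6 pixels.
N, C, D, H, W = 2, 3, 4, 5, 6

ncdhw_strides = row_major_strides((N, C, D, H, W))  # (360, 120, 30, 6, 1)
ndhwc_strides = row_major_strides((N, D, H, W, C))  # (360, 90, 18, 3, 1)

print(ncdhw_strides)
print(ndhwc_strides)
```

In NCDHW the last stride belongs to the width, so the pixels of one row within a channel are adjacent; in NDHWC the last stride belongs to the channel, so the channels of one pixel are adjacent.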