BFLOAT16 (BFP16), short for Brain Floating Point 16 bits, is a representation of floating point numbers used to accelerate Machine Learning Inference performance and near-sensor computing. It was developed by researchers at Google Brain for use in TensorFlow and the TPU (Tensor Processing Unit).
Table of contents:
- BFLOAT16 data format
- Sample calculation of BF16
- Advantages and Use of BFLOAT16
- Alternatives to BFLOAT16
- Research done on BFLOAT16
BFLOAT16 data format
BFLOAT16 uses the following format to represent floating point numbers:
- Sign bit: 1 bit
- Exponent: 8 bits
- Mantissa: 7 bits
This brings the total number of bits to 16 (= 1 + 8 + 7).
In contrast, the usual FP16 data format consists of:
- Sign bit: 1 bit
- Exponent: 5 bits
- Mantissa: 10 bits
Hence, the mantissa is reduced in BF16, while the exponent is widened from 5 to 8 bits.
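The bit layout above can be extracted with simple masks and shifts. Here is a minimal sketch in Python (the helper name is ours, not from any library):

```python
def bf16_fields(bits):
    """Split a 16-bit BF16 pattern (given as an int) into its three fields."""
    sign     = (bits >> 15) & 0x1    # 1 sign bit
    exponent = (bits >> 7)  & 0xFF   # 8 exponent bits
    mantissa = bits         & 0x7F   # 7 mantissa bits
    return sign, exponent, mantissa

# 0x4049 = 0 10000000 1001001
print(bf16_fields(0x4049))  # (0, 128, 73)
```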
This format (BFLOAT16) was first introduced by researchers at Google Brain, a research group at Google. The main focus was its use in Inference.
Let the bits be: b_1 b_2 ... b_15 b_16
Then the decimal equivalent is calculated as:
(-1)^(b_1) x 2^((b_2 b_3 ... b_9)_2 - 127) x (1 + b_10/2^1 + b_11/2^2 + ... + b_16/2^7)
- b_i is the i-th bit
- (b_2 b_3 ... b_9)_2 is the exponent field read as a binary number and converted to decimal
We have explained this with an example in a later section in this article at OpenGenus.
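The formula above can be sketched directly in Python. This is a minimal decoder for normal numbers only (it ignores zero, infinity, NaN, and subnormals; the function name is our own):

```python
def decode_bf16(pattern):
    """Decode a BF16 bit string 'b1 b2 ... b16' (normal numbers only)."""
    b = pattern.replace(" ", "")
    sign = int(b[0])                              # b_1
    exponent = int(b[1:9], 2)                     # (b_2 ... b_9) as binary
    fraction = sum(int(bit) / 2 ** (i + 1)        # b_10/2 + b_11/4 + ...
                   for i, bit in enumerate(b[9:]))
    return (-1) ** sign * 2 ** (exponent - 127) * (1 + fraction)

print(decode_bf16("0 01111111 1000000"))  # 1.5
print(decode_bf16("1 10000000 0000000"))  # -2.0
```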
Special numbers defined in BFLOAT16 are:
- Zero and Negative Zero
- Infinity and Negative Infinity
- NaN (Not a Number)
Zero and Negative Zero in BF16:
0000 = 0 00000000 0000000 = 0
8000 = 1 00000000 0000000 = −0
Infinity and Negative Infinity in BF16:
7f80 = 0 11111111 0000000 = infinity
ff80 = 1 11111111 0000000 = −infinity
NaN in BFLOAT16 is when:
- All bits in the exponent are 1
- At least one bit in the mantissa is 1
Following is the maximum positive value that can be represented in BF16:
0 11111110 1111111 = (2^8 − 1) × 2^−7 × 2^127 ≈ 3.38953139 × 10^38
This is the minimum positive normal value that can be represented in BF16:
0 00000001 0000000 = 2^−126 ≈ 1.175494351 × 10^−38
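The special values and extremes above can be checked by exploiting the fact that BF16 is simply the top 16 bits of FP32: appending 16 zero bits yields an ordinary FP32 number. A sketch using only Python's standard struct module:

```python
import math
import struct

def bf16_to_float(bits):
    # Widen a 16-bit BF16 pattern to FP32 by appending 16 zero bits
    return struct.unpack('>f', struct.pack('>I', bits << 16))[0]

print(bf16_to_float(0x7F80))                 # inf
print(bf16_to_float(0xFF80))                 # -inf
print(math.isnan(bf16_to_float(0x7FC0)))     # True: exponent all 1s, mantissa non-zero
print(bf16_to_float(0x7F7F))                 # largest finite value, about 3.39 x 10^38
print(bf16_to_float(0x0080) == 2.0 ** -126)  # True: smallest positive normal value
```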
Sample calculation of BF16
Consider this number in BFP16 format: 0 10000000 1001001
We will convert this into decimal format.
sign (A1) = 0
exponent (A2) = 10000000 = 128
mantissa (A3) = 1001001 = (1/2 + 1/16 + 1/128) = 0.5703125
Number = (-1)^A1 x 2^(A2 - 127) x (1 + A3)
= (-1)^0 x 2^1 x 1.5703125
= 3.140625 (an approximate value of pi)
0 10000000 1001001 = 3.140625
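The worked example above can be reproduced step by step in plain Python:

```python
# 0 10000000 1001001, the BF16 pattern from the sample calculation
sign = 0
exponent = 0b10000000                 # = 128
mantissa_bits = "1001001"

# fraction = 1/2 + 1/16 + 1/128 = 0.5703125
fraction = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(mantissa_bits))

value = (-1) ** sign * 2 ** (exponent - 127) * (1 + fraction)
print(value)  # 3.140625
```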
Advantages and Use of BFLOAT16
BFLOAT16 data format is used to:
- Accelerate Inference performance of Machine Learning models
- Accelerate training of Machine Learning models
Note: BFLOAT16 is not suitable for ordinary calculations. Hence, BFLOAT16 cannot replace FP16 or FP32 in general-purpose computations such as calculating an average, where its low precision introduces errors.
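The precision limitation can be demonstrated by rounding FP32 values to BF16. This sketch simulates BF16 by truncating to the top 16 bits (real hardware typically uses round-to-nearest-even; truncation is simpler and close enough for illustration):

```python
import struct

def round_to_bf16(x):
    # Keep only the top 16 bits of the FP32 representation (truncation)
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return struct.unpack('>f', struct.pack('>I', bits & 0xFFFF0000))[0]

# With only 8 significant bits, 257 is not representable:
print(round_to_bf16(257.0))    # 256.0
# Ordinary values lose precision too:
print(round_to_bf16(3.14159))  # 3.140625
```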
BFLOAT16 is supported in several software and hardware systems. Some Software Systems include:
- TensorFlow by Google
- oneDNN by Intel
Hardware systems supporting BFLOAT16 include:
- TPU by Google
- Cooper Lake and Ice Lake server processors by Intel
BFLOAT16 is used in systolic arrays to accelerate matrix multiplication, which is a common operation in Machine Learning models. Matrix multiplication uses BFLOAT16 for the multiplications and the FP32 data format for accumulation.
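The mixed-precision scheme can be sketched as follows: operands are rounded to BF16 before each multiplication, while the running sum is kept at full precision (Python floats stand in for the FP32 accumulator; the truncation helper simulates BF16):

```python
import struct

def bf16(x):
    # Simulate BF16 by truncating the FP32 representation to its top 16 bits
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return struct.unpack('>f', struct.pack('>I', bits & 0xFFFF0000))[0]

def matmul_mixed(A, B):
    """Multiply BF16-rounded operands; accumulate each dot product in full precision."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0                                  # full-precision accumulator
            for p in range(k):
                acc += bf16(A[i][p]) * bf16(B[p][j])   # BF16 multiply
            C[i][j] = acc
    return C

print(matmul_mixed([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
# [[19.0, 22.0], [43.0, 50.0]] -- small integers are exact in BF16
```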
The physical size of a hardware multiplier is proportional to the square of the mantissa width. Hence, by reducing the mantissa to 7 bits, the multiplier shrinks to roughly half the size of an FP16 multiplier.
Neural Networks are more sensitive to the exponent than to the mantissa, so the exponent in BF16 is the same size (8 bits) as the exponent in FP32.
Hence, the advantages of using BFLOAT16 are:
- Reduced hardware multiplier size compared to FP16 and FP32.
- Accelerated Matrix Multiplication performance.
- Reduced memory requirements.
Alternatives to BFLOAT16
Alternatives to BFLOAT16 data format include:
- FP32 data format
This is the original data format in which all calculations are done. Other data formats are used in order to improve performance compared to FP32.
- FP16 data format
FP16 and BFP16 have the same memory requirements, but BFLOAT16 has proven advantages specific to Machine Learning Inference performance. The reduction in hardware multiplier size and the acceleration of matrix multiplication that BFP16 offers are not observed with FP16.
- INT8 data format
The use of the INT8 data format for Machine Learning Inference is on the rise.
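A concrete illustration of the FP16 comparison above: 70000 overflows FP16 (whose largest finite value is 65504) but fits comfortably in BF16, because BF16 keeps FP32's 8-bit exponent. A sketch using only Python's standard struct module:

```python
import struct

# 70000.0 exceeds FP16's largest finite value (65504), so packing fails:
try:
    struct.pack('>e', 70000.0)   # '>e' = big-endian IEEE half precision (FP16)
    fits_in_fp16 = True
except OverflowError:
    fits_in_fp16 = False
print(fits_in_fp16)              # False

# BF16 (the top 16 bits of FP32) represents it, rounded down by truncation:
bits = struct.unpack('>I', struct.pack('>f', 70000.0))[0] & 0xFFFF0000
as_bf16 = struct.unpack('>f', struct.pack('>I', bits))[0]
print(as_bf16)                   # 69632.0
```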
Research done on BFLOAT16
Learn more about BFLOAT16 through these research papers:
- "TensorFlow: A system for large-scale machine learning" by researchers at Google Brain. This paper introduced the idea of BFLOAT16.
- "A Study of BFLOAT16 for Deep Learning Training" by researchers at Intel Labs and Facebook. This paper explores the use of BFLOAT16 data format for training of Machine Learning models. Previous use-cases involved using BFLOAT16 for Inference.
- Cliff Young, Google AI. (October 2018). "Codesign in Google TPUs: Inference, Training, Performance, and Scalability". This was the Keynote speech in "Processor Conference 2018".
- Carey Kloss, Intel VP Hardware and AI Products Group. (April 2019). "Deep Learning By Design; Building silicon for AI." This was presented in "Processor Conference 2019".
With this article at OpenGenus, you must have the complete idea of the BFLOAT16 data format and its use in the Inference process of Machine Learning models.