Steps to run BenchDNN from OneDNN

In this article, we explain step by step how to run BenchDNN, a popular industry benchmark developed by the OneDNN team at Intel.

Table of contents:

  1. Introduction to BenchDNN
  2. Steps to use BenchDNN

Introduction to BenchDNN

BenchDNN stands for "Benchmark Deep Neural Networks".

BenchDNN is part of OneDNN, an open-source deep learning performance library by Intel. Since OneDNN is open-source, BenchDNN is open-source as well. BenchDNN is the benchmarking tool for OneDNN.

It is part of Intel's oneAPI initiative.

BenchDNN is a useful benchmark to profile the performance of fundamental operations implemented in OneDNN, such as convolution, matrix multiplication (MatMul) and many more.

Steps to use BenchDNN

  • Build OneDNN from source
git clone https://github.com/oneapi-src/oneDNN.git
cd oneDNN
mkdir -p build && cd build && cmake ..
make -j 64        # adjust the job count to the cores available on your machine
make doc          # optional: builds the documentation
sudo make install
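# optional: verify the build with a quick correctness run; the problem
# descriptor below is an illustrative example, not a file from the repository
./tests/benchdnn/benchdnn --conv mb1ic16ih7oc16oh7kh3ph1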
  • Run benchmarks from BenchDNN

The benchmark code is available at:

oneDNN/tests/benchdnn/

If you need to make custom changes to the benchmark, make them in the files under this directory.

When you build OneDNN, BenchDNN is built along with it, and you can run the tests from:

build/tests/benchdnn

There is a main executable, "benchdnn", which can be used to run all benchmarks by setting various options. The problem sizes can be provided through sample input ("batch") files.

cd build/tests/benchdnn
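The general pattern, sketched below, is to pick a driver, set its options, and then supply a problem either inline as a descriptor or through a --batch file. The matmul descriptor shown is an illustrative example, not a shape taken from the repository:

./benchdnn --<driver> [driver-options] <problem-descriptor>
./benchdnn --<driver> [driver-options] --batch=<file>

# e.g. a single matrix multiplication (M=10, K=20, N=30) run inline:
./benchdnn --matmul 10x20:20x30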

Some features of BenchDNN:

  • Performance is reported as execution time and GFLOPs: the higher the GFLOPs, the better the performance.
  • Each operation supports multiple precisions such as FP32, INT8, BF16 and others.
  • Using the --cfg option, we can specify the precision for input, output and accumulation. Note that most, but not all, combinations are supported by OneDNN (see the example below).
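For instance, a convolution can be run entirely in BF16. This is a sketch assuming hardware with BF16 support; the inline problem descriptor is an illustrative example:

./benchdnn --conv --cfg=bf16bf16bf16 --dir=FWD_B mb1ic16ih7oc16oh7kh3ph1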

BenchDNN supports the following operations, each selected through a driver flag of the same name (see the usage sketch after the list):

  • binary
  • bnorm
  • concat
  • conv
  • deconv
  • eltwise
  • ip
  • lnorm
  • lrn
  • matmul
  • pool
  • prelu
  • reduction
  • reorder
  • resampling
  • rnn
  • shuffle
  • softmax
  • sum
  • zeropad
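Each driver has a matching subdirectory under inputs/ that holds its batch files. The exact file names vary across OneDNN versions, so it is safest to list the directory first; the placeholder below is not a real file name:

ls inputs/eltwise/
./benchdnn --eltwise --batch=inputs/eltwise/<some_batch_file>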

Sample command to run the Convolution benchmark from BenchDNN. Here --cfg=f32 selects single-precision data, --dir=FWD_B selects forward propagation with bias (backward passes can be benchmarked with values such as BWD_D and BWD_W), and --batch points to a file of problem shapes:

./benchdnn --conv --cfg=f32 --dir=FWD_B --batch=inputs/conv/set_conv_all

We can add a post-op such as ReLU to be fused with the Convolution:

./benchdnn --conv --cfg=f32 --dir=FWD_B \
           --attr-post-ops=relu --batch=inputs/conv/set_conv_all
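
Post-ops can also be chained. The exact --attr-post-ops syntax differs across OneDNN versions, so treat the following (a sum accumulation followed by ReLU) as a sketch and check the benchdnn documentation shipped with your version:

./benchdnn --conv --cfg=f32 --dir=FWD_B \
           --attr-post-ops=sum+relu --batch=inputs/conv/set_conv_all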

Sample command to run MatMul tests from BenchDNN. Here --mode=p runs the benchmark in performance mode (skipping correctness checks), and --max-ms-per-prb caps the time spent on each problem in milliseconds:

./benchdnn --matmul --mode=p --max-ms-per-prb=6000 \
           --cfg=f32 --batch=inputs/matmul/shapes_2d
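
By default, benchdnn runs in correctness mode (--mode=C), validating the library's results against a reference implementation. A minimal sketch with an illustrative inline descriptor:

./benchdnn --matmul --mode=C 64x128:128x256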

Sample output (one comma-separated record per problem):

perf,cpu,brg:avx512_core,--matmul --mode=p --cfg=f32 100x100:100x100,0.5,0.4,150,0.4,200

Each record follows benchdnn's performance report template: the engine, the implementation name, the reproducer line for the problem, and then timing and GFLOPs statistics. The template can be customized with the --perf-template option.

Sample command to run pooling:

./benchdnn --pool --batch=inputs/pool/shapes_2d

Files like "inputs/pool/shapes_2d" are simple text files that store the problem sizes on which the tests should run. The content of the file "inputs/pool/shapes_2d" is as follows:

# random problems

# regular
mb1ic8_ih3oh3_kh3ph1
mb2ic128_ih4oh2_kh3
mb2ic96_ih4oh2_kh3
mb2ic64_ih1oh1_kh3ph1
mb2ic4_ih4oh4_kh3ph1
mb2ic32_ih4oh4_kh3
mb2ic32_ih13oh12_kh3
mb16ic64_ih32oh16_kh3sh2
mb4ic16_ih10oh10_kh2ph1
mb64ic64_ih56oh56_kh3ph1

...
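
Each line above is a problem descriptor in which short tokens encode the dimensions: mb is the minibatch size, ic the number of input channels, ih/oh the input/output height, kh the kernel height, sh the stride and ph the padding (iw/ow/kw/sw/pw are the width counterparts, and the underscores are only visual separators). For example, mb16ic64_ih32oh16_kh3sh2 describes a pooling problem with minibatch 16, 64 channels, input height 32 reduced to output height 16, a kernel of height 3 and a stride of 2.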

Try running BenchDNN for different operations and see how it performs on your system. Performance will vary with the input shapes, and analyzing that variation can yield useful insights.