Run a VGG16 model in ONNX format on TVM Stack with LLVM backend


Reading time: 15 minutes | Coding time: 15 minutes

In this guide, we will run a VGG16 model in ONNX format on the TVM Stack with the LLVM backend. You do not need specialized hardware such as a GPU or TPU to follow along; a regular CPU is enough.

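You need the TVM Stack installed on your system to follow along; our TVM installation guide walks you through it in about 15 minutes. The script in Step 3 uses the NNVM frontend and compiler, which later TVM releases removed in favor of Relay, so install an older TVM release that still includes NNVM.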
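Before running the script, you can check that the core packages it relies on are importable. This is a minimal sketch using only the standard library; the helper name `missing_modules` is our own, and the module list is an assumption based on the imports used in Step 3 (`PIL` is provided by the Pillow package).

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if the parent package of a dotted name is missing
            found = False
        if not found:
            missing.append(name)
    return missing
```

For example, `print(missing_modules(["tvm", "nnvm", "onnx", "numpy", "PIL"]))` prints an empty list when everything is installed.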

Step 1: Get the VGG16 model in ONNX format

We need a pre-trained VGG16 model in ONNX format. You could train your own VGG16 model from scratch, but in this guide we use an existing model to get started quickly.

We will download the model from the official ONNX Model Zoo, which provides several pre-trained models in ONNX format:

wget https://s3.amazonaws.com/onnx-model-zoo/vgg/vgg16/vgg16.onnx
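If `wget` is not available, the download can also be done from Python. A minimal sketch using the standard library's `urllib.request.urlretrieve`; the helper name `fetch` is hypothetical, and it skips the download when the file already exists so that re-running it does not fetch the roughly 500 MB model again.

```python
import os
import urllib.request


def fetch(url, dest=None):
    """Download `url` to `dest` unless that file already exists; return the path."""
    if dest is None:
        # Like wget, default to the last path component of the URL
        dest = url.rstrip("/").rsplit("/", 1)[-1]
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `fetch("https://s3.amazonaws.com/onnx-model-zoo/vgg/vgg16/vgg16.onnx")` saves `vgg16.onnx` in the current directory; the same call works for the input image in the next step.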

Step 2: Get the input image for inference

We need a sample image to feed to our model:

wget https://s3.amazonaws.com/model-server/inputs/kitten.jpg
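The script in Step 3 resizes `kitten.jpg` to 256x256 and then crops the central 224x224 region, which is the input size VGG16 expects. The sketch below shows how the crop box `(16, 16, 240, 240)` follows from those two sizes; `center_crop_box` is a hypothetical helper, not part of PIL, that returns the `(left, upper, right, lower)` tuple that PIL's `Image.crop` takes.

```python
def center_crop_box(width, height, size):
    """Return the (left, upper, right, lower) box for a centered
    size x size crop of a width x height image."""
    if size > width or size > height:
        raise ValueError("crop size is larger than the image")
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


# 256x256 resized image, 224x224 VGG16 input
print(center_crop_box(256, 256, 224))  # (16, 16, 240, 240)
```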

Step 3: Get the TVM code

In short, we load the ONNX model (vgg16.onnx) and the input image (kitten.jpg), convert the ONNX model to an NNVM graph, and compile it with the NNVM compiler for the LLVM target. Finally, we run the compiled model with the TVM runtime.

The following code is written in Python:


import time

import numpy as np
import onnx
from PIL import Image

import nnvm
import nnvm.compiler
import nnvm.frontend
import tvm
from tvm.contrib import graph_runtime


def normalize(img):
    # Scale RGB values to [0, 1], then normalize each channel with the
    # ImageNet mean and standard deviation used by the ONNX Model Zoo VGG models
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (img.astype(np.float32) / 255.0 - mean) / std


# Load the ONNX model and convert it to an NNVM symbol and parameter dict
onnx_model = onnx.load_model('vgg16.onnx')
sym, params = nnvm.frontend.from_onnx(onnx_model)

# Preprocess: resize to 256x256, center-crop to 224x224, keep RGB channels
img = Image.open('kitten.jpg').convert('RGB').resize((256, 256))
img = img.crop((16, 16, 240, 240))
x = np.array(img)[np.newaxis, :, :, :]             # NHWC: (1, 224, 224, 3)
x = normalize(x)
x = np.ascontiguousarray(x.transpose(0, 3, 1, 2))  # NCHW: (1, 3, 224, 224)

# Compile the model on the NNVM compiler for the LLVM (CPU) target
target = 'llvm'
dtype = 'float32'
input_name = sym.list_input_names()[0]
shape_dict = {input_name: x.shape}
with nnvm.compiler.build_config(opt_level=3,
               add_pass=['AlterOpLayout', 'FoldScaleAxis', 'OpFusion',
                         'PrecomputePrune', 'SimplifyInference']):
    graph, lib, params = nnvm.compiler.build(sym, target=target,
               target_host='llvm', shape=shape_dict,
               dtype=dtype, params=params)

# Execute on TVM
ctx = tvm.cpu(0)
m = graph_runtime.create(graph, lib, ctx)
# Set the image and the compiled parameters as inputs
m.set_input(input_name, tvm.nd.array(x.astype(dtype)))
m.set_input(**params)
# time.clock() was removed in Python 3.8; perf_counter() measures wall-clock time
start = time.perf_counter()
m.run()
end = time.perf_counter()
print("Time taken to successfully execute (seconds):")
print(end - start)

# Report the index of the highest-scoring ImageNet class
scores = m.get_output(0).asnumpy()
print("Top-1 class index:", int(np.argmax(scores)))

Save the above code as "vgg16_onnx.py".
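The script reorders the image array from NHWC (height, width, channels, the order PIL and NumPy produce) to the NCHW layout that the ONNX VGG16 model takes as input. In plain Python, the reordering for a single image amounts to the following; `hwc_to_chw` is an illustrative helper operating on nested lists, not a TVM or NumPy function.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from [height][width][channel]
    to [channel][height][width]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


# A 1x2 RGB image with pixels (1, 2, 3) and (4, 5, 6)
print(hwc_to_chw([[[1, 2, 3], [4, 5, 6]]]))
# [[[1, 4]], [[2, 5]], [[3, 6]]]
```

Each output channel plane collects the values of that channel from every pixel, which is why the first axis becomes the channel count (3 for RGB).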

Step 4: Execute the code

To execute the code, use the following command:

python vgg16_onnx.py

The reported time covers a single inference pass and will vary from system to system.
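A single timed `run()` is noisy, and the first call can include one-time setup work. A minimal sketch of a steadier measurement, assuming you want the median over several runs after a warm-up; `benchmark` is our own helper, and it uses `time.perf_counter` because `time.clock` was removed in Python 3.8.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeat=10):
    """Call `fn` `warmup` times untimed, then `repeat` times timed,
    and return the median wall-clock duration in seconds."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

In the script, this could replace the single timed call, for example `print(benchmark(m.run))`, where `m` is the graph runtime module created in Step 3. The median is less sensitive than the mean to an occasional slow run caused by other processes on the machine.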