Run a ResNet50 model in ONNX format on TVM Stack with LLVM backend

Reading time: 15 minutes | Coding time: 15 minutes

In this guide, we will run a ResNet50 model in ONNX format on the TVM Stack with the LLVM backend. You do not need any specialized hardware such as a GPU or TPU to follow this guide; an ordinary CPU is enough.

You need to have the TVM Stack installed on your system to follow along. You can do this in 15 minutes by following our TVM installation guide.

Step 1: Get the ResNet50 model in ONNX format

We need a pre-trained ResNet50 model in ONNX format. You can train and build your own ResNet50 model from scratch, but in this guide we use an available pre-trained model to get started quickly.

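We will get the model from the official ONNX Model Zoo, which contains several sample models in ONNX format. The code in Step 3 expects the model file to be named resnet50v1.onnx and to sit in your working directory.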


Step 2: Get the input image for inference

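We need a sample image to feed to our model. The code in Step 3 expects it to be named kitten.jpg and to sit in the same directory as the model.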


Step 3: Get the TVM code

In short, we will load the ONNX model (resnet50v1.onnx) and the input image (kitten.jpg). We will convert the ONNX model to an NNVM graph and compile it with the NNVM compiler, using LLVM as the target. Once that is done, we will run the compiled model on the CPU using the TVM runtime.
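Before the image reaches the model, it has to be turned into a normalized float tensor in NCHW layout. The following is a minimal NumPy-only sketch of that preprocessing step, not the article's exact code: the helper name `preprocess` is hypothetical, and the mean and standard deviation values are the commonly used ImageNet statistics, which are an assumption here. A random array stands in for the cropped kitten image so the snippet runs without any files.

```python
import numpy as np

# Assumed ImageNet per-channel statistics (RGB order); not given in the article.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc):
    """Turn an (H, W, 3) uint8 RGB image into a (1, 3, H, W) float32 batch."""
    x = image_hwc.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # normalize each channel
    x = np.transpose(x, (2, 0, 1))             # HWC -> CHW
    return x[np.newaxis, ...]                  # add the batch dimension


# A random stand-in for a 224x224 crop of kitten.jpg.
fake_image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = preprocess(fake_image)
print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32
```

A single `np.transpose` does the same job as two consecutive `swapaxes` calls and makes the target layout explicit.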

The following code is written in Python:

# Only the modules actually used below are imported
import nnvm
import tvm
import onnx
import numpy as np
import time
onnx_model = onnx.load_model('resnet50v1.onnx')
sym, params = nnvm.frontend.from_onnx(onnx_model)
from PIL import Image
img = Image.open('kitten.jpg').resize((256, 256))
img = img.crop((16, 16, 240, 240))
img = img.convert("RGB")  # ResNet50 expects a 3-channel RGB image
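def normalize(x):
    # normalize() is not defined in the original listing. This version assumes
    # the usual ImageNet preprocessing: scale to [0, 1], then apply the
    # per-channel mean and standard deviation.
    mean = np.array([0.485, 0.456, 0.406], dtype='float32')
    std = np.array([0.229, 0.224, 0.225], dtype='float32')
    return (x.astype('float32') / 255.0 - mean) / std

x = np.array(img)[np.newaxis, :, :, :]  # shape (1, 224, 224, 3), NHWC
x = normalize(x)
# Reorder NHWC -> NCHW
x = x.swapaxes(1, 3)
x = x.swapaxes(2, 3)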
# Compile Model on NNVM compiler
import nnvm.compiler
target = 'llvm'
input_name = sym.list_input_names()[0]
shape_dict = {input_name: x.shape}
with nnvm.compiler.build_config(opt_level=3,
                                add_pass=['AlterOpLayout', 'FoldScaleAxis', 'OpFusion',
                                          'PrecomputePrune', 'SimplifyInference']):
    graph, lib, params = nnvm.compiler.build(
        sym, target_host='llvm', target=target, shape=shape_dict,
        dtype={"int64": "int64", "float32": "float32"},
        params=params, layout="NCHW")
# Execute on TVM
from tvm.contrib import graph_runtime
ctx = tvm.cpu(0)
dtype = 'float32'
m = graph_runtime.create(graph, lib, ctx)
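# set the input image and the model weights
m.set_input(input_name, tvm.nd.array(x.astype(dtype)))
m.set_input(**params)
# time a single inference run
start = time.perf_counter()
m.run()
end = time.perf_counter()
print("Time taken to successfully execute: ")
print(end - start)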

Save the above code as ""

Step 4: Execute the code

To execute the code, use the following command:


The execution time will vary from system to system.
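Because a single measurement can be noisy, it may help to time several runs and report a summary. Below is a minimal sketch using only the Python standard library; the helper name `benchmark` and its `warmup` and `repeats` parameters are hypothetical, and a small arithmetic loop stands in for the model so the snippet runs on its own.

```python
import statistics
import time


def benchmark(run, warmup=3, repeats=10):
    """Return per-call wall-clock times (in seconds) for a zero-argument callable."""
    for _ in range(warmup):
        run()  # warm-up calls are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload; with the script from Step 3 you would pass m.run instead.
timings = benchmark(lambda: sum(i * i for i in range(100000)))
print("median: %.6f s, min: %.6f s" % (statistics.median(timings), min(timings)))
```

The median is less sensitive than the mean to an occasional slow run caused by other processes on the machine.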