
Deploying Machine Learning Models with NVIDIA TensorRT on PC and Jetson Devices
NVIDIA TensorRT is a powerful deep learning inference library that optimizes trained neural networks for high-performance deployment on NVIDIA GPUs. Whether you’re deploying on a PC with an NVIDIA GPU or an edge device like the NVIDIA Jetson platform, TensorRT can significantly boost inference speed and efficiency. In this blog post, I’ll walk you through the process of deploying machine learning models using TensorRT on both PC and Jetson devices, complete with step-by-step instructions and code snippets.
Why TensorRT?
TensorRT optimizes models by performing layer fusion, precision calibration (e.g., FP32 to FP16 or INT8), kernel auto-tuning, and dynamic tensor memory management. This results in faster inference with lower latency and reduced memory footprint, making it ideal for real-time applications like autonomous vehicles, robotics, and IoT devices.
Prerequisites
Before we dive in, ensure you have the following:
Hardware:
For PC: A system with an NVIDIA GPU (e.g., RTX 3080).
For Jetson: An NVIDIA Jetson device (e.g., Jetson Nano, TX1/TX2, Xavier NX, or Orin).
Software:
NVIDIA driver and CUDA toolkit installed (version compatible with TensorRT).
TensorRT installed (download from NVIDIA Developer).
Python 3.8+ and dependencies (numpy, onnx, pycuda).
A trained machine learning model (e.g., in PyTorch or TensorFlow, converted to ONNX format).
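Before diving in, you can quickly confirm that the driver and CUDA toolkit listed above are visible using the standard NVIDIA utilities:
nvidia-smi
nvcc --version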
Step 1: Preparing the Model
TensorRT primarily consumes models in ONNX format (its older Caffe and UFF parsers are deprecated), and ONNX is the usual choice thanks to its framework-agnostic nature. Let’s convert a PyTorch model to ONNX.
Converting PyTorch Model to ONNX
Assume you have a trained PyTorch model (e.g., ResNet-18). Here’s how to export it to ONNX:
import torch
import torchvision.models as models

# Load the pre-trained model and switch to inference mode
model = models.resnet18(pretrained=True)
model.eval()

# Define a dummy input for export
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX with a dynamic batch dimension
torch.onnx.export(model,
                  dummy_input,
                  "resnet18.onnx",
                  input_names=["input"],
                  output_names=["output"],
                  dynamic_axes={"input": {0: "batch_size"},
                                "output": {0: "batch_size"}})
Step 2: Installing TensorRT
On PC
The easiest installation method on a PC is the Python wheel. First, create a virtual environment with Python 3.8-3.11:
conda create -n trt_env python=3.10
conda activate trt_env
Then install the TensorRT Python wheel and its dependencies:
python3 -m pip install --upgrade pip
python3 -m pip install wheel
python3 -m pip install --upgrade tensorrt pycuda
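A quick way to confirm the wheel installed correctly is to import the module and print its version:
python3 -c "import tensorrt; print(tensorrt.__version__)"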
On Jetson
Jetson devices come with TensorRT pre-installed as part of the JetPack SDK. Ensure your Jetson is flashed with a recent JetPack version, then verify the TensorRT installation:
dpkg -l | grep tensorrt
pip3 show tensorrt
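JetPack provides the TensorRT runtime and Python bindings, but the other Python dependencies from the prerequisites (numpy, pycuda) may still need to be installed manually:
pip3 install numpy pycuda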
Step 3: Optimizing the Model with TensorRT
TensorRT converts the ONNX model into an optimized engine file tailored to your GPU. Below is a step-by-step guide. First, install the trt_inference helper package:
git clone https://github.com/ali-rehman-ML/trt_inference.git
cd trt_inference
pip install -r requirements.txt
pip install .
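As a quick sanity check that the package installed, try importing the two classes used in the next steps:
python3 -c "from trt_inference import EngineBuilder, TRTInference"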
Engine Creation
Now let's create a TensorRT engine from the ONNX model exported in the previous step:
from trt_inference import EngineBuilder

# Configure the builder (verbose logging, workspace limit of 16)
builder = EngineBuilder(verbose=True, workspace=16)
# Parse the ONNX model into a TensorRT network
builder.create_network("resnet18.onnx")
# Build and serialize the engine at FP32 precision
builder.create_engine("resnet18.trt", precision="fp32")
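If you'd rather not use a wrapper, trtexec (shipped with full TensorRT installations, e.g., under /usr/src/tensorrt/bin on Jetson, though not with the pip wheel) can build an equivalent engine straight from the ONNX file. The shape flags are only needed because we exported with a dynamic batch dimension, and the values below are just one reasonable choice:
trtexec --onnx=resnet18.onnx --saveEngine=resnet18.trt \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:32x3x224x224
Adding --fp16 enables half precision, which usually gives a noticeable speedup on GPUs with Tensor Cores.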
Step 4: Deploying the Model (Inference)
import numpy as np
from trt_inference import TRTInference

# Load the serialized engine, allowing batch sizes up to 32
trt_inference = TRTInference("resnet18.trt", max_batch_size=32)

# Run inference on a random batch of 4 images
input_data = np.random.randn(4, 3, 224, 224).astype(np.float32)
outputs = trt_inference.infer(input_data)
print(f"Output shapes: {[output.shape for output in outputs]}")
Performance Evaluation
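A simple way to evaluate performance is to time repeated calls to infer after a few warm-up runs. The sketch below builds on the wrapper's infer method from the previous step; actual numbers depend on your GPU, precision, and batch size:
import time

import numpy as np
from trt_inference import TRTInference

trt_inference = TRTInference("resnet18.trt", max_batch_size=32)
batch = np.random.randn(32, 3, 224, 224).astype(np.float32)

# Warm-up runs so one-time initialization doesn't skew the measurement
for _ in range(10):
    trt_inference.infer(batch)

# Timed runs
runs = 100
start = time.perf_counter()
for _ in range(runs):
    trt_inference.infer(batch)
elapsed = time.perf_counter() - start

print(f"Average latency: {1000 * elapsed / runs:.2f} ms per batch of {batch.shape[0]}")
print(f"Throughput: {runs * batch.shape[0] / elapsed:.1f} images/sec")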


