Ali Rehman

Deploying Machine Learning Models with NVIDIA TensorRT on PC and Jetson Devices

NVIDIA TensorRT is a powerful deep learning inference library that optimizes trained neural networks for high-performance deployment on NVIDIA GPUs. Whether you’re deploying on a PC with an NVIDIA GPU or an edge device like the NVIDIA Jetson platform, TensorRT can significantly boost inference speed and efficiency. In this blog post, I’ll walk you through the process of deploying machine learning models using TensorRT on both PC and Jetson devices, complete with step-by-step instructions and code snippets.

Why TensorRT?

TensorRT optimizes models by performing layer fusion, precision calibration (e.g., FP32 to FP16 or INT8), kernel auto-tuning, and dynamic tensor memory management. This results in faster inference with lower latency and reduced memory footprint, making it ideal for real-time applications like autonomous vehicles, robotics, and IoT devices.

Prerequisites

Before we dive in, ensure you have the following:

  • Hardware:

    • For PC: A system with an NVIDIA GPU (e.g., RTX 3080).

    • For Jetson: An NVIDIA Jetson device (e.g., Jetson Nano, TX1/TX2, Xavier NX, or Orin).

  • Software:

    • NVIDIA driver and CUDA toolkit installed (version compatible with TensorRT).

    • TensorRT installed (download from NVIDIA Developer).

    • Python 3.8+ and dependencies (numpy, onnx, pycuda).

    • A trained machine learning model (e.g., in PyTorch or TensorFlow, converted to ONNX format).
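Before moving on, you can quickly confirm the Python-side dependencies are importable (a minimal check; the package names match the list above):

import numpy
import onnx
import pycuda

# Print versions so you can cross-check against the TensorRT release notes
print("numpy:", numpy.__version__)
print("onnx:", onnx.__version__)
print("pycuda:", pycuda.VERSION_TEXT)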

Step 1: Preparing the Model

TensorRT consumes models through its ONNX parser (the legacy Caffe and UFF parsers are deprecated), and ONNX is the recommended path thanks to its framework-agnostic nature. Let’s convert a PyTorch model to ONNX.

Converting PyTorch Model to ONNX

Assume you have a trained PyTorch model (e.g., ResNet-18). Here’s how to export it to ONNX:

import torch
import torchvision.models as models

# Load pre-trained model
model = models.resnet18(pretrained=True)
model.eval()

# Define dummy input for export
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"},
                  "output": {0: "batch_size"}},
)

Step 2: Installing TensorRT

On PC

The easiest installation is via the Python wheel. First, create a virtual environment with a Python version between 3.8 and 3.11:

conda create -n trt_env python=3.10
conda activate trt_env

Then install the TensorRT Python wheel and its dependencies:

python3 -m pip install --upgrade pip
python3 -m pip install wheel
python3 -m pip install --upgrade tensorrt pycuda
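
You can confirm the wheel installed correctly and that your GPU is visible by importing the packages (a quick check, assuming the install above succeeded):

import tensorrt as trt
import pycuda.driver as cuda

print("TensorRT version:", trt.__version__)

cuda.init()
print("CUDA devices visible:", cuda.Device.count())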

On Jetson

Jetson devices come with TensorRT pre-installed as part of the JetPack SDK. Ensure your Jetson is flashed with the latest JetPack version, then verify the TensorRT installation:

dpkg -l | grep tensorrt
pip3 show tensorrt

Step 3: Optimizing the Model with TensorRT

TensorRT converts the ONNX model into an optimized engine file tailored for your GPU. A step-by-step guide follows. First, clone and install the trt_inference helper package:

git clone https://github.com/ali-rehman-ML/trt_inference.git
cd trt_inference
pip install -r requirements.txt
pip install .

Engine Creation

Now let’s create a TensorRT engine from the ONNX model exported in the previous step:

from trt_inference import EngineBuilder

builder = EngineBuilder(verbose=True, workspace=16)
builder.create_network("resnet18.onnx")
builder.create_engine("resnet18.trt", precision="fp32")
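
If you prefer not to use the wrapper, the same engine can be built directly with the standard TensorRT Python API. The snippet below is a rough equivalent (a sketch written against the TensorRT 8.x Python API; the 16 GB workspace limit and the commented-out FP16 flag are illustrative choices, not taken from the repo above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file produced in Step 1
with open("resnet18.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 16 << 30)  # 16 GB workspace (illustrative)
# config.set_flag(trt.BuilderFlag.FP16)  # uncomment to build an FP16 engine instead of FP32

# Build and serialize the engine to disk
serialized_engine = builder.build_serialized_network(network, config)
with open("resnet18.trt", "wb") as f:
    f.write(serialized_engine)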

Step 4: Deploying the Model (Inference)

import numpy as np
from trt_inference import TRTInference

trt_inference = TRTInference("resnet18.trt", max_batch_size=32)
input_data = np.random.randn(4, 3, 224, 224).astype(np.float32)
outputs = trt_inference.infer(input_data)
print(f"Output shapes: {[output.shape for output in outputs]}")

Performance Evaluation
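
A simple way to gauge the speedup is to time repeated inference calls on the engine. The snippet below is a minimal latency benchmark using the TRTInference object from the previous step (the batch size and iteration count are arbitrary choices):

import time
import numpy as np

batch = np.random.randn(4, 3, 224, 224).astype(np.float32)

# Warm-up runs so one-time initialization does not skew the timing
for _ in range(10):
    trt_inference.infer(batch)

# Timed runs
n_iters = 100
start = time.perf_counter()
for _ in range(n_iters):
    trt_inference.infer(batch)
elapsed = time.perf_counter() - start

print(f"Average latency: {1000 * elapsed / n_iters:.2f} ms per batch of {batch.shape[0]}")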
