Using trtexec to generate an engine file from an ONNX model fails with two RTSP input sources
Issue Overview
Users encounter errors when generating an engine file from an ONNX model with the trtexec command on the NVIDIA Jetson Orin Nano developer board. The issue appears specifically when running applications that use multiple RTSP input sources.
Symptoms
- Successful engine file generation with a single RTSP input source.
- Errors occur when two or more RTSP input sources are used, leading to warnings and failures in creating the inference context.
Context
- Environment Specifications:
  - TensorRT Version: 8.5
  - GPU Type: Jetson Orin Nano (4GB)
  - CUDA Version: 11.4
  - cuDNN Version: 8.6.0
  - Operating System: Ubuntu 20.04
  - Python Version: 3.8.10
  - Baremetal or Container: Baremetal
Frequency and Impact
The issue appears consistently when attempting to use multiple RTSP streams, severely impacting the user experience by preventing the application from functioning as intended.
Possible Causes
- Hardware Limitations: The Jetson Orin Nano may not have sufficient resources (e.g., memory or processing power) to handle multiple streams simultaneously.
- Model Configuration: The ONNX model may have a static max batch size of 1, which conflicts with requests for higher batch sizes when multiple inputs are used (see the snippet after this list for a quick check).
- Software Bugs or Conflicts: Potential bugs in the DeepStream SDK or TensorRT that affect multi-stream processing.
- Driver Issues: Incompatibilities between the installed drivers and the TensorRT version could lead to unexpected behavior.
- Configuration Errors: Incorrect settings in configuration files (e.g., config_face_nvinfer.txt) that do not match the model's requirements.
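A quick way to confirm whether the batch dimension really is static is to inspect the model's inputs with the onnx Python package (installed as part of the troubleshooting steps below). This is a minimal sketch; "face.onnx" is the model file name used later in this guide.

```python
import onnx

# Print each graph input with its shape; a leading dimension of 1
# (rather than a symbolic name or -1) indicates a static batch size of 1.
model = onnx.load("face.onnx")
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
```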
Troubleshooting Steps, Solutions & Fixes
Step-by-Step Instructions
- Verify Environment Setup:
  Ensure that all software components are correctly installed and compatible:
  - Check TensorRT, CUDA, and cuDNN versions (see the commands below).
  - Ensure that the DeepStream SDK is properly configured.
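  On JetPack, the installed versions can usually be confirmed from the package manager and the bundled version files. The commands below assume a standard JetPack/DeepStream install layout and may differ slightly on other setups.

  ```bash
  # TensorRT and cuDNN package versions
  dpkg -l | grep -E "nvinfer|cudnn"

  # CUDA toolkit version
  /usr/local/cuda/bin/nvcc --version

  # DeepStream release
  cat /opt/nvidia/deepstream/deepstream/version
  ```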
- Modify the ONNX Model:
  If the ONNX model has a static batch size of 1, modify it to allow higher batch sizes:
  - Install the necessary dependencies:

    ```bash
    git clone https://github.com/NVIDIA/TensorRT.git
    cd TensorRT/tools/onnx-graphsurgeon/
    make build
    python3 -m pip install dist/onnx_graphsurgeon-*-py2.py3-none-any.whl
    pip3 install onnx
    ```
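    An optional quick check that both packages import correctly before running the script:

    ```bash
    python3 -c "import onnx_graphsurgeon as gs; print(gs.__version__)"
    python3 -c "import onnx; print(onnx.__version__)"
    ```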
  - Use a script to modify the batch size:

    ```python
    import onnx
    import onnx_graphsurgeon as gs

    batch = 2

    # Load the graph and set the batch dimension of every graph input
    graph = gs.import_onnx(onnx.load("face.onnx"))
    for inp in graph.inputs:
        inp.shape[0] = batch

    # Reshape nodes carry the target batch size in their shape input
    # (the second input), so update that constant as well
    reshape_nodes = [node for node in graph.nodes if node.op == "Reshape"]
    for node in reshape_nodes:
        node.inputs[1].values[0] = batch

    onnx.save(gs.export_onnx(graph), "dynamic.onnx")
    ```
  - Create a new TensorRT engine:

    ```bash
    /usr/src/tensorrt/bin/trtexec --onnx=dynamic.onnx --saveEngine=face1.engine
    ```
- Update Configuration Files:
  Modify config_face_nvinfer.txt to set the correct batch size (batch-size=2); a minimal excerpt is shown below.
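  The keys batch-size and model-engine-file are standard nvinfer properties; the engine path is the file generated in the previous step, and the rest of the existing file should be left unchanged.

  ```ini
  # config_face_nvinfer.txt (excerpt) -- only the keys relevant to this fix
  [property]
  batch-size=2
  model-engine-file=face1.engine
  ```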
- Run Tests:
  Test with multiple input sources using the updated engine:

  ```bash
  python3 deepstream_imagedata-multistream.py \
    file:///opt/nvidia/deepstream/deepstream-6.3/sources/deepstream_python_apps/apps/deepstream-imagedata-multistream-test/darkface2.mp4 \
    file:///opt/nvidia/deepstream/deepstream-6.3/sources/deepstream_python_apps/apps/deepstream-imagedata-multistream-test/darkface2.mp4 \
    frames/
  ```
- Check for Errors:
  Monitor logs for any warnings or errors related to engine creation and inference context initialization (the monitoring commands below can help).
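  To correlate failures with memory pressure on the 4GB Orin Nano, resource usage can be watched while the pipeline starts, and GStreamer's own logging can be raised. Both commands below are standard Jetson/GStreamer tooling rather than anything specific to this application; replace the placeholder URIs with the real inputs.

  ```bash
  # In a second terminal: watch RAM usage while the pipeline starts
  sudo tegrastats

  # Re-run the test with more verbose GStreamer logging
  GST_DEBUG=3 python3 deepstream_imagedata-multistream.py <uri1> <uri2> frames/
  ```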
Recommended Fixes
- Utilize the --batch flag while generating the engine file, if applicable.
- Ensure that --optShapes, --shapes, and other shape flags are used correctly according to the TensorRT documentation (a sketch follows this list).
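As a sketch of the second point: for a model exported with a truly dynamic batch dimension, the shape flags tell trtexec which batch sizes the engine must support. The input name input_1, the 3x544x960 dimensions, and the output file name below are placeholders; substitute the face model's actual input name and shape.

```bash
# Hypothetical example: build an engine supporting batch sizes 1-2,
# optimized for batch 2. Replace "input_1" and the dimensions with
# the model's real input name and shape.
/usr/src/tensorrt/bin/trtexec --onnx=dynamic.onnx \
  --minShapes=input_1:1x3x544x960 \
  --optShapes=input_1:2x3x544x960 \
  --maxShapes=input_1:2x3x544x960 \
  --saveEngine=face_dynamic.engine
```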
Best Practices
- Regularly update all software components (TensorRT, CUDA, cuDNN) to their latest versions.
- Test configurations with different models and input scenarios to isolate issues effectively.
Unresolved Aspects
Further investigation may be needed regarding specific model configurations or potential bugs within TensorRT or DeepStream SDK that could lead to these issues when handling multiple streams.