Slow FPS on Orin Nano 8 GB – YOLOv8
Issue Overview
A user reports low frame rates when running a YOLOv8 object detection and tracking application on an NVIDIA Jetson Orin Nano 8 GB development board. Despite using TensorRT for optimization, the application reaches only about 10 FPS (frames per second), well below the 30 FPS needed for real-time performance.
The user provided their code snippet, which includes the YOLOv8 model initialization, object detection, tracking using DeepSort, and visualization of the results. They also shared the output video and the last few lines of tegrastats output, showing the system resource utilization during the application runtime.
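To reduce tegrastats lines like the ones the user shared to a single number, the sketch below pulls the GPU load out of the GR3D_FREQ field. The sample lines are illustrative stand-ins, not the user's actual output, and the helper names are made up for this example. The exact set of fields varies between JetPack releases, so the parser assumes only that a "GR3D_FREQ <n>%" token is present.

```python
import re

# tegrastats reports GPU load as "GR3D_FREQ <percent>%", optionally followed
# by a clock suffix such as "@[624]"; only the percentage is captured here.
_GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_load_samples(lines):
    """Return the GPU load percentage from every line that reports one."""
    samples = []
    for line in lines:
        match = _GR3D_PATTERN.search(line)
        if match:
            samples.append(int(match.group(1)))
    return samples


def mean_gpu_load(lines):
    """Average GPU load in percent, or None if no line contained the field."""
    samples = gpu_load_samples(lines)
    return sum(samples) / len(samples) if samples else None


# Illustrative lines only (not taken from the user's log).
SAMPLE_LINES = [
    "RAM 3120/7620MB CPU [41%@1510,38%@1510] GR3D_FREQ 12% cpu@48.5C",
    "RAM 3124/7620MB CPU [44%@1510,35%@1510] GR3D_FREQ 35%@[624] cpu@48.9C",
    "a line without the field is ignored",
]

print(gpu_load_samples(SAMPLE_LINES))
print(mean_gpu_load(SAMPLE_LINES))
```

Running this over a log captured while the application is active (for example, tegrastats output redirected to a file) gives a quick check of whether the GPU sits mostly idle during the run.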
Possible Causes
- Suboptimal TensorRT optimization: The YOLOv8 model may not be fully optimized for the Jetson Orin Nano using TensorRT, leading to slower inference times and lower FPS.
- Insufficient GPU utilization: The tegrastats output indicates low GPU utilization, suggesting that the application is not fully leveraging the GPU’s capabilities.
- Bottlenecks in data read/write operations: The performance bottleneck may be caused by inefficient data read/write operations rather than the inference itself.
- Computationally expensive post-processing: The object tracking and visualization steps in the code, such as tracker.update_tracks() and cv2.imshow(), may be taking a significant amount of time and limiting the overall FPS.
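One way to tell which of these causes dominates is to time each stage of the per-frame loop separately. The sketch below is a minimal, hypothetical helper: StageTimer is not part of the user's code or of any library mentioned here, and the stage names in the usage comment are placeholders for the capture, inference, tracking, and display steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of a processing loop."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of calls

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and add it to the stage's running total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Mean duration of each stage in milliseconds."""
        return {
            name: 1000.0 * self.totals[name] / self.counts[name]
            for name in self.totals
        }

    def report(self):
        """One line per stage, slowest first."""
        means = self.mean_ms()
        ordered = sorted(means, key=means.get, reverse=True)
        return "\n".join(f"{name:<10} {means[name]:8.2f} ms" for name in ordered)


# Intended use inside the application's loop (all names are placeholders):
#
#   timer = StageTimer()
#   for each frame:
#       with timer.stage("capture"):    read the next frame
#       with timer.stage("inference"):  run the YOLOv8 model
#       with timer.stage("tracking"):   call tracker.update_tracks(...)
#       with timer.stage("display"):    call cv2.imshow(...)
#   print(timer.report())
```

If the per-frame stage times add up to roughly 100 ms, that matches the observed 10 FPS, and the slowest stage in the report is the first candidate for optimization.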
Troubleshooting Steps, Solutions & Fixes
- Verify TensorRT optimization:
  - Ensure that the YOLOv8 model is exported correctly for TensorRT using the appropriate settings and compatible versions.
  - Benchmark the TensorRT inference-only performance using the trtexec tool to isolate the inference time from other parts of the application.
- Boost device performance:
  - Set the device to the maximum performance mode using sudo nvpmodel -m 0 (mode numbering can differ between JetPack releases; sudo nvpmodel -q reports the active mode).
  - Enable high-performance clocks using sudo jetson_clocks
- Investigate data read/write bottlenecks:
  - Profile the application to identify any bottlenecks in data read/write operations.
  - Optimize data loading and preprocessing steps to minimize overhead.
- Optimize post-processing and visualization:
  - Consider using more efficient object tracking algorithms or libraries that are optimized for real-time performance.
  - Reduce the frequency of visualization updates or perform them asynchronously to avoid blocking the main inference loop.
- Leverage NVIDIA DeepStream SDK:
  - Explore using the NVIDIA DeepStream SDK, which provides optimized pipelines for video analytics and object tracking.
  - The DeepStream SDK offers Python bindings and sample applications that can help accelerate post-processing and improve overall performance[1].
- Investigate UTF-8 decoding issue:
  - The user encountered a UTF-8 decoding error when running the updated code with the newly generated TensorRT engine file.
  - This issue may be related to a known problem in the Ultralytics library, as reported in the GitHub issue: https://github.com/ultralytics/ultralytics/issues/1225
  - Follow the suggestions provided in the GitHub issue to resolve the UTF-8 decoding error.
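For the inference-only benchmark in the first step, the trtexec commands can be assembled as in the sketch below. It assumes the model has already been exported to ONNX. The file names yolov8n.onnx and yolov8n_fp16.engine are placeholders, and the helper functions are hypothetical. On Jetson images trtexec is usually found under /usr/src/tensorrt/bin and may not be on PATH, so the executable location is a parameter.

```python
import shlex
import shutil
import subprocess


def trtexec_build_cmd(onnx_path, engine_path, fp16=True, trtexec="trtexec"):
    """Command that builds a TensorRT engine from an ONNX file."""
    cmd = [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels when building the engine
    return cmd


def trtexec_benchmark_cmd(engine_path, trtexec="trtexec"):
    """Command that loads an existing engine and times inference only."""
    return [trtexec, f"--loadEngine={engine_path}"]


def run_if_available(cmd):
    """Print cmd, then run it only if the executable can be found."""
    print(shlex.join(cmd))
    if shutil.which(cmd[0]) is None:
        return None  # trtexec not installed or not on PATH
    return subprocess.run(cmd, check=False).returncode


# Placeholder file names; substitute the paths of the actual model.
build = trtexec_build_cmd("yolov8n.onnx", "yolov8n_fp16.engine")
bench = trtexec_benchmark_cmd("yolov8n_fp16.engine")
print(shlex.join(build))
print(shlex.join(bench))
```

trtexec prints latency and throughput summaries at the end of a run. If the reported throughput is far above 10 inferences per second, the engine itself is unlikely to be the limiting factor and attention should move to capture, tracking, and display.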
By following these troubleshooting steps and optimizations, the user should be able to improve the FPS performance of their YOLOv8 application on the Jetson Orin Nano and achieve closer to real-time performance of 30 FPS or higher. It’s important to profile and optimize each component of the application, from model inference to post-processing and visualization, to identify and address performance bottlenecks effectively.
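As one concrete reading of the visualization advice above, the sketch below processes every frame but draws only every Nth one. The function and parameter names are hypothetical. In the real application, process would wrap the detection and tracker.update_tracks() calls, and show would wrap cv2.imshow() followed by cv2.waitKey(1).

```python
def run_with_throttled_display(frames, process, show, display_every=3):
    """Process every frame but call `show` on only every `display_every`-th.

    Returns the number of frames that were displayed.
    """
    if display_every < 1:
        raise ValueError("display_every must be >= 1")
    shown = 0
    for index, frame in enumerate(frames):
        result = process(frame)  # detection and tracking run on every frame
        if index % display_every == 0:
            show(result)         # drawing happens on a subset of frames
            shown += 1
    return shown


# Stand-in callables so the sketch runs without a camera or a model.
displayed = []
count = run_with_throttled_display(
    frames=range(10),
    process=lambda frame: frame * 2,
    show=displayed.append,
    display_every=3,
)
print(count, displayed)
```

Tracking still sees every frame, so track continuity is unaffected; only the on-screen preview updates less often. Setting display_every to 1 restores the original behavior for comparison.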