Running AI Benchmarks on Multiple Jetson Orin Nano Devices
Issue Overview
Users are seeking guidance on running AI performance benchmarks across multiple Jetson Orin Nano devices simultaneously. The specific challenge arises from the fact that the official NVIDIA Jetson Benchmarks repository (GitHub – NVIDIA-AI-IOT/jetson_benchmarks) only supports benchmarking on a single device. This limitation prevents users from easily evaluating the performance of a cluster of Jetson Orin Nano devices, which is crucial for scaling AI workloads across multiple units.
Possible Causes
- Limited scope of official benchmarking tools: The NVIDIA-provided benchmarking suite is designed for single-device use cases, not accounting for multi-device or cluster scenarios.
- Lack of built-in cluster support: The current benchmarking tools do not include native functionality for distributing workloads across multiple Jetson devices.
- Absence of official documentation: NVIDIA has not provided specific guidance or documentation for running benchmarks on Jetson clusters, leaving users uncertain about best practices.
Troubleshooting Steps, Solutions & Fixes
- Distribute data manually:
  - NVIDIA recommends running different data on each device in a cluster to increase throughput.
  - Users will need to implement their own data distribution mechanism to send distinct datasets to each Jetson Orin Nano in the cluster.
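As an illustration, such a distribution mechanism could be sketched in Python as below. The hostnames, dataset file names, and target directory are placeholders, and `scp` is assumed to be set up with key-based login; the sharding itself is a simple round-robin assignment:

```python
import subprocess
from itertools import cycle

def shard_round_robin(items, hosts):
    """Assign each item to a host in round-robin order.
    Returns {host: [items...]}."""
    assignment = {h: [] for h in hosts}
    for item, host in zip(items, cycle(hosts)):
        assignment[host].append(item)
    return assignment

def push_shards(assignment, remote_dir, dry_run=True):
    """Copy each host's shard over scp; dry_run prints the commands instead."""
    for host, files in assignment.items():
        for f in files:
            cmd = ["scp", f, f"{host}:{remote_dir}/"]
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)

if __name__ == "__main__":
    hosts = ["jetson-0.local", "jetson-1.local"]      # hypothetical hostnames
    datasets = ["set_a.tar", "set_b.tar", "set_c.tar"]  # hypothetical shards
    plan = shard_round_robin(datasets, hosts)
    push_shards(plan, "/home/nvidia/data", dry_run=True)
```

Round-robin is only one policy; if the devices are identically configured it keeps the shards balanced, but weighted assignment may be preferable for mixed clusters.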
- Develop a custom benchmarking solution:
  - Create a wrapper script or application that:
    a. Identifies all Jetson devices in the cluster
    b. Distributes benchmark tasks across the devices
    c. Collects and aggregates results from each device
  - This approach requires additional development effort but allows for tailored benchmarking of multi-device setups.
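A minimal wrapper covering steps a–c might look like the following Python sketch. The hostnames are hypothetical, the SSH invocation assumes passwordless login, and parsing the throughput figures out of benchmark.py's output is left open because it depends on the benchmark's log format:

```python
import subprocess
from statistics import mean

# Command from the jetson_benchmarks repo; real runs need its CLI arguments.
BENCH_CMD = "cd jetson_benchmarks && python3 benchmark.py"

def run_on_device(host, timeout=3600):
    """Step b: launch the single-device benchmark on one host over SSH."""
    result = subprocess.run(["ssh", host, BENCH_CMD],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout

def aggregate(per_device_fps):
    """Step c: combine per-device throughput into cluster-level figures."""
    values = list(per_device_fps.values())
    return {"total_fps": sum(values), "mean_fps": mean(values)}

if __name__ == "__main__":
    hosts = ["jetson-0.local", "jetson-1.local"]  # step a: hypothetical cluster members
    # raw = {h: run_on_device(h) for h in hosts}  # step b (needs reachable hosts)
    # Step c: parse FPS values out of `raw`, then call aggregate() on them.
```

Summing FPS across devices is only meaningful when each device processes distinct data, which matches the manual data distribution described above.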
- Utilize container orchestration:
  - Implement a container orchestration solution like Kubernetes to manage the deployment of benchmarking tasks across multiple Jetson devices.
  - This method provides better scalability and management for larger clusters.
- Parallel execution of single-device benchmarks:
  - Run the existing single-device benchmark simultaneously on each Jetson Orin Nano using separate SSH sessions or a parallel SSH tool.
  - Collect the results manually and aggregate them for analysis.
  - Example using parallel-ssh:
    parallel-ssh -h hosts.txt -i "cd jetson_benchmarks && python3 benchmark.py"
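The same fan-out can also be scripted from Python with a thread pool, which makes the per-host output easier to collect programmatically than scraping parallel-ssh's console output. This is a sketch under the same assumptions as the command above (a hosts.txt file and passwordless SSH); the `runner` parameter is only there so the fan-out logic can be exercised without live hosts:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_remote(host, command):
    """Run one command over ssh and return (host, stdout)."""
    out = subprocess.run(["ssh", host, command],
                         capture_output=True, text=True)
    return host, out.stdout

def run_parallel(hosts, command, runner=run_remote):
    """Launch the command on all hosts at once, like parallel-ssh -i,
    and map each host to its captured output."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(pool.map(lambda h: runner(h, command), hosts))

if __name__ == "__main__":
    # Real usage (requires reachable hosts listed in hosts.txt):
    # hosts = [h.strip() for h in open("hosts.txt") if h.strip()]
    # results = run_parallel(hosts, "cd jetson_benchmarks && python3 benchmark.py")
    pass
```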
- Leverage NVIDIA DeepStream SDK:
  - While not a direct benchmarking solution, DeepStream SDK supports multi-stream, multi-device inference pipelines.
  - Create a DeepStream application that utilizes multiple Jetson devices and measure its performance as a proxy for cluster capabilities.
- Monitor system-wide metrics:
  - Use tools like tegrastats on each device to collect system-level performance data while running benchmarks.
  - Aggregate this data to get a comprehensive view of cluster performance.
  - Example command:
    tegrastats --interval 1000 --logfile benchmark_stats.log
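A rough sketch of aggregating such a log in Python, assuming the common tegrastats line shape with `RAM used/totalMB` and `GR3D_FREQ p%` fields; the exact fields vary between JetPack releases, so verify the regexes against your own logs:

```python
import re
from statistics import mean

# Field patterns assumed from typical tegrastats output; adjust per JetPack release.
RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
GPU_RE = re.compile(r"GR3D_FREQ (\d+)%")

def parse_line(line):
    """Pull RAM usage (MB) and GPU load (%) out of one tegrastats line."""
    ram = RAM_RE.search(line)
    gpu = GPU_RE.search(line)
    return {
        "ram_used_mb": int(ram.group(1)) if ram else None,
        "gpu_pct": int(gpu.group(1)) if gpu else None,
    }

def summarize(lines):
    """Average GPU load and peak RAM use across a whole log."""
    samples = [parse_line(l) for l in lines]
    gpu = [s["gpu_pct"] for s in samples if s["gpu_pct"] is not None]
    ram = [s["ram_used_mb"] for s in samples if s["ram_used_mb"] is not None]
    return {"mean_gpu_pct": mean(gpu) if gpu else None,
            "peak_ram_mb": max(ram) if ram else None}
```

Running this per device and comparing the summaries side by side gives a quick view of whether the benchmark load is evenly spread across the cluster.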
- Contact NVIDIA Developer Support:
  - Reach out to NVIDIA’s developer support channels for guidance on best practices for benchmarking Jetson clusters.
  - Request information about any upcoming tools or documentation for multi-device benchmarking scenarios.
- Community solutions:
  - Search for and contribute to community-driven projects on platforms like GitHub that aim to address multi-device Jetson benchmarking.
  - Collaborate with other developers to create open-source solutions for cluster benchmarking on Jetson devices.
While there is no official, out-of-the-box solution for benchmarking multiple Jetson Orin Nano devices simultaneously, these approaches can help users assess the performance of their Jetson clusters. The most suitable method will depend on the specific use case, cluster size, and available development resources.