Jetson Orin Nano – NanoVLM Model Execution Issues
Issue Overview
Users are experiencing issues while attempting to run the NanoVLM model on the Nvidia Jetson Orin Nano 8GB with JetPack 6.0. The problems appear during model setup and execution, particularly when running the commands that download the model and perform inference.
Symptoms:
- The command to run the model is killed automatically, resulting in a subprocess.CalledProcessError.
- Users report that the device sometimes restarts or shuts down unexpectedly during model execution.
- Warnings about deprecated cache usage appear, along with memory-related errors.
Context:
- The issue arises after cloning the dusty-nv/jetson-containers repository and executing installation commands.
- The specific command leading to errors is:
jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
Hardware/Software Specifications:
- Device: Nvidia Jetson Orin Nano 8GB
- JetPack Version: 6.0
- CUDA Version: 12.2
- Container Image: dusty-nv/nano_llm:24.5-r36.2.0
Frequency:
The issue is reported consistently by multiple users attempting the same workflow.
Impact:
The inability to run the model prevents successful inference and limits the usefulness of the Jetson Orin Nano for machine learning tasks.
Possible Causes
- Hardware Limitations: The Orin Nano may not have sufficient memory to handle the model, especially during quantization phases.
- Software Bugs or Conflicts: Issues within the Docker container or incompatibilities between software versions could lead to execution failures.
- Configuration Errors: Incorrect settings or command parameters could cause the model to fail during execution.
- Driver Issues: Outdated or incompatible drivers may lead to unexpected behavior during GPU operations.
- Environmental Factors: Insufficient power supply or overheating could result in system instability, causing restarts or shutdowns.
- User Misconfigurations: Improper setup of SWAP space or ZRAM settings might lead to memory shortages during execution.
Troubleshooting Steps, Solutions & Fixes
- Check System Resources:
  - Monitor GPU and CPU usage during execution using:
    nvidia-smi
    top
  - Ensure that there is adequate free memory available.
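  - On Jetson, overall RAM, swap, CPU, and GPU load can also be sampled with tegrastats; a minimal sketch (the interval value is in milliseconds):
    # Print memory and utilization statistics once per second
    sudo tegrastats --interval 1000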
- Increase SWAP Space:
  - Follow instructions to mount SWAP and disable ZRAM, as this can help alleviate memory constraints:
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
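  - The commands above only create and enable the swap file. A minimal sketch for disabling ZRAM, assuming the stock nvzramconfig service manages it on your JetPack image:
    # Turn off the ZRAM service so the swap file is used instead, then reboot
    sudo systemctl disable nvzramconfig
    sudo reboot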
- Disable Desktop GUI:
  - If running a desktop environment, consider disabling it during model execution to free up resources (see the sketch below).
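  - A minimal sketch using standard systemd targets (assumes an Ubuntu-based JetPack image with a graphical session):
    # Stop the GUI for the current session only
    sudo systemctl isolate multi-user.target
    # Or make console-only boot the default; revert later with graphical.target
    sudo systemctl set-default multi-user.target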
- Update Container Image:
  - Ensure you are using the latest version of the container by executing:
    cd /path/to/your/jetson-containers
    git pull
    docker pull $(autotag nano_llm)
- Run with Vision API Option:
  - To bypass certain errors related to TensorRT, try running with the --vision-api=hf flag:
    jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --vision-api=hf --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
- Clear Memory Cache:
  - Use the following command to clear the memory buffer cache before running your commands again:
    sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
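  - As a variant worth trying, flushing dirty pages first and also dropping dentries and inodes can free more memory:
    sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'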
- Check for Module Availability:
  - If you encounter module-not-found errors (e.g., No module named nano_llm.vision.video), ensure that you have pulled the latest image, as new functionality may have been added.
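  - As a quick check (a sketch; the module path is taken from the error message above), try importing the module inside the container:
    jetson-containers run $(autotag nano_llm) python3 -c "import nano_llm.vision.video"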
- Rebuild Container if Necessary:
  - If changes made in local directories are not reflected inside the container, rebuild it using the appropriate Docker commands.
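  - A minimal sketch using the build script that ships with the jetson-containers repository (the package name nano_llm is assumed):
    cd /path/to/your/jetson-containers
    # Rebuild the nano_llm container from source
    jetson-containers build nano_llm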
- Monitor Logs for Errors:
  - Review logs for specific error messages that can provide insights into what might be going wrong during execution.
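  - As a sketch, kernel and system logs often reveal whether the out-of-memory killer ended the process or the board reset unexpectedly:
    # Look for OOM-killer messages from the current boot
    sudo dmesg | grep -i -E "killed process|out of memory"
    # Review error-level messages from the previous boot (requires persistent journaling)
    sudo journalctl -b -1 -p err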
- Documentation and Community Support:
  - Refer to official documentation for any updates regarding driver installations and configurations.
  - Engage with community forums for shared experiences and solutions from other users facing similar issues.
By following these steps, users should be able to effectively mitigate the issues encountered when running the NanoVLM model on their Jetson Orin Nano devices.