Unable to run live LLaVA on Jetson Orin Nano Developer Kit
Issue Overview
Users are experiencing difficulties running the live LLaVA (Large Language and Vision Assistant) demo on the NVIDIA Jetson Orin Nano Developer Kit. The main problem occurs when executing the nano_llm.agents.video_query module, which is part of a tutorial for implementing vision-language models on Jetson devices. The issue manifests as a runtime error, specifically a subprocess.CalledProcessError with a SIGKILL signal, indicating that the process is being terminated due to memory constraints.
Possible Causes
- Insufficient Memory: The Jetson Orin Nano has 8 GB of memory, which may not be enough to run the full LLaVA model with default settings.
- Large Model Size: The VILA1.5-3B model uses a larger SigLIP-384×384 vision encoder, which requires more memory to build the TensorRT engine.
- TensorRT Optimization: Converting the vision encoder to a TensorRT-optimized engine is challenging on devices with limited resources like the Jetson Orin Nano.
- Default Configuration: The default context length and other parameters may be too high for the available memory on the Orin Nano.
- System Resource Allocation: A lack of swap space or the presence of memory-intensive processes (such as the desktop GUI) may contribute to the issue (a quick way to check is shown after this list).
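To confirm whether memory pressure is the culprit, a quick check of RAM, swap, and the largest memory consumers can be run before and during the failure; these are standard Linux commands rather than anything specific to the tutorial:

# Total RAM and currently active swap
free -h
# Configured swap devices (ZRAM appears as /dev/zram*)
swapon --show
# Ten largest memory consumers, e.g. the desktop session
ps aux --sort=-%mem | head -n 10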
Troubleshooting Steps, Solutions & Fixes
- Increase Available Memory:
  - Mount swap space:
    sudo fallocate -l 4G /mnt/4GB.swap
    sudo mkswap /mnt/4GB.swap
    sudo swapon /mnt/4GB.swap
  - Disable ZRAM if it is enabled (one way to do this is sketched after this step).
  - Disable the desktop UI if it is not needed:
    sudo systemctl set-default multi-user.target
    sudo reboot
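A minimal sketch of the ZRAM step, assuming the JetPack default where ZRAM is provided by the nvzramconfig systemd service (verify the service name on your release); the /etc/fstab line is optional and only needed if the swap file created above should persist across reboots:

# Check which swap devices are active (ZRAM shows up as /dev/zram*)
swapon --show
# Disable the ZRAM service (nvzramconfig on stock JetPack) and reboot for it to take effect
sudo systemctl disable nvzramconfig
sudo reboot
# Optional: keep the 4 GB swap file across reboots
echo "/mnt/4GB.swap none swap sw 0 0" | sudo tee -a /etc/fstab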
- Adjust Model Parameters:
  - Reduce the context length:
    --max-context-len 256
  - Limit the number of new tokens:
    --max-new-tokens 32
- Use Alternative Vision API:
  - Instead of TensorRT, use the Hugging Face implementation of the vision encoder:
    --vision-api=hf
- Try Smaller Models:
  - Use VILA-2.7B instead of VILA-3B if possible (see the example after this step).
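For illustration, switching to the smaller model only changes the --model argument of the full command shown in the next step; the repository name Efficient-Large-Model/VILA-2.7b is an assumption here and should be checked against the model list in the tutorial:

# Hypothetical substitution (repository name unverified):
#   --model Efficient-Large-Model/VILA1.5-3b  ->  --model Efficient-Large-Model/VILA-2.7b
--model Efficient-Large-Model/VILA-2.7b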
- Full Command with Optimizations:
  jetson-containers run $(autotag nano_llm) \
    python3 -m nano_llm.agents.video_query --api=mlc \
      --model Efficient-Large-Model/VILA1.5-3b \
      --max-context-len 256 \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output webrtc://@:8554/output \
      --vision-api=hf
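Before launching, it is worth confirming that the camera actually enumerates as /dev/video0, since that path is assumed in the command above; a quick check (the v4l-utils package may need to be installed first):

# List the video device nodes
ls -l /dev/video*
# Show which camera/driver each node belongs to (requires v4l-utils: sudo apt install v4l-utils)
v4l2-ctl --list-devices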
- Update Software:
  - Ensure you have the latest JetPack release and container images installed.
  - From inside your jetson-containers checkout, pull the latest updates (a fuller setup/update sequence is sketched below):
    git pull https://github.com/dusty-nv/jetson-containers
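If the repository has not been cloned yet, a typical sequence looks like the following; the install.sh step mirrors the project's README and is assumed here to still be the supported setup path:

# First-time setup: clone the repository and install the container tools
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# Existing checkout: update to the latest container definitions and tutorial code
cd jetson-containers
git pull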
- Monitor Resource Usage:
  - Use htop, nvidia-smi (where supported by your JetPack version), or the Jetson-specific tegrastats to monitor CPU, GPU, and memory usage during execution (examples after this step).
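As a concrete example, these are the usual monitoring options on Jetson; jtop comes from the third-party jetson-stats package, which is an extra install and may not be present by default:

# Interactive CPU, RAM, and swap view
htop
# Jetson-specific GPU, RAM, and EMC utilization (ships with JetPack)
sudo tegrastats
# Optional: jtop dashboard from the jetson-stats package (may require a re-login after install)
sudo pip3 install jetson-stats
jtop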
- Alternative Models:
  - Consider using smaller or more optimized models that are better suited to the Jetson Orin Nano’s resources.
- Wait for Official Support:
  - The developers are working on pre-built TensorRT engines for the CLIP/SigLIP models, which may be distributed through the Hugging Face Hub in the future.
By implementing these steps, users should be able to run the live LLaVA demo on the Jetson Orin Nano, albeit with some limitations compared to more powerful Jetson devices. The key is to optimize memory usage and to lean on alternative implementations that are less resource-intensive.