Unable to run live LLaVA on Jetson Orin Nano Developer Kit

Issue Overview

Users are having trouble running live LLaVA (Large Language and Vision Assistant) on the NVIDIA Jetson Orin Nano Developer Kit. The failure occurs when executing the nano_llm.agents.video_query module, which is part of a tutorial for running vision-language models on Jetson devices. It surfaces as a subprocess.CalledProcessError reporting a SIGKILL, meaning the process is being killed (typically by the kernel's out-of-memory handling) because it runs out of memory.

Possible Causes

  1. Insufficient Memory: The Jetson Orin Nano Developer Kit has 8GB of RAM shared between the CPU and GPU, which may not be enough to run the full LLaVA pipeline with default settings.

  2. Large Model Size: The VILA1.5-3B model uses the larger SigLIP vision encoder (384×384 input), which requires more memory while its TensorRT engine is being built.

  3. TensorRT Engine Build: Converting the vision encoder into a TensorRT-optimized engine is itself memory-intensive, which is difficult on a resource-limited device like the Jetson Orin Nano.

  4. Default Configuration: The default context length and other parameters may be too high for the available memory on the Orin Nano.

  5. System Resource Allocation: A lack of swap space, or memory-hungry processes such as the desktop GUI, can leave too little free RAM (a quick check is shown after this list).
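
A quick check for the last point, using standard Linux tools (nothing Jetson-specific is assumed): confirm whether any swap is active and how much memory is free before launching the container.

      swapon --show    # lists active swap devices/files; empty output means no swap is configured
      free -h          # shows total, used, and available RAM and swap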

Troubleshooting Steps, Solutions & Fixes

  1. Increase Available Memory:

    • Allocate and enable a 4 GB swap file:
      sudo fallocate -l 4G /mnt/4GB.swap
      sudo mkswap /mnt/4GB.swap
      sudo swapon /mnt/4GB.swap
      
    • Disable ZRAM if it is enabled (see the example commands at the end of this step).
    • Disable the desktop UI if not needed:
      sudo systemctl set-default multi-user.target
      sudo reboot
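
    • As a minimal sketch rounding out the bullets above (it assumes JetPack's default nvzramconfig ZRAM service and the /mnt/4GB.swap file created earlier):
      sudo systemctl disable nvzramconfig                                   # stop ZRAM from being re-enabled at boot (takes effect after a reboot)
      echo "/mnt/4GB.swap swap swap defaults 0 0" | sudo tee -a /etc/fstab  # optional: keep the swap file active across reboots
      swapon --show                                                         # verify the swap file is in use
      sudo systemctl set-default graphical.target                           # run later if you want the desktop UI back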
      
  2. Adjust Model Parameters:

    • Reduce the context length:
      --max-context-len 256
      
    • Limit the number of new tokens:
      --max-new-tokens 32
      
  3. Use Alternative Vision API:

    • Instead of TensorRT, use the Hugging Face implementation:
      --vision-api=hf
      
  4. Try Smaller Models:

    • Use the older VILA-2.7B instead of VILA1.5-3B if possible; its vision encoder is smaller and lighter to build into a TensorRT engine (an example substitution is shown after the full command in step 5).

  5. Full Command with Optimizations:

    jetson-containers run $(autotag nano_llm) \
      python3 -m nano_llm.agents.video_query --api=mlc \
        --model Efficient-Large-Model/VILA1.5-3b \
        --max-context-len 256 \
        --max-new-tokens 32 \
        --video-input /dev/video0 \
        --video-output webrtc://@:8554/output \
        --vision-api=hf
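
    If the 3B model is still killed after these optimizations, the same command can be retried with the smaller model from step 4; this assumes the older checkpoint is published on the Hugging Face Hub as Efficient-Large-Model/VILA-2.7b:

      --model Efficient-Large-Model/VILA-2.7b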
    
  6. Update Software:

    • Ensure you have the latest JetPack and container images installed.
    • Pull the latest updates for the jetson-containers repository:
      cd jetson-containers    # your existing clone of https://github.com/dusty-nv/jetson-containers
      git pull
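
    • To confirm what is currently installed before updating (a small sketch; the file and package names assume a standard JetPack image):
      cat /etc/nv_tegra_release          # reports the installed L4T (JetPack BSP) release
      dpkg -l | grep nvidia-jetpack      # reports the JetPack meta-package version, if it is installed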
      
  7. Monitor Resource Usage:

    • Use htop, tegrastats, or jtop (from the jetson-stats package) to monitor CPU, GPU, and memory usage during execution; nvidia-smi reports only limited information for Jetson's integrated GPU. Example commands are shown below.
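
    • A live view of memory and GPU load (a minimal sketch; tegrastats ships with JetPack, while jtop comes from the third-party jetson-stats package and may need a re-login or reboot after installing):
      sudo tegrastats --interval 1000    # prints RAM, swap, and GPU utilization once per second
      sudo pip3 install -U jetson-stats  # optional: installs the jtop interactive monitor
      jtop
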
  8. Alternative Models:

    • Consider using smaller or more optimized models that are better suited for the Jetson Orin Nano’s resources.

  9. Wait for Official Support:

    • The developers are working on creating pre-built TensorRT engines for the CLIP/SigLIP models, which may be distributed through the Hugging Face Hub in the future.

By implementing these steps, users should be able to run live LLaVA on the Jetson Orin Nano, albeit with some limitations compared to more powerful Jetson devices. The key is to free up as much memory as possible and to fall back on less resource-intensive implementations where the defaults do not fit.
