Jetson Orin Nano Dev Board Lockup Issue with VILA 1.5 3B

Issue Overview

Users have reported that the Nvidia Jetson Orin Nano Dev Kit (8GB) locks up when running the VILA 1.5 3B model. The problem occurs while launching the chat application from the nano_llm container.

Symptoms

  • The device locks up and reboots when running the command:

    jetson-containers run $(autotag nano_llm) \
      python3 -m nano_llm.chat --api=mlc \
        --model Efficient-Large-Model/VILA1.5-3b \
        --max-context-len 256 \
        --max-new-tokens 32
    
  • The terminal prints the model architecture, then execution halts with no further output.
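
Because the board reboots, whatever htop showed at the moment of the lockup is lost. One way to keep a record is to log memory snapshots to disk while the command runs. The following is a minimal sketch, assuming a bash shell on the Jetson host; log_memory and its default log path are hypothetical names, not part of jetson-containers or nano_llm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jetson-containers or nano_llm).
# Appends a timestamped memory snapshot to a log file and calls sync after
# each one, so the last samples are on disk if the board resets.
#   $1  log file        (assumed default: ~/mem-before-lockup.log)
#   $2  interval in s   (default: 2)
#   $3  sample count    (default: 0 = run until interrupted)
log_memory() {
  local log="${1:-$HOME/mem-before-lockup.log}"
  local interval="${2:-2}"
  local count="${3:-0}"
  local i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    {
      date '+%F %T'
      grep -E '^(MemAvailable|SwapTotal|SwapFree):' /proc/meminfo
    } >> "$log"
    sync                          # flush to storage before the next sample
    i=$((i + 1))
    sleep "$interval"
  done
}
```

Start it in a second terminal (or in the background with &) before launching the jetson-containers run command. After the reboot, the end of the log shows how much memory and swap were still available just before the lockup.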

Context

  • The lockups occur with 20GB of swap memory added to the system.
  • htop shows roughly 7.15GB of RAM in use, while only 2-3GB of the swap is occupied.
  • The problem appears to be reproducible: multiple users have reported the same behavior.
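
The htop figures can be cross-checked from the command line. The sketch below, assuming a standard Linux /proc/meminfo (sizes reported in kB), prints RAM and swap usage; mem_report is a hypothetical helper name. htop computes "used" memory slightly differently, so small discrepancies are expected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise RAM and swap usage from /proc/meminfo.
# /proc/meminfo reports sizes in kB; the output is converted to MiB.
# The optional argument lets you point it at a saved copy of the file.
mem_report() {
  local src="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/     { mt = $2 }
    /^MemAvailable:/ { ma = $2 }
    /^SwapTotal:/    { st = $2 }
    /^SwapFree:/     { sf = $2 }
    END {
      # "Used" here means total minus what the kernel reports as available/free.
      printf "RAM used:  %d MiB of %d MiB\n", int((mt - ma) / 1024), int(mt / 1024)
      printf "Swap used: %d MiB of %d MiB\n", int((st - sf) / 1024), int(st / 1024)
    }' "$src"
}
```

Running swapon --show alongside it lists every active swap device, which also reveals whether zram devices are still enabled next to the added swap.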

Impact

The lockup prevents VILA 1.5 3B from running on the Jetson Orin Nano, which limits the board's usefulness for vision-language applications.

Possible Causes

  • Hardware Limitations: The Orin Nano’s 8GB of RAM is shared between the CPU and GPU, so the model, the container, and any desktop processes all compete for the same memory, especially under heavy load.

  • Software Bugs or Conflicts: Bugs in the nano_llm container, or a mismatch between the container image and the JetPack/L4T version on the device, could cause failures.

  • Configuration Errors: Misconfigured system settings (such as swap or zram) or Docker settings could lead to instability.

  • Driver Issues: Outdated or incompatible drivers may cause instability during model execution.

  • Environmental Factors: Insufficient power supply or overheating may contribute to device lockups.

  • User Errors: Incorrect command-line options or skipped setup steps may prevent the model from starting correctly.
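
To rule out the environmental factors above, check the board's temperatures while the model loads. The following is a minimal sketch, assuming the standard Linux sysfs thermal interface (temperatures in millidegrees Celsius); thermal_report is a hypothetical helper name, and its optional root argument exists only so it can be tested against a fake directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each thermal zone's name and temperature.
# Linux exposes temperatures in millidegrees Celsius under
# /sys/class/thermal/thermal_zone*/temp; an optional argument lets you
# point the function at a different root directory.
thermal_report() {
  local root="${1:-/sys/class/thermal}"
  local zone name milli
  for zone in "$root"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue   # skip zones without a readable sensor
    name=$(cat "$zone/type" 2>/dev/null || echo unknown)
    milli=$(cat "$zone/temp")
    # Integer arithmetic: whole degrees plus one (truncated) decimal place.
    printf '%s: %d.%d C\n' "$name" $((milli / 1000)) $(((milli % 1000) / 100))
  done
}
```

On the Jetson itself, sudo nvpmodel -q reports the active power mode, and tegrastats prints live RAM, swap, CPU and GPU load together with temperatures.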

Troubleshooting Steps, Solutions & Fixes

  1. Update Container Image

    • Pull the latest nano_llm container image, which may include fixes for existing bugs:
      docker pull dustynv/nano_llm:r36.2.0
      
  2. Disable Desktop GUI

    • If lockups persist, consider disabling the desktop GUI, which consumes memory that the model could otherwise use.

  3. Check JetPack Version

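    • Ensure that you are running a compatible JetPack version (cat /etc/nv_tegra_release shows the installed L4T release):
      • JetPack 4.6.1+ (>= L4T R32.7.1)
      • JetPack 5.1+ (>= L4T R35.2.1)
      • JetPack 6.0 DP (L4T R36.2.0)
    • JetPack 4.x does not support Orin-series modules, so on the Orin Nano only the JetPack 5 and 6 entries apply. The r36.2.0 image from step 1 targets L4T R36.2.0 (JetPack 6.0 DP).
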
  4. Clone and Install Jetson Utilities

    • Clone the jetson-containers repository and run its installer, which sets up the jetson-containers and autotag commands used above:
      git clone https://github.com/dusty-nv/jetson-containers
      bash jetson-containers/install.sh
      
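  5. Disable zram

    • zram (compressed swap kept in RAM) is enabled by default on JetPack. When mounting additional disk-based swap, consider disabling it: zram's compressed pages still occupy RAM, which can work against you when memory is the bottleneck.
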
  6. Specify Command-Line Options

    • If issues continue, try adding --vision-api=hf, which explicitly selects the hf backend for the vision encoder:
      jetson-containers run $(autotag nano_llm) \
        python3 -m nano_llm.chat --api=mlc \
          --model Efficient-Large-Model/VILA1.5-3b \
          --max-context-len 256 \
          --max-new-tokens 32 \
          --vision-api=hf
      
  7. Monitor System Resources

    • Use htop, or tegrastats on the Jetson, to monitor RAM, swap, CPU, and GPU usage during execution and identify potential bottlenecks.

  8. Test with Different Configurations

    • Isolate the cause by changing one variable at a time (for example the container version, swap and zram settings, desktop GUI, or command-line options) and noting which combinations trigger lockups.

  9. Documentation and Updates

    • Check the Jetson Orin Nano and nano_llm documentation regularly, as fixes and improvements are released over time.

  10. Community Support

    • Engage with community forums for additional insights or shared experiences from other users who faced similar issues.
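
Steps 2 and 5 above, together with the swap setup from the Context section, can be combined into one script. The following is a sketch, not an official procedure: it uses standard systemd and util-linux commands, plus nvzramconfig, the service that sets up zram on JetPack. The swap file path and size are assumptions, and run and free_memory_for_llm are hypothetical helper names. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch combining step 2 (console-only boot) and step 5 (zram off) with a
# file-backed swap. SWAP_FILE and SWAP_SIZE are assumptions; adjust them to
# your storage. DRY_RUN=1 (the default) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
SWAP_FILE="${SWAP_FILE:-/mnt/20GB.swap}"
SWAP_SIZE="${SWAP_SIZE:-20G}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

free_memory_for_llm() {
  # Boot to a text console from the next boot on instead of the desktop;
  # 'sudo systemctl set-default graphical.target' restores the GUI.
  run sudo systemctl set-default multi-user.target
  # Stop JetPack's zram setup service from running at the next boot.
  run sudo systemctl disable nvzramconfig
  # Create a swap file, restrict its permissions, format it, and enable it now.
  run sudo fallocate -l "$SWAP_SIZE" "$SWAP_FILE"
  run sudo chmod 600 "$SWAP_FILE"
  run sudo mkswap "$SWAP_FILE"
  run sudo swapon "$SWAP_FILE"
}
```

Source the file and call free_memory_for_llm to review the commands, then DRY_RUN=0 free_memory_for_llm to apply them. If the 20GB swap is already set up, only the first two commands are needed. To keep the swap file across reboots, add a line such as /mnt/20GB.swap none swap sw 0 0 to /etc/fstab.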

These steps may resolve the lockup when running VILA 1.5 3B on the Nvidia Jetson Orin Nano Dev Kit, or at least narrow down its cause.
