Slow Performance of Local Small Models on Jetson Orin Nano
Issue Overview
Users of the Jetson Orin Nano Developer Kit with JetPack 6 are experiencing extremely slow performance when running local small language models, particularly in text generation tasks. The issue manifests as follows:
- The chatbot responds extremely slowly when using models like llama-2-7b-chat.Q4_0.gguf
- Setting n-gpu-layers to 128 as suggested in tutorials causes the Jetson to freeze, requiring a restart
- With n-gpu-layers set to 0, the chatbot works but remains very slow, and the overall system becomes sluggish
- The model appears to be running on the CPU instead of the GPU, given the poor performance
- Terminal output shows token generation speed of only 0.02 tokens/s
This problem significantly impacts the user experience and the practical utility of the Jetson Orin Nano for running local language models.
Possible Causes
- Insufficient GPU Memory: The Jetson Orin Nano may not have enough GPU memory to handle the specified number of GPU layers for the model.
- Improper GPU Utilization: The model might not be effectively utilizing the GPU, causing it to fall back to CPU processing.
- Suboptimal Model Configuration: The chosen model or its configuration may not be optimized for the Jetson Orin Nano’s hardware capabilities.
- Memory Management Issues: Lack of proper memory management, including insufficient swap space, could be limiting the system’s performance.
- Software Optimization: The llama.cpp implementation used might not be fully optimized for the Jetson platform.
- Model Size Mismatch: The selected language model might be too large for the Jetson Orin Nano’s specifications.
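As a quick sanity check on the memory-related causes above, compare the size of the GGUF file against the Orin Nano’s unified memory, which is shared between CPU and GPU. This is a minimal sketch; the model path is a placeholder for wherever you downloaded the file:

```bash
# Size of the quantized model file (path is a placeholder for your download location)
ls -lh ~/models/llama-2-7b-chat.Q4_0.gguf

# Total and available unified memory plus current swap usage
free -h
```

A 4-bit 7B model is roughly 4 GB on disk, and the weights, KV cache, and desktop environment all draw from the same physical RAM, which leaves little headroom on this device.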
Troubleshooting Steps, Solutions & Fixes
- Optimize GPU Memory Usage:
  - Reduce the number of GPU layers to find a balance between performance and stability.
  - Experiment with different n-gpu-layers values, starting from a low number and increasing gradually rather than jumping straight to 128; see the sketch below.
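If you are driving llama.cpp directly, partial offload looks roughly like this. This is a sketch assuming a CUDA-enabled llama.cpp build; the binary is named llama-cli in recent releases (main in older ones), and the model path and layer count are example values only:

```bash
# Offload only part of the model to the GPU; raise --n-gpu-layers in steps
# (e.g. 8 -> 16 -> 24) while watching memory usage with tegrastats.
./llama-cli -m ~/models/llama-2-7b-chat.Q4_0.gguf \
    --n-gpu-layers 16 \
    -p "Hello, how are you?" -n 64
```

The same idea applies when the n-gpu-layers setting is passed through a chatbot front end: keep the offloaded layers small enough that the model plus KV cache fits in the shared memory.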
- Increase Swap Space:
  - Follow the steps outlined in the Jetson containers setup guide to mount additional swap space:
    sudo systemctl disable nvzramconfig
    sudo fallocate -l 16G /mnt/16GB.swap
    sudo mkswap /mnt/16GB.swap
    sudo swapon /mnt/16GB.swap
  - Add the following line to /etc/fstab to make the swap persistent:
    /mnt/16GB.swap none swap sw 0 0
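A quick check that the new swap is actually active (both are standard Linux utilities):

```bash
# List active swap devices; the 16 GB swap file should appear here
swapon --show

# The Swap line should now show roughly 16 GB of total space
free -h
```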
- Try Alternative Implementations:
  - Use Ollama instead of llama.cpp, as it’s generally easier to use and performs better out-of-the-box on Jetson devices.
  - Refer to the Jetson AI Lab page for Ollama setup and usage instructions; a minimal example follows.
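A minimal sketch of getting Ollama running, assuming the standard install script and a stock model from the Ollama library (the Jetson AI Lab page also documents a container-based route):

```bash
# Install Ollama using the official convenience script
curl -fsSL https://ollama.com/install.sh | sh

# Pull and chat with a model; Ollama handles GPU offload defaults itself
ollama run llama2
```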
- Use Smaller Language Models:
  - Explore smaller language models that are better suited for the Jetson Orin Nano’s capabilities.
  - Refer to the Small LLM (SLM) page on the Jetson AI Lab website for a list of compatible models.
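As an example, a model around the 1B-parameter mark leaves far more headroom than a 7B one. A minimal sketch using Ollama’s model library (TinyLlama is one of the smaller models it hosts; any model from the SLM page can be used the same way):

```bash
# A ~1.1B-parameter model: much smaller weights and KV cache than a 7B model,
# so it fits comfortably in the Orin Nano's shared memory.
ollama run tinyllama
```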
- Optimize System Resources:
  - Close unnecessary applications and processes to free up system resources.
  - Monitor system resource usage using tools like top or htop to identify potential bottlenecks.
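One of the larger consumers of the shared memory is the desktop environment itself. A hedged sketch of checking usage and reclaiming memory; the systemctl target switch is a common way to stop the GUI on Ubuntu-based JetPack images, and you can switch back with graphical.target:

```bash
# See what is consuming memory and swap right now
free -h
top -o %MEM   # sort processes by memory usage

# Optionally stop the desktop GUI to free several hundred MB of RAM
sudo systemctl isolate multi-user.target
# ...and restore it later
sudo systemctl isolate graphical.target
```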
- Update Software and Drivers:
  - Ensure that JetPack 6 and all associated drivers are up to date.
  - Check for any available updates or patches specific to language model performance on Jetson devices.
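To confirm the installed JetPack version and pull in updates, the apt metadata for the nvidia-jetpack metapackage can be queried (this assumes an apt-based JetPack install, which is the standard Developer Kit setup):

```bash
# Show the installed JetPack version
apt-cache show nvidia-jetpack | grep Version

# Update system packages, including JetPack components delivered via apt
sudo apt update && sudo apt upgrade
```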
- Verify GPU Utilization:
  - Use the tegrastats command to monitor GPU usage during model inference.
  - Confirm that the GPU is being utilized and identify any potential issues.
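For example, run tegrastats in a second terminal while the model is generating text. The GR3D_FREQ field reports GPU load; if it stays near 0% during inference, the layers are not actually being offloaded:

```bash
# Print utilization once per second; GR3D_FREQ is the GPU load,
# and the RAM field shows how much of the shared memory is in use.
sudo tegrastats --interval 1000
```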
- Experiment with Different Model Quantizations:
  - Try different quantization levels for the model (e.g., Q4_0, Q5_1) to find a balance between performance and accuracy.
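Pre-quantized GGUF files at several levels are usually published alongside each model, so comparing them is mostly a matter of downloading a different file and timing it. A sketch using llama-bench, the benchmarking tool that ships with llama.cpp (the model paths and the -ngl value are placeholders):

```bash
# Compare tokens/s for two quantization levels of the same model
./llama-bench -m ~/models/llama-2-7b-chat.Q4_0.gguf -ngl 16
./llama-bench -m ~/models/llama-2-7b-chat.Q5_1.gguf -ngl 16
```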
- Consult Jetson Community Resources:
  - Check the Jetson Developer Forums for similar issues and potential solutions.
  - Reach out to the NVIDIA Jetson community for specific advice on optimizing language model performance on the Orin Nano.
By following these steps and exploring the suggested solutions, users should be able to improve the performance of local small language models on their Jetson Orin Nano Developer Kit. If issues persist, further investigation into hardware-specific optimizations or alternative model architectures may be necessary.