Running convert_to_onnx.py from the onnx_packnet directory ends with "Killed"

Issue Overview

Users have reported an issue when executing the script convert_to_onnx.py, located in /usr/src/tensorrt/samples/python/onnx_packnet/. When running the command:

python3.6 convert_to_onnx.py --output model.onnx

the process ends with the message "Killed."

Specific Symptoms

  • The script fails to complete, displaying a "Killed" message.
  • Warning messages related to deprecated functions in PyTorch are shown in the log output.

Context of the Problem

  • The issue occurs during the execution of a Python script intended to convert a model to ONNX format.
  • The error points to memory exhaustion: on Linux, a bare "Killed" message typically means the kernel's out-of-memory (OOM) killer terminated the process.

Relevant Specifications

  • Hardware: NVIDIA Jetson Orin Nano Developer Kit (implied by the report).
  • Software: Python 3.6, PyTorch (version not specified), TensorRT.

Frequency and Impact

  • The issue reproduces consistently once available memory is exhausted during conversion.
  • The impact on user experience is significant, as it prevents successful execution of model conversion tasks, which are critical for deploying models.

Possible Causes

  • Memory Limitations: The most likely cause is that the device runs out of memory during execution, leading to the process being killed by the operating system.

  • Software Bugs: The PyTorch deprecation warnings point to version-compatibility issues in the sample code, although warnings alone would not cause the process to be killed.

  • Configuration Errors: Incorrect configurations or insufficient resources allocated for running the script may contribute to the problem.

  • Driver Issues: Outdated or incompatible drivers could lead to performance issues and memory management failures.

Troubleshooting Steps, Solutions & Fixes

  1. Check System Memory Usage:

    • Use tegrastats in a second terminal to monitor memory usage while the script runs.
    • Command:
      sudo tegrastats
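    • If a programmatic check is more convenient, the minimal sketch below polls /proc/meminfo once per second; it is a stand-in for tegrastats, and the interval and output format here are arbitrary choices:
      #!/usr/bin/env python3
      """Poll /proc/meminfo and print available memory once per second.

      Run this in a second terminal while convert_to_onnx.py executes to
      watch MemAvailable fall as the export progresses.
      """
      import time

      def mem_available_mb():
          with open("/proc/meminfo") as f:
              for line in f:
                  if line.startswith("MemAvailable:"):
                      return int(line.split()[1]) // 1024  # kB -> MB
          return -1

      while True:
          print(f"MemAvailable: {mem_available_mb()} MB", flush=True)
          time.sleep(1)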
      
  2. Add Swap Memory:

    • If memory usage is at maximum, consider adding swap memory to alleviate memory pressure.
    • Steps to add swap:
      sudo fallocate -l 2G /swapfile  # Create a swap file of 2GB
      sudo chmod 600 /swapfile        # Set permissions
      sudo mkswap /swapfile           # Set up swap space
      sudo swapon /swapfile           # Enable swap
      
    • Verify that swap is active:
      sudo swapon --show
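    • Equivalently, the following sketch checks /proc/swaps from Python (the /swapfile path matches the file created above):
      #!/usr/bin/env python3
      """Confirm that /swapfile is active by reading /proc/swaps."""

      with open("/proc/swaps") as f:
          entries = f.read().splitlines()[1:]  # skip the header row

      if any(entry.split()[0] == "/swapfile" for entry in entries):
          print("/swapfile is active")
      else:
          print("WARNING: /swapfile not listed in /proc/swaps")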
      
  3. Update Software Packages:

    • Ensure that all relevant software packages (including PyTorch and TensorRT) are up to date. Note that on Jetson devices, PyTorch is usually installed from NVIDIA-provided wheels rather than the default PyPI index.
    • Use pip for updating:
      pip install --upgrade torch torchvision torchaudio
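    • After updating, a quick sketch like the one below confirms which versions are actually importable (the ImportError handling allows for TensorRT bindings that live outside the active environment):
      #!/usr/bin/env python3
      """Print installed PyTorch, torchvision, and TensorRT versions."""

      for name in ("torch", "torchvision", "tensorrt"):
          try:
              module = __import__(name)
              print(f"{name}: {module.__version__}")
          except ImportError as exc:
              print(f"{name}: not importable ({exc})")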
      
  4. Modify Code for Compatibility:

    • Address the deprecation warnings by following the guidance printed in the warnings themselves. For instance, replace tensor floor division (the deprecated // operator, i.e. __floordiv__) with torch.div(a, b, rounding_mode='trunc'), as shown in the sketch below.
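    • A minimal before/after illustration (the tensors here are arbitrary examples):
      import torch

      a = torch.tensor([7, -7])
      b = torch.tensor([2, 2])

      # Deprecated: tensor floor division (a // b) triggered the warning
      # seen in the log; it rounded toward zero, like 'trunc', not 'floor'.

      # To keep the old behavior, as the warning itself recommends:
      trunc = torch.div(a, b, rounding_mode='trunc')   # tensor([ 3, -3])

      # For true floor division instead:
      floor = torch.div(a, b, rounding_mode='floor')   # tensor([ 3, -4])
      print(trunc, floor)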
  5. Test with Different Configurations:

    • Try running the script with smaller inputs or a simpler model to see whether it completes under lower resource demands; the smoke test after this item isolates the export toolchain from model size.
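    • A minimal ONNX export smoke test, using a hypothetical toy model unrelated to PackNet: if this succeeds while convert_to_onnx.py is killed, the failure is almost certainly resource exhaustion rather than a broken PyTorch/ONNX installation.
      import torch
      import torch.nn as nn

      # Tiny stand-in model with an arbitrary small input shape.
      model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
      dummy = torch.randn(1, 3, 64, 64)

      torch.onnx.export(model, dummy, "smoke_test.onnx", opset_version=11)
      print("exported smoke_test.onnx")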
  6. Consult Documentation:

    • Review official Nvidia documentation for any specific guidelines related to memory management and model conversion on Jetson devices.
  7. Monitor Temperature and Power Supply:

    • Ensure that environmental factors such as temperature and power supply are within acceptable ranges, as overheating or inadequate power can also lead to performance issues.
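    • The sketch below prints every readable thermal-zone temperature; the sysfs paths are standard on Linux, though the zone names vary between Jetson models:
      import glob

      for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
          try:
              with open(f"{zone}/type") as f:
                  name = f.read().strip()
              with open(f"{zone}/temp") as f:
                  millideg = int(f.read().strip())
              print(f"{name}: {millideg / 1000:.1f} C")
          except OSError:
              pass  # some zones are unreadable without root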
  8. Best Practices for Future Prevention:

    • Regularly monitor system resources while running intensive tasks.
    • Keep software updated and follow best practices for managing resources on embedded systems like Jetson boards.

Unresolved Aspects

Further investigation may be needed into specific versions of PyTorch and TensorRT compatibility with the Jetson Orin Nano Dev board, as well as any additional environmental factors that could influence performance.
