Running convert_to_onnx.py from the onnx_packnet directory ends with "Killed"
Issue Overview
Users have reported an issue when executing the script convert_to_onnx.py located in /usr/src/tensorrt/samples/python/onnx_packnet/. When running the command:
python3.6 convert_to_onnx.py --output model.onnx
the process ends with the message "Killed."
Specific Symptoms
- The script fails to complete, displaying a "Killed" message.
- Warning messages related to deprecated functions in PyTorch are shown in the log output.
Context of the Problem
- The issue occurs during the execution of a Python script intended to convert a model to ONNX format.
- The error appears to be linked to memory exhaustion: on Linux, a bare "Killed" message typically means the kernel's out-of-memory (OOM) killer terminated the process.
Relevant Specifications
- Hardware: NVIDIA Jetson Orin Nano developer board (implied).
- Software: Python 3.6, PyTorch (specific version not mentioned), TensorRT.
Frequency and Impact
- The issue appears to occur consistently once the device's memory limit is reached.
- The impact on users is significant: the failure blocks model conversion to ONNX, a prerequisite for deploying models.
Possible Causes
- Memory limitations: the most likely cause is that the device runs out of memory during the export, and the operating system's OOM killer terminates the process (a quick log check is sketched after this list).
- Software bugs: the PyTorch deprecation warnings suggest compatibility issues between the sample code and the installed PyTorch version.
- Configuration errors: incorrect configuration or insufficient resources allocated to the script may contribute to the problem.
- Driver issues: outdated or incompatible drivers can cause performance problems and memory-management failures.
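If the OOM killer is the suspect, the kernel log normally records the kill. The snippet below is a minimal sketch (the helper name find_oom_entries is ours, not part of the sample) that scans dmesg output for common OOM messages; the exact wording varies by kernel, and dmesg may require sudo on some systems.

import subprocess

# Hypothetical helper: scan the kernel log for OOM-killer entries to
# confirm that the "Killed" message came from the out-of-memory killer.
def find_oom_entries():
    log = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return [line for line in log.splitlines()
            if "Out of memory" in line or "oom-kill" in line]

for entry in find_oom_entries():
    print(entry)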
Troubleshooting Steps, Solutions & Fixes
- Check System Memory Usage:
  - Use tegrastats to monitor memory usage while executing the script:
sudo tegrastats
  - A portable Python alternative is sketched below.
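tegrastats is the Jetson-native tool. If a portable check is more convenient, the sketch below (assuming psutil is installed, e.g. via pip install psutil) prints overall RAM usage once per second while the conversion runs in another terminal:

import time
import psutil  # assumption: installed separately (pip install psutil)

# Minimal sketch: report overall RAM usage once per second. tegrastats
# shows the same picture with Jetson-specific GPU detail.
while True:
    mem = psutil.virtual_memory()
    print(f"used {mem.used / 2**20:.0f} MiB of {mem.total / 2**20:.0f} MiB "
          f"({mem.percent:.1f}%)")
    time.sleep(1)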
- Add Swap Memory:
  - If memory usage is at its maximum, add swap space to relieve memory pressure.
  - Steps to add swap:
sudo fallocate -l 2G /swapfile   # Create a 2 GB swap file
sudo chmod 600 /swapfile         # Set permissions
sudo mkswap /swapfile            # Set up the swap area
sudo swapon /swapfile            # Enable swap
  - Verify that swap is active:
sudo swapon --show
  - The same check from Python is sketched below.
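As a complement to swapon --show, the same check can be made from Python, again assuming psutil is available:

import psutil  # assumption: installed separately (pip install psutil)

# Sketch: confirm the new swap area is visible and how much is in use.
swap = psutil.swap_memory()
print(f"swap total {swap.total / 2**20:.0f} MiB, "
      f"used {swap.used / 2**20:.0f} MiB ({swap.percent:.1f}%)")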
- Update Software Packages:
  - Ensure that all relevant software packages (including PyTorch and TensorRT) are up to date.
  - Use pip for updating:
pip install --upgrade torch torchvision torchaudio
  - A quick way to confirm the installed versions is sketched below.
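After upgrading, it is worth confirming which versions are actually importable, since behaviour can differ between releases. A minimal check, run with the same interpreter used for the conversion script:

# Sketch: print the versions visible to the interpreter that runs the
# conversion script.
import torch
print("torch:", torch.__version__)
try:
    import tensorrt
    print("tensorrt:", tensorrt.__version__)
except ImportError:
    print("tensorrt: not importable in this environment")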
- Modify Code for Compatibility:
  - Address the deprecation warnings by modifying the code as suggested in the warning text. For instance, replace the deprecated tensor floor division (__floordiv__) with torch.div(a, b, rounding_mode='trunc'); a short illustration follows.
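A short illustration of the replacement the warning suggests (the tensors here are placeholders, not values from the sample):

import torch

a = torch.tensor([7, 8, 9])
b = torch.tensor([2, 3, 4])

# Deprecated: tensor floor division via //, which triggers the
# __floordiv__ deprecation warning in recent PyTorch releases.
# old = a // b

# Replacement suggested by the warning text:
new = torch.div(a, b, rounding_mode='trunc')
print(new)  # tensor([3, 2, 2])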
- Test with Different Configurations:
  - Run the script with smaller models or simpler inputs to see whether it completes under lower resource demands; a minimal export test is sketched below.
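The sketch below verifies that the ONNX export path itself works with a tiny placeholder model before retrying the full PackNet network; the model and input shape are ours, not the sample's:

import torch

# Tiny placeholder model and input, far smaller than the real network.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
dummy = torch.randn(1, 3, 64, 64)

# If this completes but the full conversion is killed, memory pressure
# (rather than a broken toolchain) is the likely culprit.
torch.onnx.export(model, dummy, "tiny_test.onnx", opset_version=11)
print("exported tiny_test.onnx")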
- Consult Documentation:
  - Review the official NVIDIA documentation for guidelines on memory management and model conversion on Jetson devices.
- Monitor Temperature and Power Supply:
  - Ensure that temperature and power supply remain within acceptable ranges; overheating or an inadequate power supply can also cause performance problems. Reading the board's thermal sensors is sketched below.
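On Linux the thermal sensors can be read directly from sysfs; the sketch below assumes nothing beyond the standard library, though zone names and counts vary between boards:

from pathlib import Path

# Sketch: print every thermal zone the kernel exposes. Treat the
# output as indicative; zone naming differs across Jetson models.
for zone in sorted(Path("/sys/class/thermal").glob("thermal_zone*")):
    name = (zone / "type").read_text().strip()
    milli_c = int((zone / "temp").read_text().strip())
    print(f"{name}: {milli_c / 1000:.1f} °C")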
- Best Practices for Future Prevention:
  - Regularly monitor system resources while running intensive tasks.
  - Keep software updated and follow resource-management best practices for embedded systems such as Jetson boards.
Unresolved Aspects
Further investigation may be needed into the compatibility of specific PyTorch and TensorRT versions with the Jetson Orin Nano developer board, as well as into other environmental factors that could influence performance.