INT8 Calibration Reduces Accuracy of PyTorch MNIST Model on Jetson Orin Nano
Issue Overview
Users have reported a significant drop in accuracy when using INT8 calibration for the PyTorch MNIST model on the Nvidia Jetson Orin Nano Dev board. Specifically, the accuracy plummets to less than 10%, with one user noting that out of 10,000 inferences, only 916 were correct, resulting in an accuracy of 9.16%. In contrast, using FP32 or FP16 data types yields over 97% accuracy.
The issue arises during the inference phase after modifying the sample code at /usr/src/tensorrt/samples/python/network_api_pytorch_mnist/sample.py to support INT8 calibration, using the calibrator from /usr/src/tensorrt/samples/python/int8_caffe_mnist/ as a reference. Users have confirmed that they generated a fresh INT8 calibration cache but still saw the accuracy drop. The reported JetPack version is 4.6.1, and the hardware is identified in the report as a Jetson Nano.
The impact of this problem is substantial for users who rely on accurate model predictions, particularly in academic or research settings.
Possible Causes
- Calibration Cache Issues: If the calibration cache is not generated correctly or does not align with the model architecture, it may lead to poor performance. (A quick way to inspect the cache is sketched after this list.)
- Model Architecture Differences: The PyTorch and Caffe models may have different architectures that affect how INT8 calibration is applied.
- Configuration Errors: Incorrect modifications in the sample code could lead to improper handling of weights and layers during inference.
- Driver or Software Bugs: There may be bugs in the software stack or driver that affect INT8 processing.
- Environmental Factors: Power supply issues or temperature variations could impact performance during inference.
- User Errors: Misconfigurations or incorrect data handling when generating calibration caches may lead to reduced accuracy.
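One quick check for the first cause: the calibration cache TensorRT writes is a small text file. A minimal sketch for eyeballing it, assuming the cache file name used in step 1 below; the decoding reflects the plain-text format observed in TensorRT 8.x caches (a "TRT-...-EntropyCalibration2" header followed by "tensor name: hex-encoded scale" lines) and may differ in other releases:

```python
# Minimal sketch: dump the per-tensor scales stored in an INT8 calibration cache.
# Assumes the cache name from step 1 and the TensorRT 8.x plain-text format.
import struct

with open("mnist_calibration.cache") as f:
    print("header:", f.readline().strip())  # should name EntropyCalibration2
    for line in f:
        name, hex_scale = line.strip().rsplit(":", 1)
        # Each scale is the big-endian hex of an IEEE-754 float32.
        scale = struct.unpack(">f", bytes.fromhex(hex_scale.strip().zfill(8)))[0]
        print(f"{name}: scale = {scale:.6g}")
```

A scale of 0, or an output tensor missing entirely, would point to a cache that was never properly populated.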
Troubleshooting Steps, Solutions & Fixes
- Verify Calibration Cache Generation
  - Ensure that a new INT8 calibration cache is generated specifically for the PyTorch MNIST model.
  - Use the following lines in your modified sample.py to generate and save the cache (a sketch of the calibrator class itself follows this step):

```python
calibration_cache = "mnist_calibration.cache"
calib = MNISTEntropyCalibrator(train_set, cache_file=calibration_cache)
```
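For reference, here is a minimal sketch of what MNISTEntropyCalibrator looks like, modeled on the calibrator.py shipped with the int8_caffe_mnist sample. The method names follow the real trt.IInt8EntropyCalibrator2 interface; the handling of train_set (a NumPy array of normalized 1x28x28 images) is an assumption:

```python
import os

import numpy as np
import pycuda.autoinit  # noqa: F401  (initializes the CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class MNISTEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, train_set, cache_file, batch_size=32):
        # A custom constructor on a TensorRT class must call the parent's
        # constructor explicitly.
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file
        self.data = train_set.astype(np.float32)  # assumed shape: (N, 1, 28, 28)
        self.batch_size = batch_size
        self.index = 0
        # One device buffer, reused for every calibration batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > self.data.shape[0]:
            return None  # tells TensorRT the calibration data is exhausted
        batch = np.ascontiguousarray(
            self.data[self.index : self.index + self.batch_size]
        )
        cuda.memcpy_htod(self.device_input, batch)
        self.index += self.batch_size
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # If a cache exists, reuse it instead of calibrating again.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

Crucially, the images fed through get_batch() must use the same preprocessing (scaling, normalization, layout) as the images used at inference time; a mismatch here is a classic cause of the near-random accuracy described above.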
- Check Model Architecture Compatibility
  - Confirm that you are using a calibration cache generated specifically for the PyTorch MNIST model rather than one from a different architecture (e.g., Caffe).
- Review Code Modifications
  - Ensure that the modifications made to sample.py are correct and consistent with how weights are assigned across all layers. Pay special attention to the populate_network() function (an abridged sketch follows this step).
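For orientation, an abridged sketch of populate_network(), following the structure of the stock network_api_pytorch_mnist sample: every layer must receive the matching key from the trained PyTorch state_dict, converted to NumPy. The API shown matches the TensorRT 8.x-era sample; newer releases replace add_fully_connected with matrix-multiply layers. The wiring between the layers shown is elided and marked as such:

```python
def populate_network(network, weights):
    # Input shape here is illustrative; the stock sample takes it from
    # ModelData.INPUT_SHAPE, and batch handling differs across releases.
    input_tensor = network.add_input(
        name="data", dtype=trt.float32, shape=(1, 1, 28, 28)
    )

    # conv1: 20 output maps, 5x5 kernel. The state_dict key names must match
    # exactly, and each tensor must be converted with .numpy().
    conv1 = network.add_convolution_nd(
        input=input_tensor,
        num_output_maps=20,
        kernel_shape=(5, 5),
        kernel=weights["conv1.weight"].numpy(),
        bias=weights["conv1.bias"].numpy(),
    )
    conv1.stride_nd = (1, 1)

    # ... pooling, conv2, fc1, and ReLU layers as in the original sample ...

    # Final fully connected layer produces the 10 class scores.
    fc2 = network.add_fully_connected(
        input=conv1.get_output(0),  # placeholder wiring; the real sample chains fc1's output here
        num_outputs=10,
        kernel=weights["fc2.weight"].numpy(),
        bias=weights["fc2.bias"].numpy(),
    )
    fc2.get_output(0).name = "prob"
    network.mark_output(tensor=fc2.get_output(0))
```

A single swapped key (e.g., fc1 weights assigned to fc2) still builds a valid engine but yields near-random predictions, which matches the symptom reported here.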
- Run Calibration and Validation Steps
  - Follow these commands to set up your environment correctly:

```bash
cd /usr/src/tensorrt/data/mnist
sudo wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
sudo wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
sudo gzip -dk t10k-images-idx3-ubyte.gz
sudo gzip -dk train-images-idx3-ubyte.gz
```

  Note that the matching label files (train-labels-idx1-ubyte.gz and t10k-labels-idx1-ubyte.gz, from the same location) are also needed to score accuracy.
  - Install the necessary dependencies:

```bash
sudo apt install python3-pip libboost-all-dev
export CPATH=$CPATH:/usr/local/cuda-11.4/targets/aarch64-linux/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-11.4/targets/aarch64-linux/lib
pip3 install --user pycuda numpy requests pillow
```

  A quick integrity check for the downloaded image files is sketched after this step.
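To rule out corrupt downloads, the idx3-ubyte files can be checked directly: the format is a 16-byte big-endian header (magic 2051, image count, rows, cols) followed by raw uint8 pixels. A minimal sketch, with load_idx3 as a hypothetical helper name:

```python
import struct

import numpy as np


def load_idx3(path):
    """Load an MNIST idx3-ubyte image file into a (count, rows, cols) array."""
    with open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, f"unexpected magic {magic} in {path}"
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(count, rows, cols)


test = load_idx3("/usr/src/tensorrt/data/mnist/t10k-images-idx3-ubyte")
print(test.shape)  # expected: (10000, 28, 28)
```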
- Test with Different Batch Sizes
  - Experiment with different batch sizes during both calibration and inference to see if accuracy improves (an illustrative loop follows this step).
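The calibration batch size is fixed when the calibrator is constructed, so each experiment needs its own calibrator and, importantly, its own cache file, or TensorRT silently reuses the previous run's scales. An illustrative loop, using the MNISTEntropyCalibrator sketch above and build_int8_engine as patched in the next step:

```python
# Illustrative sweep over calibration batch sizes; the file-name scheme is an
# assumption. A fresh cache per run prevents stale scales from being reused.
for bs in (1, 8, 32, 64):
    cache = f"mnist_calibration_bs{bs}.cache"
    calib = MNISTEntropyCalibrator(train_set, cache_file=cache, batch_size=bs)
    engine = build_int8_engine(weights, calib, batch_size=bs)
    # ... run the accuracy test from step 7 against this engine ...
```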
- Use Recommended Sample Code
  - Consider copying the calibrator.py file from int8_caffe_mnist into your working directory and applying the suggested patch to sample.py (a fuller sketch of the resulting build function follows this step):

```diff
diff --git a/samples/python/network_api_pytorch_mnist/sample.py b/samples/python/network_api_pytorch_mnist/sample.py
index e5e95de2..3a5d47f8 100644
--- a/samples/python/network_api_pytorch_mnist/sample.py
+++ b/samples/python/network_api_pytorch_mnist/sample.py
@@ -24,9 +24,12 @@
 import numpy as np
 import pycuda.autoinit
 import tensorrt as trt
+from calibrator import load_mnist_data, load_mnist_labels, MNISTEntropyCalibrator

 sys.path.insert(1, os.path.join(sys.path[0], ".."))
 import common

 # You can set the logger severity higher to suppress messages (or lower to display more messages).
 TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

 def build_int8_engine(weights, calib, batch_size=32):
     # ...
+    config.set_flag(trt.BuilderFlag.INT8)
+    config.int8_calibrator = calib
     # ...
```
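A fuller sketch of what build_int8_engine() might look like after the patch. The elisions ("# ...") in the diff above are filled in here with the helpers from the samples' common.py; details such as network-creation flags and the deprecated build_engine/max_workspace_size calls follow the TensorRT 8.x-era samples and may need adjusting for other releases:

```python
def build_int8_engine(weights, calib, batch_size=32):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(common.EXPLICIT_BATCH)
    config = builder.create_builder_config()
    config.max_workspace_size = common.GiB(1)
    # The two lines added by the patch: enable INT8 and attach the calibrator.
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calib
    populate_network(network, weights)
    # Calibration runs as part of the engine build itself.
    return builder.build_engine(network, config)
```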
- Conduct Inference Tests
  - After implementing the changes and verifying the configuration, run the inference tests again (a sketch of the accuracy count over the 10,000 test images follows this step):

```bash
python3 sample.py
```
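A hedged sketch of the accuracy check behind the 916/10,000 figure: run the engine over every test image and count argmax matches. It assumes test_images is loaded as in the step 4 integrity check, test_labels comes from the t10k-labels-idx1-ubyte file noted there, and uses the allocate_buffers/do_inference_v2 helpers from the samples' common.py:

```python
correct = 0
with engine.create_execution_context() as context:
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    for img, label in zip(test_images, test_labels):
        # Preprocessing here must mirror what the calibrator saw.
        np.copyto(inputs[0].host, img.ravel().astype(np.float32) / 255.0)
        [output] = common.do_inference_v2(
            context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream
        )
        correct += int(np.argmax(output) == label)
print(f"accuracy: {correct / len(test_labels):.2%}")
```

Around 10% accuracy from this loop means the INT8 engine is effectively guessing among the 10 digit classes, which is why a calibration or weight-assignment fault is the prime suspect rather than ordinary quantization loss.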
- Monitor Logs for Errors
  - Pay attention to any warnings or errors logged during execution that may indicate underlying issues with configuration or data handling; raising the logger verbosity, as shown below, helps.
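The sample creates its logger at module scope with severity WARNING; switching it to VERBOSE (or INFO) makes TensorRT print calibration progress and precision decisions that are otherwise hidden:

```python
# More verbose logging; the sample's default is trt.Logger.WARNING.
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
```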
- Seek Community Support
  - If problems persist, consider sharing your modified files and results with community forums for further assistance.
By following these steps and recommendations, users should be able to diagnose and potentially resolve issues related to INT8 calibration affecting accuracy on the Nvidia Jetson Orin Nano Dev board.