TensorRT produces all zero outputs when running in reduced precision

Issue Overview

Users are experiencing issues when attempting to run mobilenet_v2 with reduced precision on an emulated 4GB Jetson Nano. The model works correctly in full precision (FP32), but when it is converted to TensorRT with any of the reduced-precision flags (--best, --int8, or --fp16), the resulting engine produces all zero outputs. The issue occurs on a Jetson Orin 4GB developer board emulated as a 4GB Nano.

Possible Causes

  1. Insufficient memory: The system may be running out of memory during engine building, particularly in the reduced-precision modes.
  2. Incorrect calibration: The INT8 calibration process may not be implemented correctly, leading to invalid results in the INT8 and --best modes.
  3. Emulation limitations: The problem might be related to emulating the Jetson Nano on the Orin developer board.
  4. TensorRT version compatibility: The installed TensorRT version (8.4.1.5-1) may have limitations or bugs when handling reduced precision on the emulated platform.

Troubleshooting Steps, Solutions & Fixes

  1. Implement INT8 calibration:
    • Create a calibration dataset using a set of representative input images.
    • Implement an EntropyCalibrator class that inherits from trt.IInt8EntropyCalibrator2.
    • Use the calibrator in the TensorRT engine building process (see the sketch after the class below).
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context needed for mem_alloc

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, input_layers, output_layers, stream):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.input_layers = input_layers
        self.output_layers = output_layers
        self.stream = stream  # custom batch stream that supplies calibration data
        # Device buffer sized to hold one batch of calibration data
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        stream.reset()

    # Also implement the required methods: get_batch_size, get_batch,
    # read_calibration_cache, write_calibration_cache
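The calibrator is then attached to the builder configuration. A minimal sketch, continuing from the class above; calibration_stream and the "input"/"output" tensor names are placeholder assumptions, not values from the original post:

# Sketch only: build an INT8 engine using the calibrator defined above
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("mobilenet_v2.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(["input"], ["output"], calibration_stream)
engine_bytes = builder.build_serialized_network(network, config)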
  2. Adjust memory settings:

    • Increase the workspace size in the TensorRT configuration:
    workspace_size = 3  # GB
    # max_workspace_size is deprecated in TensorRT 8.x; set_memory_pool_limit is the preferred API
    config.max_workspace_size = workspace_size * (1 << 30)
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_size * (1 << 30))
    
  3. Use the latest TensorRT version:

    • Consider upgrading to TensorRT 8.5.0, which is available in the 22.08 Jetson CUDA-X AI Developer Preview.
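    A quick way to confirm which version the Python bindings report, before and after upgrading:

    import tensorrt as trt
    print(trt.__version__)  # e.g. 8.4.1.5 today; should report 8.5.x after the upgrade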
  4. Test on non-emulated hardware:

    • If possible, try running the model on an actual Jetson Orin devkit without emulation to isolate potential emulation-related issues.
  5. Use trtexec for troubleshooting:

    • Convert the model with the trtexec command-line tool to determine whether the issue is specific to the Python API:
    trtexec --onnx=mobilenet_v2.onnx --saveEngine=mobilenet_v2.trt --best
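    To check whether the zero outputs also appear outside the Python API, the saved engine can then be run with random inputs and its outputs printed (standard trtexec flags):

    # Run the reduced-precision engine and print its output values to check for all zeros
    trtexec --loadEngine=mobilenet_v2.trt --dumpOutput --verbose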
    
  6. Check for fallback behavior:

    • When the requested workspace cannot be allocated, TensorRT is expected to fall back and use up to 95% of total memory. Check the build log to verify whether this fallback actually occurs.
  7. Analyze warning messages:

    • Pay attention to warnings about insufficient memory and skipped tactics. These may provide clues about resource constraints.
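    One way to surface all of these messages is to pass a verbose logger to the builder; a minimal sketch:

    import tensorrt as trt

    # VERBOSE logging prints skipped tactics and memory warnings during the engine build
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)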
  8. Optimize the ONNX model:

    • Use tools like Polygraphy to optimize and debug the ONNX model before conversion to TensorRT.
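    For example, Polygraphy's command-line tools can sanitize the graph and compare TensorRT results against ONNX Runtime; a sketch (the output filename is arbitrary):

    # Fold constants and clean up the graph before conversion
    polygraphy surgeon sanitize mobilenet_v2.onnx --fold-constants -o mobilenet_v2_sanitized.onnx

    # Compare TensorRT FP16 results against ONNX Runtime to see where outputs diverge
    polygraphy run mobilenet_v2_sanitized.onnx --trt --fp16 --onnxrt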
  9. Adjust batch size and input dimensions:

    • Experiment with smaller batch sizes or input dimensions to reduce memory requirements during conversion.
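    With trtexec this can be done at build time; a sketch assuming the exported ONNX has a dynamic batch dimension and its input tensor is named input (check the actual name with Netron or polygraphy inspect model):

    trtexec --onnx=mobilenet_v2.onnx --saveEngine=mobilenet_v2.trt --fp16 \
            --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:1x3x224x224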
  10. Monitor system resources:

    • Use tegrastats (or jtop) rather than nvidia-smi, which is not supported on Jetson, to monitor memory usage during the conversion process and identify potential bottlenecks.
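    A simple way to capture memory usage while the engine builds (the log filename is arbitrary):

    # Sample system and GPU memory once per second and log it for the duration of the build
    sudo tegrastats --interval 1000 --logfile tegrastats_build.log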

If the issue persists after trying these steps, consider sharing the mobilenet_v2.onnx file with NVIDIA support for further investigation. Additionally, keep an eye out for the general availability release of TensorRT 8.5, which may include fixes or improvements related to this issue.
