Failed to boot from certain NVMe with R35.5.0
Issue Overview
Users are experiencing boot failures when attempting to use specific NVMe drives (Innodisk 4TG2-P) with the Nvidia Jetson Orin Nano Dev board running the R35.5.0 (Jetpack 5.1.3) BSP. This issue does not occur with the previous version, R35.4.1 (Jetpack 5.1.2).
Symptoms:
- The board fails to boot into the Ubuntu OS despite recognizing the NVMe in the UEFI boot menu.
- Across repeated tests with five NVMe drives in total, the same two Innodisk drives consistently failed to boot.
Context:
- The problem arises during the initial setup after flashing the board with R35.5.0.
- Boot logs indicate differences in behavior between R35.4.1 and R35.5.0.
Hardware/Software Specifications:
- Jetson Orin Nano Dev board.
- R35.5.0 (Jetpack 5.1.3) and R35.4.1 (Jetpack 5.1.2) BSP.
- NVMe drives: Innodisk 4TG2-P and WD SN550.
Impact:
- The affected NVMe drives cannot be used as boot devices with R35.5.0, which delays development and deployment.
Possible Causes
- Bootloader Differences: The bootloader binary in QSPI may differ between R35.4.1 and R35.5.0, which could affect compatibility with specific NVMe drives.
- Firmware Issues: The firmware of the problematic Innodisk drives may be incompatible with the new bootloader, leading to boot failures.
- Configuration Errors: Incorrect flashing procedures or parameters may result in improper setup of the NVMe drives.
- Driver Issues: Changes in driver support or functionality between versions could lead to incompatibilities.
- Environmental Factors: Power supply inconsistencies or thermal issues may exacerbate hardware compatibility problems.
Troubleshooting Steps, Solutions & Fixes
- Verify Firmware Versions:
  - Check and compare the firmware versions of the working and non-working Innodisk NVMe drives.
  - If they differ, update the firmware on the failing drives.
- Examine Bootloader Differences:
  - Determine which bootloader component (MB1, MB2, or UEFI) introduces the failure.
  - If working systems are needed immediately, consider reverting to R35.4.1.
- Reflash Procedures:
  - Flash with the following command, run from the Linux_for_Tegra directory (pjai-100onox is the board configuration used in the original report):
    sudo ./tools/kernel_flash/l4t_initrd_flash_pw.sh --external-device nvme0n1p1 \
      -c tools/kernel_flash/flash_l4t_external.xml \
      -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
      --showlogs --network usb0 pjai-100onox internal
  - When switching between R35.4.1 and R35.5.0, make sure the QSPI bootloader is reflashed as well, so that it matches the release on the NVMe drive.
- Clean Unused Partitions:
  - Modify the flash scripts to zero out unused partitions on the NVMe drive:
    dd if=/dev/zero of=${part_name} seek=${start_sector} bs=${pblksz} count=${partition_blk_size}
  - This may resolve boot problems caused by residual data left on the drive.
- Conduct Stress Tests:
  - Boot from an alternative device (e.g., USB) and test the problematic NVMe drive for I/O errors.
  - This helps determine whether the issue lies with the NVMe drive itself or with broader system compatibility.
- Log Analysis:
  - Compare boot logs from successful and failed attempts to find specific errors or warnings that point to the root cause.
  - The verbose logs attached to the original report are:
    fail_boot_l4t_35_5_0_ox8g_inno_verbose.log
    normal_boot_l4t_35_4_1_ox8g_inno_verbose.log
- Documentation & Updates:
  - Watch Nvidia’s official documentation and release notes for driver or firmware updates that address this issue.
  - Check community discussions regularly for reports of similar issues and shared solutions.
- Request Support from Nvidia:
  - If the issue persists after these steps, contact Nvidia support with detailed logs and a description of every troubleshooting step taken.
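For the "Verify Firmware Versions" step, the comparison can be scripted. This is a minimal sketch that assumes nvme-cli is installed and that "nvme id-ctrl" prints the firmware revision on a line beginning with "fr"; the function names and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the firmware revisions of two NVMe drives.
# ASSUMPTION: nvme-cli's "nvme id-ctrl <dev>" output contains a line like
# "fr        : <revision>" (Firmware Revision). Function names are hypothetical.

# Print the firmware revision found in "nvme id-ctrl" text read from stdin.
# The pattern requires ":" right after "fr", so "frmw" lines are skipped.
nvme_fw_rev() {
    awk '/^fr[ \t]*:/ { sub(/^fr[ \t]*:[ \t]*/, ""); sub(/[ \t]+$/, ""); print; exit }'
}

# Print the revisions from two saved id-ctrl dumps; fail if they differ.
compare_fw() {
    rev_a=$(nvme_fw_rev < "$1")
    rev_b=$(nvme_fw_rev < "$2")
    echo "$1: ${rev_a:-unknown}"
    echo "$2: ${rev_b:-unknown}"
    [ -n "$rev_a" ] && [ "$rev_a" = "$rev_b" ]
}

# Usage on the Jetson (device path is an assumption):
#   sudo nvme id-ctrl /dev/nvme0 > working.txt   # known-good Innodisk drive installed
#   sudo nvme id-ctrl /dev/nvme0 > failing.txt   # failing Innodisk drive installed
#   compare_fw working.txt failing.txt || echo "firmware revisions differ"
```

Saving each dump to a file lets the drives be swapped one at a time in the same M.2 slot before comparing.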
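For the "Examine Bootloader Differences" step, a first pass is to find the furthest boot stage each verbose log reaches. The sketch below assumes each stage can be recognized by a substring ("MB1", "MB2", "UEFI", and the kernel's "Linux version" banner). These markers are assumptions to check against the real logs, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report the furthest boot stage mentioned in a serial boot log.
# ASSUMPTION: each stage is identified by a substring (MB1, MB2, UEFI, and
# the kernel's "Linux version" banner); adjust the patterns to the real logs.
# The highest stage seen is kept, so a later mention of an earlier stage
# does not move the result backwards.
last_boot_stage() {
    awk '
        /MB1/           && rank < 1 { rank = 1 }
        /MB2/           && rank < 2 { rank = 2 }
        /UEFI/          && rank < 3 { rank = 3 }
        /Linux version/ && rank < 4 { rank = 4 }
        END {
            split("MB1 MB2 UEFI kernel", name, " ")
            print (rank ? name[rank] : "none")
        }
    ' "$1"
}

# Usage with the logs from the report:
#   last_boot_stage normal_boot_l4t_35_4_1_ox8g_inno_verbose.log
#   last_boot_stage fail_boot_l4t_35_5_0_ox8g_inno_verbose.log
```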
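For the "Reflash Procedures" step, one way to confirm after switching versions that QSPI and the NVMe rootfs come from the same release is to compare the release recorded in /etc/nv_tegra_release with the bootloader version. This sketch assumes /etc/nv_tegra_release begins with a line such as "# R35 (release), REVISION: 5.0, ..." and that "nvbootctrl dump-slots-info" prints a "Current version:" line. Verify both formats on your system; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check that the rootfs release and the QSPI bootloader version match.
# ASSUMPTIONS: /etc/nv_tegra_release begins "# R35 (release), REVISION: 5.0, ..."
# and "nvbootctrl dump-slots-info" prints "Current version: 35.5.0".

# "# R35 (release), REVISION: 5.0, ..." -> "35.5.0"
rootfs_l4t_version() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p' | head -n 1
}

# "Current version: 35.5.0" -> "35.5.0"
bootloader_version() {
    sed -n 's/^Current version: *\([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# $1: path to nv_tegra_release, $2: saved dump-slots-info output.
versions_match() {
    rootfs=$(rootfs_l4t_version < "$1")
    boot=$(bootloader_version < "$2")
    echo "rootfs: ${rootfs:-unknown}  bootloader: ${boot:-unknown}"
    [ -n "$rootfs" ] && [ "$rootfs" = "$boot" ]
}

# Usage on the Jetson after reflashing:
#   sudo nvbootctrl dump-slots-info > slots.txt
#   versions_match /etc/nv_tegra_release slots.txt || echo "QSPI and rootfs differ"
```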
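For the "Clean Unused Partitions" step, the dd call can be wrapped as below. This is a sketch with a hypothetical function name. It adds conv=notrunc so the same call is also safe on a regular image file (dd would otherwise truncate it). As in the original dd command, seek and count are counted in blocks of bs bytes, so the start position must be expressed in that same block size.

```shell
#!/bin/sh
# Sketch: zero COUNT blocks of BS bytes starting at block SEEK in TARGET.
# seek and count are in units of bs, as in the flash-script dd command.
# conv=notrunc keeps dd from truncating TARGET when it is a regular file.
# The subshell body keeps the helper variables local to the function.
zero_blocks() (
    target=$1 seek=$2 bs=$3 count=$4
    dd if=/dev/zero of="$target" bs="$bs" seek="$seek" count="$count" conv=notrunc 2>/dev/null
)

# Equivalent of the flash-script line (variables come from that script):
#   zero_blocks "${part_name}" "${start_sector}" "${pblksz}" "${partition_blk_size}"
```

Trying the function on a scratch image file first is a cheap way to confirm the offsets before pointing it at a real NVMe partition.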
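For the "Conduct Stress Tests" step, a full read pass followed by a scan of the kernel log can reveal I/O errors. The sketch assumes the system is booted from another device, that the drive appears as /dev/nvme0n1, and that relevant kernel messages mention "nvme" together with "error", "timeout" or "reset". These keywords are a heuristic rather than an exhaustive list, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count kernel-log lines that mention nvme together with an error,
# timeout or reset. The keywords are a heuristic, not an exhaustive list.
# Prints 0 when nothing matches (grep -c then exits non-zero).
nvme_error_count() {
    grep -i 'nvme' | grep -ciE 'error|timeout|reset'
}

# Usage while booted from USB (device name is an assumption):
#   sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M status=progress   # full read pass
#   sudo dmesg | nvme_error_count
```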
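For the "Log Analysis" step, the two verbose logs can be compared after removing per-line timestamps, which would otherwise make nearly every line differ. The sketch assumes timestamps appear as a leading bracketed number such as "[0001.234]" or "[    1.234567]"; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: diff two boot logs after stripping leading "[...]" timestamps.
# ASSUMPTION: timestamps look like "[0001.234]" or "[    1.234567]".

strip_ts() {
    sed 's/^\[[ 0-9.]*\] *//' "$1"
}

# Unified diff of the normalized logs; exit status 0 means no differences.
compare_boot_logs() (
    good=$(mktemp) && bad=$(mktemp) || exit 2
    strip_ts "$1" > "$good"
    strip_ts "$2" > "$bad"
    diff -u "$good" "$bad"
    status=$?
    rm -f "$good" "$bad"
    exit "$status"
)

# Usage with the logs from the report:
#   compare_boot_logs normal_boot_l4t_35_4_1_ox8g_inno_verbose.log \
#                     fail_boot_l4t_35_5_0_ox8g_inno_verbose.log | less
```

Lines prefixed with "+" appear only in the failing R35.5.0 log, which narrows down where the two boots diverge.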
Working through these steps in order should help isolate the cause of boot failures with certain NVMe drives on the Jetson Orin Nano Dev board under R35.5.0, and prepare users for similar compatibility problems in future releases.