Bootloader dropping to shell or network boot after flashing a backed up image
Issue Overview
After an image is cloned with the backup scripts, the NVIDIA Jetson Orin Nano Dev board either drops to a shell or attempts to boot from the network instead of the NVMe drive. The symptoms include:
- A delay of approximately 5 minutes while trying to boot from the NVMe drive.
- The issue occurs consistently after backing up with:
sudo ./tools/backup_restore/l4t_backup_restore.sh -e nvme0n1 -b custom-board-conf
and restoring with:
sudo ./tools/backup_restore/l4t_backup_restore.sh -e nvme0n1 -r custom-board-conf
- Users have reported that modifications to the bootloader settings do not persist after cloning, leading to repeated issues with the boot order.
- The problem significantly impacts production workflows, as manual intervention is required to adjust boot settings each time.
Possible Causes
- Hardware Incompatibilities: If the cloned image is restored on a different board, hardware differences may lead to incorrect boot configurations.
- Software Bugs: There may be bugs in the backup and restore scripts that do not properly handle UEFI configurations.
- Configuration Errors: Incorrect settings in UEFI related to device hierarchy could lead to network devices being prioritized over NVMe.
- Driver Issues: Conflicts between drivers for different hardware components might cause boot failures.
- Environmental Factors: Power supply issues or overheating could affect boot processes.
- User Misconfigurations: Users may not be following the correct procedures for modifying UEFI settings.
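Whether a network device really sits ahead of the NVMe drive can be checked from a running system before touching any UEFI settings. The sketch below assumes the standard Linux efibootmgr utility is installed on the Jetson (an assumption, not something the thread confirms) and that its output uses the usual "BootOrder:" and "BootXXXX*" lines. first_boot_entry is a hypothetical helper name. It reads efibootmgr output on stdin and prints the entry that UEFI will try first.

```shell
#!/bin/sh
# Hypothetical helper: print the ID and label of the first entry in BootOrder.
# ASSUMPTION: stdin carries efibootmgr-style output, i.e. a line such as
# "BootOrder: 0001,0002" plus one "Boot0001* <label>" line per entry.
first_boot_entry() {
    awk '
        # Remember the first ID listed in the boot order.
        /^BootOrder:/ { split($2, order, ","); first = order[1] }
        # Record the label of every BootXXXX entry (XXXX = four hex digits).
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            id = substr($1, 5, 4)
            sub(/^[^ ]+ /, "")
            label[id] = $0
        }
        END { if (first != "") print first ": " label[first] }
    '
}
```

On the device this would be run as "sudo efibootmgr | first_boot_entry". If the printed label is a network entry rather than the NVMe drive, that matches the misordering described above.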
Troubleshooting Steps, Solutions & Fixes
Step-by-Step Diagnosis
- Check Ethernet Connections: Ensure that no Ethernet cables are connected during boot if they are not needed, as they can trigger network boot attempts [Reply_3].
- Review UEFI Settings: Follow instructions to modify UEFI settings:
- Remove network devices from the boot stack in UEFI source code if applicable [Reply_4].
- Change the NewDeviceHierarchy setting from 01 to 00 to place newly detected devices at the bottom of the boot order [Reply_5].
- Backup and Restore Process:
- Confirm whether both NVMe and QSPI are being backed up and restored correctly [Reply_10].
- If only NVMe is restored, ensure that QSPI is flashed with appropriate configurations.
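The backup check above can be partly automated by looking at what the backup step actually wrote. This is a minimal sketch, assuming the backup output lands in a single directory (the exact path is an assumption and is passed in as an argument) and that it contains the QSPI0.img and nvpartitionmap.txt files mentioned in this thread. check_backup is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether a backup directory holds the QSPI image
# and the partition map. The two file names come from this thread; the
# directory layout is an ASSUMPTION.
check_backup() {
    dir=$1
    status=0
    for f in QSPI0.img nvpartitionmap.txt; do
        # -s: the file exists and is not empty
        if [ -s "$dir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

A non-zero return means at least one of the two files is absent or empty, which suggests that QSPI was not captured and needs to be flashed separately.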
Recommended Solutions
- Post-Process QSPI Image:
- Erase specific partitions in your QSPI image that store UEFI variables:
dd if=/dev/zero bs=512 count=512 of=QSPI0.img seek=128000 conv=notrunc
dd if=/dev/zero bs=512 count=1024 of=QSPI0.img seek=128512 conv=notrunc
dd if=/dev/zero bs=512 count=128 of=QSPI0.img seek=130432 conv=notrunc
- This will reset UEFI variables and allow for a fresh boot configuration [Reply_11].
- Update nvpartitionmap.txt: After modifying the QSPI image, update the checksum in nvpartitionmap.txt to reflect changes made during partition erasure.
- Testing Different Configurations:
- Test with different hardware setups or revert to factory settings to identify if specific configurations cause issues.
- Use a fresh installation of JetPack (preferably JetPack 5) as it may resolve underlying software conflicts [Reply_7].
- Documentation Links:
- Refer to NVIDIA's official documentation for detailed instructions on modifying UEFI settings and troubleshooting steps.
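The QSPI post-processing steps from the Recommended Solutions can be collected into one script. The sketch below wraps the three dd commands from [Reply_11], with the offsets copied from the thread unchanged, and then refreshes the checksum in nvpartitionmap.txt. The checksum handling is an assumption: the script guesses the algorithm from the length of the hex digest already stored on the line that names the image (40 characters treated as SHA-1, 64 as SHA-256), so confirm this against your own nvpartitionmap.txt before relying on it. Both function names are hypothetical.

```shell
#!/bin/sh
# Zero the three UEFI-variable regions named in the thread.
# Offsets and counts are in 512-byte sectors; conv=notrunc keeps the rest
# of the image intact.
erase_uefi_vars() {
    img=$1
    dd if=/dev/zero bs=512 count=512  of="$img" seek=128000 conv=notrunc 2>/dev/null &&
    dd if=/dev/zero bs=512 count=1024 of="$img" seek=128512 conv=notrunc 2>/dev/null &&
    dd if=/dev/zero bs=512 count=128  of="$img" seek=130432 conv=notrunc 2>/dev/null
}

# Replace the stored digest for the image with a freshly computed one.
# ASSUMPTION: the map stores a plain lowercase hex SHA-1 or SHA-256 digest
# on the same line as the image file name.
update_checksum() {
    map=$1
    img=$2
    name=$(basename "$img")
    old=$(grep -F "$name" "$map" | grep -Eo '[0-9a-f]{40,64}' | head -n 1)
    case ${#old} in
        40) new=$(sha1sum "$img" | cut -d' ' -f1) ;;
        64) new=$(sha256sum "$img" | cut -d' ' -f1) ;;
        *)  echo "no recognizable checksum for $name in $map" >&2; return 1 ;;
    esac
    # GNU sed in-place edit: swap the old digest for the new one.
    sed -i "s/$old/$new/" "$map"
}
```

Both functions modify their inputs in place, so run them on a copy of the backup directory, for example "erase_uefi_vars QSPI0.img" followed by "update_checksum nvpartitionmap.txt QSPI0.img", before restoring with the -r command shown earlier.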
Best Practices for Prevention
- Regularly back up your configurations before making changes.
- Use stable versions of JetPack for production environments rather than developer previews.
- Document any changes made during troubleshooting for future reference.
Unresolved Aspects
- Further investigation may be needed into whether specific variables in the QSPI image are causing persistent network boot attempts after cloning.
- Users have reported mixed results with various solutions; continued community feedback may help refine these approaches.
This document aims to assist users facing similar issues with their Nvidia Jetson Orin Nano Dev board by providing structured troubleshooting steps and potential solutions based on community discussions.