PCIe Link Failure on Jetson Orin Nano Custom Carrier Board
Issue Overview
Users are experiencing a PCIe link failure when using a custom carrier board (CCB) with the Jetson Orin Nano module. The observed error message is "tegra194-pcie 14160000.pcie: Phy link never came up". It appears during boot and affects every PCIe controller used on the custom carrier board (C1, C4, and C8). As a result, no PCIe devices are detected, including the on-module Ethernet NIC, so the board has no wired network connectivity. The problem persists with either board configuration (p3509+p3767 or p3768+p3767).
Possible Causes
- Hardware Design Discrepancies: Despite efforts to replicate the reference carrier board (P3768-0000) design, there may be subtle differences in the custom carrier board’s PCIe-related circuitry that are causing the link failure.
- Signal Integrity Issues: The PCIe signals may be experiencing integrity problems due to factors such as improper impedance matching, signal reflections, or electromagnetic interference.
- Power Supply Problems: Inadequate or unstable power supply to the PCIe subsystem could prevent the PHY from initializing correctly.
- Clock Signal Issues: Improper clock generation or distribution for the PCIe subsystem may lead to link establishment failures.
- Firmware or Software Configuration: Although less likely given that the same software image works on the reference carrier board, there could be a mismatch between the firmware/software configuration and the custom hardware design.
- EEPROM Absence: The custom carrier board lacks an EEPROM, which might affect the system’s ability to properly configure the PCIe subsystem.
Troubleshooting Steps, Solutions & Fixes
- Verify Hardware Design:
  - Carefully review the schematic and PCB layout of the custom carrier board, focusing on PCIe-related components and traces.
  - Compare the design with the reference carrier board (P3768-0000) schematics, paying special attention to power rails, AC-coupling capacitors on the TX pairs, terminations, TX/RX pair assignment, and signal routing.
- Signal Integrity Analysis:
  - Use an oscilloscope to measure PCIe signals (REFCLK, TX, RX) on both the custom carrier board and the reference board.
  - Compare signal quality, looking for issues such as excessive ringing, reflections, or noise.
- Power Supply Verification:
  - Measure the voltage levels of all power rails related to the PCIe subsystem.
  - Ensure that power sequencing matches the requirements in the Jetson Orin Nano module data sheet and design guide.
- Clock Signal Verification:
  - Check the PCIe reference clock (REFCLK) with an oscilloscope.
  - Verify that REFCLK is present at 100 MHz and that its frequency accuracy and signal quality meet the PCIe specification.
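- PERST# Signal Check:
  - Measure the PERST# signal (PCIe fundamental reset, active low) to ensure it behaves correctly during boot.
  - Verify that PERST# stays asserted (low) until the power rails and REFCLK are stable, for at least the minimum durations required by the PCIe CEM specification, before being released.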
- Debug Using NVIDIA’s PCIe Debug Tips:
  Follow the debug procedure outlined in the NVIDIA documentation:
  a. Enable PCIe controller debug logs:
       sudo nvpmodel -m 0
       echo 1 | sudo tee /sys/kernel/debug/tegra_pcie_dbg/set_val
  b. Reboot the system and capture the kernel log:
       sudo dmesg > dmesg.txt
  c. Analyze the log for PCIe-related messages and errors, such as the "Phy link never came up" message and any other lines from the PCIe controllers.
- Test with Minimal Configuration:
  - Remove all non-essential components from the custom carrier board.
  - Test the PCIe link with only the bare minimum required circuitry.
- EEPROM Considerations:
  - Investigate the impact of the missing EEPROM on PCIe initialization.
  - Consider adding an EEPROM to the custom carrier board if it’s determined to be necessary for proper PCIe configuration.
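- Firmware and Pinmux Verification:
  - Double-check that the board configuration (.conf) and device tree used for flashing match the custom carrier board and enable the PCIe controllers it uses.
  - Verify that the pinmux settings for PCIe-related pins, including sideband signals such as CLKREQ#, PERST#, and WAKE# where used, match those of the reference carrier board.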
- Consult NVIDIA Support:
  - If none of the steps above resolves the issue, contact NVIDIA’s support channels (for example, the NVIDIA Developer Forums) for advanced debugging assistance.
  - Provide detailed information about the custom carrier board design and all troubleshooting steps performed.
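For the Power Supply Verification step, comparing hand-measured rail voltages against their nominal values can be scripted. The following is a minimal shell sketch. The helper name `rail_check`, the rail name `VDD_3V3`, and the tolerance percentage are hypothetical placeholders. The real nominal values and limits for each rail have to come from the module data sheet and design guide.

```shell
#!/usr/bin/env bash
# Sketch: compare a hand-measured rail voltage against its nominal value.
# ASSUMPTION: the tolerance (in percent) is supplied by the caller; take the
# real limit for each rail from the module data sheet / design guide.

# rail_check NAME MEASURED_V NOMINAL_V TOLERANCE_PERCENT
# Prints "OK" or "OUT", the rail name and the signed deviation in percent,
# and returns 0 (within tolerance) or 1 (outside tolerance).
rail_check() {
  local name=$1 measured=$2 nominal=$3 tol=$4
  awk -v n="$name" -v m="$measured" -v v="$nominal" -v t="$tol" 'BEGIN {
    dev = (m - v) / v * 100            # signed deviation from nominal, in %
    mag = dev < 0 ? -dev : dev         # magnitude of the deviation
    ok = (mag <= t)
    printf "%s %s %.2f%%\n", (ok ? "OK" : "OUT"), n, dev
    exit (ok ? 0 : 1)
  }'
}
```

For example, `rail_check VDD_3V3 3.28 3.30 5` prints `OK VDD_3V3 -0.61%`. The awk call handles the floating-point arithmetic, which plain shell arithmetic cannot do.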
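For the Clock Signal Verification step, a measured REFCLK frequency can be checked against the nominal 100 MHz. The PCIe specification allows a ±300 ppm tolerance on the reference clock, which at 100 MHz is a window of ±30 kHz. This sketch assumes that spread-spectrum clocking (SSC) is disabled and that the measurement is given as an integer number of hertz. `refclk_check` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check a measured REFCLK frequency (integer Hz, e.g. from a
# frequency counter or scope measurement) against 100 MHz +/- 300 ppm.
# ASSUMPTION: SSC is disabled; with SSC the clock is intentionally
# down-spread and this simple window does not apply.

REFCLK_NOMINAL_HZ=100000000
REFCLK_TOL_PPM=300

# refclk_check MEASURED_HZ
# Prints OK/OUT and the offset from nominal in Hz; returns 0/1.
refclk_check() {
  local measured=$1
  local delta=$(( measured - REFCLK_NOMINAL_HZ ))
  # 300 ppm of 100 MHz = 100 Hz per ppm * 300 = 30000 Hz
  local limit_hz=$(( REFCLK_NOMINAL_HZ / 1000000 * REFCLK_TOL_PPM ))
  local mag=${delta#-}                 # strip the sign to get |delta|
  if [ "$mag" -le "$limit_hz" ]; then
    echo "OK ${delta} Hz"
    return 0
  fi
  echo "OUT ${delta} Hz"
  return 1
}
```

For example, `refclk_check 100012000` prints `OK 12000 Hz`. A reading outside 99.97 to 100.03 MHz points to a clock source or measurement problem worth resolving before looking at the data lanes.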
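For the PERST# Signal Check step, three scope timestamps can be turned into the two intervals the PCIe CEM specification constrains. T_PVPERL (power stable to PERST# deassertion) must be at least 100 ms. T_PERST-CLK (REFCLK stable before PERST# deassertion) must be at least 100 µs. This sketch assumes those minimums and timestamps in integer microseconds on a common time base. Confirm the values against the CEM revision the design targets. `perst_check` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check PERST# timing from three scope timestamps (integer
# microseconds, same time base). ASSUMPTION: minimums taken from the PCIe
# CEM specification; verify them against the revision you design to.

T_PVPERL_MIN_US=100000   # power stable  -> PERST# deassert (100 ms)
T_PERST_CLK_MIN_US=100   # REFCLK stable -> PERST# deassert (100 us)

# perst_check POWER_STABLE_US REFCLK_STABLE_US PERST_DEASSERT_US
# Prints one FAIL line per violated interval, or one OK line; returns 0/1.
perst_check() {
  local pwr=$1 clk=$2 perst=$3 rc=0
  local t_pvperl=$(( perst - pwr ))
  local t_perst_clk=$(( perst - clk ))
  if [ "$t_pvperl" -lt "$T_PVPERL_MIN_US" ]; then
    echo "FAIL T_PVPERL=${t_pvperl}us (min ${T_PVPERL_MIN_US}us)"
    rc=1
  fi
  if [ "$t_perst_clk" -lt "$T_PERST_CLK_MIN_US" ]; then
    echo "FAIL T_PERST-CLK=${t_perst_clk}us (min ${T_PERST_CLK_MIN_US}us)"
    rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    echo "OK T_PVPERL=${t_pvperl}us T_PERST-CLK=${t_perst_clk}us"
  fi
  return "$rc"
}
```

For example, `perst_check 0 50000 150000` reports both intervals as OK. `perst_check 0 0 20000` flags a PERST# release that comes only 20 ms after power is stable.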
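For step c of the debug procedure, a saved kernel log can be reduced to the list of controllers that reported the failure. Comparing that list with the controllers the carrier board actually uses is a quick way to see the scope of the problem. This sketch assumes the log lines follow the form quoted in the overview ("tegra194-pcie <address>.pcie: Phy link never came up"). `pcie_link_failures` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the PCIe controllers (by register address, e.g.
# 14160000.pcie) that logged "Phy link never came up".
# Reads a kernel log on stdin; prints each controller once, sorted.

pcie_link_failures() {
  grep -i 'phy link never came up' \
    | grep -oE '[0-9a-f]{8}\.pcie' \
    | LC_ALL=C sort -u
}
```

Usage: `pcie_link_failures < dmesg.txt`. For the surrounding context, `grep -i pcie dmesg.txt` shows every PCIe-related line in the same capture.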
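For the Firmware and Pinmux Verification step, one quick cross-check is to read the `status` property of each PCIe controller node in the device tree the kernel actually booted with. This shows whether the flashed DTB enables the controllers the carrier board uses. The sketch below is minimal and makes two assumptions: the controller nodes are named `pcie@<address>`, and a node without a `status` property counts as enabled (the device-tree default). `pcie_dt_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print "<node path> <status>" for every pcie@... node under a
# device-tree root (default: the live tree in /proc/device-tree).
# ASSUMPTION: controller nodes are named pcie@<address>.

pcie_dt_status() {
  local root=${1:-/proc/device-tree}
  # -H follows the root if it is a symlink (/proc/device-tree usually is).
  find -H "$root" -type d -name 'pcie@*' | LC_ALL=C sort | while read -r node; do
    status="okay (no status property)"   # device-tree default when absent
    if [ -f "$node/status" ]; then
      status=$(tr -d '\000' < "$node/status")   # property is NUL-terminated
    fi
    printf '%s %s\n' "${node#"$root"/}" "$status"
  done
}
```

Usage on the target: `pcie_dt_status`. A controller the carrier board relies on that shows `disabled` points at the board configuration or device tree rather than at the hardware.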
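For the Consult NVIDIA Support step, the following sketch gathers a few logs that are typically useful in a support request into one directory. The file selection and the helper name `collect_pcie_report` are assumptions, not an NVIDIA-defined procedure. Run it as root so that `dmesg` and `lspci -vvv` can read everything.

```shell
#!/usr/bin/env bash
# Sketch: collect PCIe-related diagnostics into one directory.
# ASSUMPTION: the chosen files are a suggestion, not a mandated list.

# collect_pcie_report [OUTPUT_DIR]  -> prints the directory it wrote to
collect_pcie_report() {
  local out=${1:-pcie-report-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$out"
  # Each command's stdout and stderr go to its own file; a missing tool or
  # a permission error is recorded in the file instead of aborting.
  dmesg                     > "$out/dmesg.txt"            2>&1 || true
  lspci -vvv                > "$out/lspci.txt"            2>&1 || true
  cat /proc/cmdline         > "$out/cmdline.txt"          2>&1 || true
  cat /etc/nv_tegra_release > "$out/nv_tegra_release.txt" 2>&1 || true
  echo "$out"
}
```

Usage: `sudo bash -c '. ./collect.sh; collect_pcie_report'`, where `collect.sh` is a hypothetical file holding the function. Attach the resulting directory, together with the schematic excerpts for the PCIe interfaces, to the support request.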
Working through these troubleshooting steps systematically should help isolate the root cause of the PCIe link failure on the custom carrier board. Document all findings and changes made during the debugging process.