Orin Nano PCIe Clock Turning Off Unexpectedly
Issue Overview
Users of the Nvidia Jetson Orin Nano 8GB on a custom carrier board report that the PCIe clock for an FPGA connected to PCIE0 (the M.2 Key M x4 NVMe slot on the dev kit) turns off unexpectedly. The clock starts during boot but shuts down after a few seconds, so the FPGA is never detected by the system. This also blocks debugging, in particular eye diagram analysis of the clock signal.
Possible Causes
- Default power-saving behavior: The system may be designed to power down the PCIe controller and clock if no PCIe endpoint is detected.
- FPGA readiness: The FPGA may not be in a ready state when the PCIe reset is de-asserted, leading to failed enumeration.
- Custom carrier board design: There could be issues with the implementation of the PCIe interface on the custom carrier board.
- Driver configuration: The PCIe driver may not be correctly configured to maintain the clock signal for debugging purposes.
- Timing issues: There might be a mismatch in timing between the Orin Nano and the FPGA during the PCIe initialization process.
Troubleshooting Steps, Solutions & Fixes
- Verify FPGA readiness:
  - Ensure that the FPGA is in a ready state when the PCIe reset is de-asserted.
  - Check the FPGA configuration and startup sequence to confirm it’s prepared for PCIe enumeration when the Orin Nano attempts to establish the link.
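The readiness requirement can be checked as a simple timing budget. The sketch below assumes the commonly cited PCIe figure that a device is expected to start link training within 20 ms of PERST# de-assertion. The function name and the example times are hypothetical; substitute values measured on your FPGA and carrier board.

```python
# Timing-budget sketch: does the FPGA finish configuring before the root port
# expects link training to start? All inputs are in milliseconds.

LTSSM_DETECT_DEADLINE_MS = 20.0  # assumed: training expected within 20 ms of PERST# release


def fpga_ready_margin_ms(fpga_config_ms, power_to_perst_ms):
    """Return the margin (ms) between FPGA readiness and the training deadline.

    fpga_config_ms:    time from power-good until the FPGA PCIe core is usable
    power_to_perst_ms: time from power-good until PERST# is de-asserted
    A negative result means the FPGA is late and enumeration is likely to fail.
    """
    deadline_ms = power_to_perst_ms + LTSSM_DETECT_DEADLINE_MS
    return deadline_ms - fpga_config_ms
```

For example, a hypothetical 250 ms bitstream load with PERST# released 100 ms after power-good gives `fpga_ready_margin_ms(250.0, 100.0)`, a negative margin, which would point to the FPGA missing the enumeration window.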
- Modify PCIe driver behavior:
  - Attempt to override the default power-saving behavior by modifying the tegra_pcie_port_check_link function in the pci-tegra.c file.
  - Recompile the pcie-tegra194.ko module with the following changes:

        static bool tegra_pcie_port_check_link(struct tegra_pcie_port *port)
        {
            struct device *dev = port->pcie->dev;
            unsigned int retries = 3;
            unsigned long value;

            dev_dbg(dev, "PCIe_TEST: Clock to stay on even if no detected endpoints\n");

            /* override presence detection */
            value = readl(port->base + RP_PRIV_MISC);
            value &= ~RP_PRIV_MISC_PRSNT_MAP_EP_ABSNT;
            value |= RP_PRIV_MISC_PRSNT_MAP_EP_PRSNT;
            writel(value, port->base + RP_PRIV_MISC);

            /* ... (rest of the function) */

            return true; /* Always return true to keep the clock on */
        }

  - In the mainline kernel, tegra_pcie_port_check_link and RP_PRIV_MISC belong to pci-tegra.c, the driver for earlier Tegra SoCs, whereas pcie-tegra194.ko is built from pcie-tegra194.c. Confirm which driver actually binds to the Orin Nano PCIe controller in your kernel tree before applying the change.
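After rebuilding, it helps to confirm that the modified code actually runs. The sketch below scans a kernel log dump for the PCIe_TEST marker from the patched function and for link-failure messages. The link-failure patterns are assumptions about typical wording, not an exhaustive list, and dev_dbg output only appears when dynamic debug is enabled for the driver or the file is built with DEBUG defined.

```python
import re

# Substrings worth flagging in the kernel log. "PCIe_TEST" is the marker from the
# patched function; the link-failure wording is an assumption and may differ by kernel.
PATTERNS = {
    "patch_marker": re.compile(r"PCIe_TEST"),
    "link_down": re.compile(r"link (is )?down|link never came up", re.IGNORECASE),
}


def scan_kernel_log(log_text):
    """Return {pattern_name: [matching lines]} for a dmesg-style text dump."""
    hits = {name: [] for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits
```

On the target, the input text could be the captured output of `dmesg` (which may require root). An empty `patch_marker` list suggests either that the rebuilt module is not the one loaded or that debug output is not enabled.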
- Check PCIe link status:
  - Use the lspci command to verify if the FPGA is detected after applying the driver modifications.
  - If the FPGA is still not detected, use a PCIe analyzer tool to investigate the link training process and identify any failures.
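The lspci check can be scripted so it is repeatable across boots. This is a minimal sketch that parses the numeric output of `lspci -n` and returns the slots of devices with a given vendor ID. The vendor ID to search for depends on the FPGA (for example 10ee for Xilinx or 1172 for Altera), and the sample line in the comment is illustrative only.

```python
import re

# One line of `lspci -n` output looks like: "0004:01:00.0 0580: 10ee:7024 (rev 01)"
# (illustrative values; the PCI domain prefix is optional).
LSPCI_LINE = re.compile(
    r"^(?P<slot>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})",
    re.IGNORECASE,
)


def find_devices(lspci_n_output, vendor_id):
    """Return slot addresses of all devices whose vendor ID matches vendor_id."""
    wanted = vendor_id.lower()
    slots = []
    for line in lspci_n_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and match.group("vendor").lower() == wanted:
            slots.append(match.group("slot"))
    return slots
```

An empty result after the driver change means the root port may be visible but the FPGA endpoint still did not enumerate, which is the point at which a PCIe analyzer becomes useful.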
- Examine carrier board design:
  - Review the custom carrier board schematic and layout to ensure proper implementation of the PCIe interface.
  - Verify power supply stability and signal integrity for the PCIe lanes.
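One quick signal-integrity sanity check is the reference clock frequency itself. The sketch below assumes the usual PCIe figures of a 100 MHz nominal reference clock with a tolerance of +/-300 ppm; the function names are hypothetical. If spread-spectrum clocking is enabled, the average frequency is intentionally shifted and this check does not apply directly.

```python
NOMINAL_REFCLK_HZ = 100_000_000  # PCIe reference clock is nominally 100 MHz
MAX_PPM = 300                    # assumed tolerance of +/-300 ppm, no spread spectrum


def refclk_error_ppm(measured_hz):
    """Frequency error of a measured REFCLK relative to 100 MHz, in ppm."""
    return (measured_hz - NOMINAL_REFCLK_HZ) / NOMINAL_REFCLK_HZ * 1e6


def refclk_in_tolerance(measured_hz):
    """True if the measured frequency is within +/-300 ppm of nominal."""
    return abs(refclk_error_ppm(measured_hz)) <= MAX_PPM
```

A scope or frequency counter reading taken during the few seconds the clock is running can be passed straight to `refclk_in_tolerance`.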
- Adjust PCIe link training parameters:
  - Experiment with different PCIe link parameters, such as a lower maximum link speed or disabled ASPM, through the UEFI settings, the device tree, or kernel boot parameters, and check whether the clock behavior changes.
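When experimenting with kernel boot parameters, it is easy to lose track of which ones are actually active. The helper below is a small sketch that splits a kernel command line into a dictionary so a test script can record the settings alongside each result. It does not handle quoted values containing spaces, and `pcie_aspm=off` is used only as an example of a parameter one might try.

```python
def cmdline_params(cmdline):
    """Split a kernel command line into a {name: value} dict (bare flags map to None)."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params
```

On the target, the input would typically be the contents of `/proc/cmdline`; checking `cmdline_params(text).get("pcie_aspm")` then shows whether the ASPM override was applied for that boot.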
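- Implement a hardware workaround:
  - If software solutions fail, consider a hardware mechanism that keeps a reference clock available, such as an external 100 MHz reference clock source for the FPGA. A simple pull-up resistor cannot keep a differential clock running, and using separate clock sources on each side changes the clocking architecture of the link, so consult with a hardware engineer before attempting this.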
- Debug FPGA configuration:
  - Use FPGA debugging tools to monitor its state during the PCIe initialization process.
  - Ensure that the FPGA’s PCIe core is properly configured and responsive to link training attempts.
- Consult Nvidia support:
  - If the issue persists after trying these steps, reach out to Nvidia developer support for further assistance, as there may be undocumented features or known issues specific to the Orin Nano platform.
- Consider alternative debugging methods:
  - If keeping the PCIe clock on proves challenging, explore other debugging techniques that don’t rely on a constantly active clock signal.
  - Use a logic analyzer to capture the initial clock activity and link training process.
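A capture of the initial clock activity can also show exactly how long the clock stays on. The sketch below assumes the instrument can export sorted rising-edge timestamps in seconds, which is a hypothetical data format, and that it samples fast enough to resolve a 100 MHz clock. It returns the start and end of the first continuous run of edges.

```python
def clock_active_window(edge_times_s, max_gap_s=1e-6):
    """Return (start, stop) of the first continuous run of clock edges.

    edge_times_s: sorted rising-edge timestamps in seconds (hypothetical export
                  from a logic analyzer or oscilloscope)
    max_gap_s:    a gap longer than this is treated as the clock having stopped
    Returns None if no edges were captured.
    """
    if not edge_times_s:
        return None
    start = stop = edge_times_s[0]
    for t in edge_times_s[1:]:
        if t - stop > max_gap_s:
            break  # clock stopped here; ignore anything after the gap
        stop = t
    return (start, stop)
```

Comparing the stop time with the timestamps of PCIe messages in the kernel log can help tell whether the clock is being shut off by the driver or is failing for a hardware reason.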
Remember to document all changes and test results thoroughly during the troubleshooting process. This will help in identifying patterns and potentially uncovering the root cause of the issue.