Enzians have an on-board JTAG controller (sometimes also called a JTAG adapter or probe). It is brought out on the rear I/O panel of the machines as a USB type B connector. All the cluster machines' JTAG ports are connected to enzian-gateway, which runs the Xilinx hardware server so that the boards' scan chains can be accessed remotely. The JTAG cable IDs are listed on the cluster info page on the Enzian website. When accessing JTAG, take care to use the correct IDs: there is currently no mechanism that ensures you access only the JTAG ports of machines that you have reserved.
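Since nothing enforces the reservation, a small client-side guard can help avoid touching the wrong machine. The following is a minimal sketch only: the machine names, the cable IDs and the `cable_for` helper are hypothetical, and the real IDs have to be taken from the cluster info page.

```python
# Hypothetical data: these machine names and cable IDs are made up for
# illustration. The real values are listed on the cluster info page.
CABLE_IDS = {
    "enzian-a": "CABLE-0001",
    "enzian-b": "CABLE-0002",
}


def cable_for(machine, reserved):
    """Return the JTAG cable ID of `machine`, but only if it is reserved.

    `reserved` is the set of machine names the caller has reserved.
    """
    if machine not in CABLE_IDS:
        raise KeyError(f"no JTAG cable ID known for {machine}")
    if machine not in reserved:
        raise PermissionError(f"{machine} is not in your reservation")
    return CABLE_IDS[machine]


my_reservation = {"enzian-a"}
print(cable_for("enzian-a", my_reservation))  # CABLE-0001
```

The returned ID would then be used to pick the JTAG target in whatever tool connects to the hardware server. The check runs on the client only and does not stop anyone else from accessing the port.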
Scan Chain
There are several devices on the Enzian boards that have JTAG (cf. schematics p117). Some of them can be put in the USB scan chain:
- CPU
- FPGA
- BMC
- PHYs for CPU Ethernet
They all also have their dedicated JTAG headers. Additionally, there are population options for JTAG headers for both the CPU's and the FPGA's PCI connectors and for the FMC connector. Each device should be accessed either through the USB scan chain or through its dedicated header, never both, as the two are electrically connected and JTAG does not have any arbitration.
There is a set of jumpers for each device that can be put in the USB scan chain to configure whether it is in the scan chain or bypassed. How the jumpers need to be configured for each of them is labelled on the board. It is important that each device is configured either in bypass or connected mode as otherwise the scan chain might be interrupted and will not work.
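The jumper rule can be illustrated with a small model. This is a sketch for reasoning about configurations, not a tool that talks to the hardware: the `Jumper` states, the device names and their order are assumptions made for the example and do not reflect the physical chain order.

```python
from enum import Enum


class Jumper(Enum):
    CONNECTED = "connected"  # device is part of the USB scan chain
    BYPASSED = "bypassed"    # chain is routed around the device
    INVALID = "invalid"      # jumpers in neither labelled position


# Devices that can be put in the USB scan chain. The order here is
# illustrative only and is not the physical chain order.
DEVICES = ("CPU", "FPGA", "BMC", "PHY")


def chain_devices(jumpers):
    """Return the devices visible on the chain.

    Raises ValueError if any device is neither connected nor bypassed,
    because that interrupts the chain for every device.
    """
    broken = [d for d in DEVICES
              if jumpers.get(d, Jumper.INVALID) is Jumper.INVALID]
    if broken:
        raise ValueError("scan chain interrupted at: " + ", ".join(broken))
    return [d for d in DEVICES if jumpers[d] is Jumper.CONNECTED]


# Only the FPGA is connected; everything else is bypassed.
fpga_only = {
    "CPU": Jumper.BYPASSED,
    "FPGA": Jumper.CONNECTED,
    "BMC": Jumper.BYPASSED,
    "PHY": Jumper.BYPASSED,
}
print(chain_devices(fpga_only))  # ['FPGA']
```

A single device with jumpers in an undefined position makes `chain_devices` fail for the whole board, which mirrors why one misconfigured device breaks JTAG access to all others.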
The current default configuration is to have only the FPGA in the scan chain. The FPGA and BMC sections below explain why.
CPU
The CPU has an additional jumper that selects which test access port (TAP) on the chip is active. There are three options:
- CPU: We think this is for boundary scan
- DAP: Arm debug access port
- USB: We think this is for boundary scan of the on-chip USB controller
These jumpers control to which TAP the test reset signal (TRST) from the dedicated header is wired. There are pull-down resistors on the board that keep the TAPs in reset. To take a TAP out of reset, the JTAG controller needs to pull TRST high. The TAPs on the CPU are not wired to form a scan chain but just share the signals, so only one of them can be active at any given time.
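The TRST selection can be summarised in a few lines. This is a minimal sketch of the behaviour described above, with a hypothetical helper name `taps_out_of_reset`; it models the logic only and does not drive any hardware.

```python
# The three TAP options selectable with the CPU's TRST select jumper.
TAPS = ("CPU", "DAP", "USB")


def taps_out_of_reset(selected, trst_high):
    """Return the set of CPU TAPs that are out of reset.

    TRST is active low. Pull-down resistors hold every TAP in reset, and
    only the TAP routed by the select jumper sees the level driven by the
    JTAG controller.
    """
    if selected not in TAPS:
        raise ValueError(f"unknown TAP: {selected}")
    return {selected} if trst_high else set()


print(taps_out_of_reset("DAP", trst_high=True))   # {'DAP'}
print(taps_out_of_reset("DAP", trst_high=False))  # set()
```

The result never contains more than one TAP, which matches the constraint that the TAPs share the JTAG signals instead of forming a chain.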
Debug access
We are currently investigating how to enable bare metal debugging on Enzian via the CPU's JTAG port. We can currently do this on the machine in our lab with the following steps:
- Set the CPU's TRST select jumper to the DAP position
- Configure the scan chain to bypass the CPU
- Connect the hardware debugger to the CPU's dedicated JTAG header
- In Arm-DS create a debug connection using ThunderX-r2 AP0 when selecting the target
- Use the debugger
We have failed in the past to get this to work on other machines. It remains to be tested if that was due to incorrect setup or e.g. electrical issues with other machines (a candidate could be high capacitance and therefore residual voltage on the 3.3V rail that provides the reference voltage for the JTAG header).
We are currently investigating how to get debug access to the CPU with the on-board JTAG controller.
FPGA
Currently the FPGA on Enzian is programmed through its JTAG port, which is why the FPGA needs to be on the on-board scan chain. The FPGA's TAP runs at 1.8V while the rest of the board's scan chain runs at 3.3V. The level shifter between the FPGA's TAP and the rest of the scan chain uses a non-standby rail for its 1.8V reference voltage. This means that the scan chain is interrupted while this voltage rail is off, and JTAG does not work for any device on the chain until the rail is powered.
BMC
In addition to the basic JTAG signals that are needed to communicate with the TAP, the BMC is also connected to the system reset signal (SRST) of the on-board JTAG controller. This connection is hard-wired and not controlled by the bypass jumpers for the BMC. This means that the BMC can be soft-reset by briefly pulling SRST low, even if it is bypassed on the on-board scan chain or the scan chain is interrupted. This can be done, for example, with JTAG sequences in the Xilinx System Debugger (XSDB). There is a script that does that.
Some Xilinx tools are known to add reset breakpoints to all processor targets they see on a hardware server. Connecting one of these tools to the hardware server on enzian-gateway will therefore add these breakpoints to all BMCs that are visible on their scan chain. Combined with the FPGA level shifter issue described above, this can lead to the BMC getting stuck at boot: rebooting the BMC turns off all non-standby rails, which interrupts the scan chain and makes it impossible to clear the reset breakpoint. Resetting the BMC through SRST is not enough to clear the breakpoint. The only way to recover the machine is to power cycle it. This is why we chose not to have the BMC on the scan chain by default.
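The failure sequence is easier to follow as a small state model. The sketch below only encodes the behaviour described in this section. The `Machine` class and its method names are hypothetical, it assumes the FPGA is in the scan chain so that the level shifter matters, and it assumes that a power cycle removes the breakpoint, which is what the observed recovery suggests.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    bmc_in_chain: bool
    # Non-standby rails, including the level shifter's 1.8V reference.
    rails_on: bool = True
    reset_breakpoint: bool = False

    def chain_usable(self):
        # Assumes the FPGA is in the chain: the chain is interrupted
        # while the level shifter's rail is off.
        return self.rails_on

    def tool_connects(self):
        # Some Xilinx tools add a reset breakpoint to every processor
        # target they can see, so a bypassed BMC is not affected.
        if self.bmc_in_chain and self.chain_usable():
            self.reset_breakpoint = True

    def bmc_boots(self):
        return not self.reset_breakpoint

    def reboot_bmc(self):
        # Rebooting the BMC turns off all non-standby rails.
        self.rails_on = False
        return self.bmc_boots()

    def clear_breakpoint(self):
        # Clearing needs JTAG access, which needs an intact chain.
        if not self.chain_usable():
            return False
        self.reset_breakpoint = False
        return True

    def srst_reset(self):
        # SRST resets the BMC but leaves the breakpoint in place.
        return self.bmc_boots()

    def power_cycle(self):
        # Assumption: removing power clears the breakpoint.
        self.reset_breakpoint = False
        self.rails_on = False
        return self.bmc_boots()


m = Machine(bmc_in_chain=True)
m.tool_connects()
print(m.reboot_bmc())        # False: stuck at the reset breakpoint
print(m.clear_breakpoint())  # False: chain interrupted, rails are off
print(m.srst_reset())        # False: breakpoint survives SRST
print(m.power_cycle())       # True: machine recovers
```

With `bmc_in_chain=False` the tool never sees the BMC, `tool_connects()` leaves no breakpoint behind, and `reboot_bmc()` returns `True`. That is the reasoning behind the default jumper configuration.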