What is Enzian?
Enzian is a research platform. It combines a powerful 48-core ARMv8 Cavium ThunderX CPU with a potent Xilinx Ultrascale+ 9P FPGA. The two chips are connected by a high-speed (21 GiB/s), low-latency (115 ns) link. The link uses a cache coherency protocol, the Enzian Coherent Interconnect (ECI).
Compiled Enzian user guide: GitLab repo
Working with Enzians
Sources and documentation
The source code can be found here: https://gitlab.inf.ethz.ch/project-openenzian
We use git as a revision control system: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
The repositories make use of git submodules: https://git-scm.com/book/en/v2/Git-Tools-Submodules
Step-by-step guide to boot an Enzian
Quickstart guide in PDF: GitLab repo
Machines
There are 14 Enzian machines, named zuestoll01-zuestoll14, accessible from the enzian-gateway server. Each machine is accessible through four consoles (BMC console, CPU console, FPGA console and 2nd CPU serial port) using the console program. To acquire or release a machine, use the emg program on the enzian-gateway server. The board power and the CPU are controlled using the BMC console. The default Enzian BMC user and password are "root" and "0penBmc". The default Enzian Linux user and password are "enzian" and "enzian".
There is the Hardware Manager server running on enzian-gateway as well, providing JTAG connections to the machines, so FPGAs can be programmed and debugged. The BMC console is used to control powering of the CPU and the FPGA.
If a project uses the FPGA's DDR4 memory, make sure that the MIGs are calibrated ("CAL PASS"). If calibration fails, the memory configuration of the machine doesn't correspond to the memory configuration the project expects.
The list of the machines, their hostnames, JTAG IDs and their configuration
The ECI link
The link is brought up during the CPU boot process:
- The FPGA is programmed
- In the case of softeci, the Microblaze program has to be loaded and running
- The CPU starts after reset
- BDK (Bring-up and Diagnostic Kit) will bring the link up
When the link is brought up, the CPU console will show it:
Starting ECI links
CDR lock QLM8:1 QLM9:1 QLM10:1 QLM11:1 QLM12:1 QLM13:1
N0.CCPI Lanes([] is good):[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]
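When the boot log is captured to a file, the lane status line can be checked automatically. The following is a minimal sketch, not part of the Enzian tooling. It assumes that there are 24 lanes (0 to 23, as in the sample output above) and that a lane which is not good is simply not printed inside square brackets; how a bad lane is actually printed is not stated here. The function names are made up for the example.

```python
import re

EXPECTED_LANES = 24  # assumption: lanes 0..23, as in the sample output above


def good_ccpi_lanes(line):
    """Return the set of lane numbers printed in square brackets on a 'CCPI Lanes' line."""
    # Keep only the part after the legend "([] is good):".
    _, _, lanes = line.partition("):")
    return {int(n) for n in re.findall(r"\[(\d+)\]", lanes)}


def link_is_up(line):
    """True if every expected lane is reported in square brackets (assumed to mean good)."""
    return good_ccpi_lanes(line) == set(range(EXPECTED_LANES))
```

Feeding the sample line above to link_is_up should report all 24 lanes as good.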
Linux errors
Uncorrected, Read from a disabled CCPI
- the link is not up
Uncorrected, LFB entry timeout
- the CPU didn't receive a response from the FPGA; either it was not sent or it was wrong
Synchronous error, SError
- most likely caused by a wrong ECI message sent by the FPGA
These are fatal errors that should normally never occur. They mean that there is a problem with the design. The FPGA won't work reliably, so it has to be reprogrammed.
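A captured kernel log can be scanned for these messages. This is a rough sketch that only looks for the substrings quoted above. The exact format of the surrounding kernel log lines is an assumption, and the names ECI_ERRORS and diagnose are made up for the example.

```python
# Substrings taken from the messages listed above, with their likely cause.
ECI_ERRORS = {
    "Read from a disabled CCPI": "the ECI link is not up",
    "LFB entry timeout": "the FPGA response was missing or wrong",
    "SError": "the FPGA probably sent a wrong ECI message",
}


def diagnose(log_text):
    """Return (message, likely cause) pairs for every known error found in log_text."""
    return [(msg, cause) for msg, cause in ECI_ERRORS.items() if msg in log_text]
```

An empty result does not prove that the design is correct; it only means that none of the three messages appeared in the text.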
Isolating CPU cores in Linux
You can isolate cores, i.e. make Linux not schedule anything on them, so that you can use them exclusively for your application. To do that, add the isolcpus parameter to the kernel command line, for example:
isolcpus=47
This isolates core 47. You will still receive scheduling clock ticks every 4 ms that take around 11 us each. To get rid of them, you need a fully tickless kernel (CONFIG_NO_HZ_FULL=y) and you have to add the additional parameter nohz_full, for example:
isolcpus=47 nohz_full=47
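Isolating a core only keeps the scheduler away from it; the application still has to be placed there explicitly. The sketch below reads the isolcpus value from a kernel command line string and pins the calling process to those cores with os.sched_setaffinity (Linux only). It handles plain core numbers and ranges such as 46-47 and skips anything non-numeric; the function names are made up for the example.

```python
import os


def parse_cpu_list(value):
    """Expand a CPU list such as '4,46-47' into a set of core numbers."""
    cores = set()
    for part in value.split(","):
        if not part[:1].isdigit():
            continue  # skip empty parts and non-numeric flags
        if "-" in part:
            first, last = part.split("-")
            cores.update(range(int(first), int(last) + 1))
        else:
            cores.add(int(part))
    return cores


def isolated_cores(cmdline):
    """Return the cores named by isolcpus= in a kernel command line string."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return parse_cpu_list(token[len("isolcpus="):])
    return set()


def pin_to_isolated_cores():
    """Pin the calling process to the isolated cores, if there are any."""
    with open("/proc/cmdline") as f:
        cores = isolated_cores(f.read())
    if cores:
        os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return cores
```

Calling pin_to_isolated_cores() at the start of the application is one way to use it; running the program under taskset has the same effect.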
The Checklist
- the FPGA implemented design (the bitstream) doesn't have excessive negative slacks (check the timing summary report, *_timing_summary.rpt file) nor critical warnings
- the chosen machine configuration (memory modules, PCIe/NVMe devices, SATA drives) matches the bitstream
- CPU and FPGA are powered up
- all of the FPGA expected modules are working correctly: MIGs are calibrated, PCIe/NVMe/SATA/100G devices have their link up
- the ECI link is brought up
Internal architecture
From the CPU's point of view, the FPGA is just another CPU, with its own memory and registers. The memory and the registers each reside in their own physical address space and have different means of access. The register space (I/O) is accessible as bytes or as 2-byte, 4-byte or 8-byte words. This access method is not very fast (about 0.5 GiB/s, ~230 ns latency), with up to two transactions per core. It is used as a configuration space. The memory space is cache coherent; data is transferred as 128-byte cache lines. The ECI protocol can also transfer parts of a cache line: each line consists of 4 sub cache lines of 32 bytes each.
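The cache line geometry can be illustrated with plain address arithmetic. This sketch only uses the two sizes given above (128-byte lines, 32-byte sub cache lines). It is not the ECI encoding of an address, and the function name is made up for the example.

```python
CACHE_LINE_BYTES = 128  # from the text: data is transferred as 128-byte cache lines
SUB_LINE_BYTES = 32     # from the text: 4 sub cache lines of 32 bytes each


def split_address(addr):
    """Return (cache line number, sub cache line number, byte offset in the sub line)."""
    line_number, offset_in_line = divmod(addr, CACHE_LINE_BYTES)
    sub_number, offset_in_sub = divmod(offset_in_line, SUB_LINE_BYTES)
    return line_number, sub_number, offset_in_sub
```

For example, byte 0x85 lies in cache line 1, sub cache line 0, at offset 5 within that sub cache line.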
The FPGA memory space is seen on the FPGA as ECI frames on the interface to the shell. The ECI gateway module is used to process ECI frames and extract ECI packets and route them to the specific receiver, based on the VC number, ECI message type and cache line index. The ECI gateway also handles transmission of ECI packets.
The FPGA I/O space is seen as AXI transactions on the AXI lite buses to the shell.
CPU physical address map
- 0x0000_0000_0000 - 0x00FF_FFFF_FFFF - CPU Memory (1TiB)
- 0x0100_0000_0000 - 0x01FF_FFFF_FFFF - FPGA Memory (ECI interface in the FPGA application) (1TiB) - coherent, ECI requests have to be handled by the application i.e. Directory Controller
- 0x8000_0000_0000 - 0x8FFF_FFFF_FFFF - CPU I/O (16TiB)
- 0x9000_0000_0000 - 0x9FFF_FFFF_FFFF - FPGA I/O (AXI lite interface in the FPGA application) (16TiB) - non-coherent, ECI requests are converted by the shell to AXI requests
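The map above can be written down as a small lookup table, which is handy for sanity-checking an address before accessing it. This is a sketch; the ranges are copied from the list above, while the names REGIONS and region_of are made up for the example.

```python
# (first address, last address, name), copied from the address map above.
REGIONS = [
    (0x0000_0000_0000, 0x00FF_FFFF_FFFF, "CPU Memory"),
    (0x0100_0000_0000, 0x01FF_FFFF_FFFF, "FPGA Memory"),
    (0x8000_0000_0000, 0x8FFF_FFFF_FFFF, "CPU I/O"),
    (0x9000_0000_0000, 0x9FFF_FFFF_FFFF, "FPGA I/O"),
]


def region_of(addr):
    """Return the name of the region containing addr, or None if it is unmapped."""
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name
    return None
```

region_of(0x9000_0000_1000) returns "FPGA I/O", and an address between the memory and I/O ranges, such as 0x0200_0000_0000, returns None.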
Detailed CPU I/O space address map
- 0x000_0000_0000 - 0x7DF_FFFF_FFFF - NCB (Near-Coprocessor Bus)
- 0x7E0_0000_0000 - 0x7E0_FFFF_FFFF - RSL
- 0x7F0_0000_0000 - 0x7FF_FFFF_FFFF - AP
- 0x800_0000_0000 - 0x8FF_FFFF_FFFF - SLI (Switch Logic Interface)
Linux applications
The FPGA I/O space can be accessed using the /dev/mem device. The FPGA memory space can be accessed using /dev/mem, or using the /dev/fpgamem device to achieve full performance.
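A minimal sketch of reading one 64-bit word through /dev/mem follows. It maps the page that contains the physical address and unpacks 8 bytes from it. The function name and the path parameter are made up for the example. Python does not promise that the read is issued as a single 8-byte access, so for register-level work a C program may be the safer choice.

```python
import mmap
import os
import struct


def read_u64(phys_addr, path="/dev/mem"):
    """Read one little-endian 64-bit word at phys_addr through a memory device file."""
    if phys_addr % 8:
        raise ValueError("address must be 8-byte aligned")
    page = mmap.ALLOCATIONGRANULARITY
    base = phys_addr - (phys_addr % page)  # mmap offsets must be page aligned
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, page, mmap.MAP_SHARED, mmap.PROT_READ, offset=base) as m:
            return struct.unpack_from("<Q", m, phys_addr - base)[0]
    finally:
        os.close(fd)
```

For example, read_u64(0x9000_0000_0000) would read the first word of the FPGA I/O space. Opening /dev/mem typically requires root privileges.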
Linux Kernel Driver
https://gitlab.inf.ethz.ch/project-openenzian/enzian-software/linux-memory-driver
This driver maps the FPGA memory using huge pages and correct page mapping attributes (type is set to memory). The driver also provides access to privileged ThunderX L2$ instructions (like flush or writeback and invalidate).
The driver uses Transparent Huge Pages (THP) on kernels 5.10 and newer. On older kernels such as 5.4, THP doesn't work properly; use this commit. The ARM64 architecture currently supports only 2 MiB THP, not 1 GiB THP (the kernel has to be patched for that).
The repo also contains a memory benchmarking program and an example memory access program.
UEFI
If for some reason the UEFI settings or boot configuration get corrupted, run the varclr command in the UEFI shell. It will reset all settings.
Building an FPGA project
The FPGA physical layout is split into two regions:
- static: the shell - it's built once, stays the same and is used with all applications
- reconfigurable: the application
The Shell
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/static-shell
The Shell provides low-level ECI interface, AXI lite interfaces and control lines.
The Shell is built once, using either the build_static_shell_stub.tcl script or the build_static_shell_stub_opt.tcl script, which produces a better implemented version but takes more time. Once the shell is built, the ENZIAN_SHELL_DIR environment variable should point to the Shell build directory containing static_shell_routed.dcp, which is used to link with applications.
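Before starting an application build, it can save time to check that ENZIAN_SHELL_DIR is set up as described. The following is a small sketch of such a check; it is not part of the shell repository, and the function name is made up for the example.

```python
import os
from pathlib import Path

CHECKPOINT = "static_shell_routed.dcp"  # file name taken from the text above


def find_shell_checkpoint(environ=os.environ):
    """Return the path of the routed shell checkpoint, or raise with a hint."""
    shell_dir = environ.get("ENZIAN_SHELL_DIR")
    if not shell_dir:
        raise RuntimeError("ENZIAN_SHELL_DIR is not set")
    checkpoint = Path(shell_dir) / CHECKPOINT
    if not checkpoint.is_file():
        raise RuntimeError(f"{checkpoint} not found; is the shell built?")
    return checkpoint
```

Calling find_shell_checkpoint() with no arguments checks the current environment.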
You don't have to build the shell yourself; you can download the artifacts from here: https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/static-shell/-/releases
Just clone the repo and put the artifacts in the build subdirectory.
Shell/stub bitstream
The shell builds a bitstream that is used for basic testing:
- handles GlobalSync messages
- provides phony memory to test reading/writing to the FPGA memory space
- implements two fully working cache lines; the content of the 2nd cache line is copied from the 1st cache line, and they are used to measure cache-to-cache latency
- implements 8kB RAM memory accessible through the FPGA I/O space, 0x9000_0000_0000 - 0x9000_0000_1FFF
Edge ILA
The shell has an ILA that can be used to capture ECI messages. The ILA is connected to both links, on the transmitting and receiving sides. *_has_data_payload can be used to trigger on any message; *_has_data_payload_filtered filters out GSYNC/GINV messages.
ECI Transport
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/eci-transport
Provides low-level support, used by the Shell.
ECI Toolkit
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/eci-toolkit
Provides basic modules, used by the Shell and applications.
Example Stub application
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/dynamic-stub
The stub application provides a memory loopback (all reads/writes to the FPGA memory, to test the ECI throughput/latency), a BRAM memory accessible through the FPGA I/O space and 2 fully working cache lines used to test cache-to-cache transfers.
Directory Controller Slice
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/directory-controller-slice
Module to handle remote memory accesses (from the CPU to the FPGA) coherently.
Sample Application
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/sample-application
The sample application connects Directory Controller Slices to two memory channels (expecting two 16GB modules, on the 1st and 4th channels).
Sample Top Level
https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/sample-top-level
A combined project, the shell and the sample application.
CI/CD
BDK, ATF and UEFI are built by a GitLab runner, and the artifacts are placed in /srv/tftp/enzian on the enzian-gateway server.
Common problems
Stuck console
If the console does not react to keys and only prints output, try taking the console down and reopening it: ^Ecd and then ^Eco
Failed MIG calibration
Each MIG is configured for a specific memory type. A failed calibration means that the hardware configuration doesn't match the MIG configuration.
Errors because of incomplete sources
Most of the projects use submodules. Remember to check them out as well, either by adding --recurse-submodules to git clone, or by calling git submodule update --init --recursive after cloning.
Vitis 2023.2 looks different
Launch "Vitis Classic 2023.2" or add the --classic option when launching vitis from the command line.