What is Enzian?

Enzian is a research platform that combines a 48-core ARMv8 Cavium ThunderX CPU with a Xilinx UltraScale+ VU9P FPGA. The two chips are connected by a high-speed (21 GiB/s), low-latency (115 ns) link. The link runs a cache coherence protocol, the Enzian Coherent Interconnect (ECI).

The compiled Enzian user guide and the quickstart guide for booting an Enzian machine: GitLab repo

Working with Enzians

Sources and documentation

The source code is available here: https://gitlab.inf.ethz.ch/project-openenzian

We use git as a revision control system: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

The repositories make use of git submodules: https://git-scm.com/book/en/v2/Git-Tools-Submodules

Machines

There are 14 Enzian machines, named zuestoll01 to zuestoll14, accessible from the enzian-gateway server. Each machine has four consoles (BMC console, CPU console, FPGA console and the 2nd CPU serial port), which are opened with the console program. To acquire or release a machine, use the emg program on the enzian-gateway server. The board power and the CPU are controlled using the BMC console. The default Enzian BMC user and password are "root" and "0penBmc". The default Enzian Linux user and password are "enzian" and "enzian".

The Hardware Manager server also runs on enzian-gateway. It provides JTAG connections to the machines, so that the FPGAs can be programmed and debugged. The BMC console is used to control the power of the CPU and the FPGA.

If a project uses the FPGA's DDR4 memory, make sure that the MIGs are calibrated ("CAL PASS"). A failed calibration means that the memory configuration of the machine doesn't match the memory configuration the project expects.

The list of the machines, their hostnames, JTAG IDs and configurations: Cluster Info

The ECI link is brought up during the CPU boot process:

  1. The FPGA is programmed.
  2. If softeci is used, the MicroBlaze program has to be loaded and running.
  3. The CPU starts after reset.
  4. The BDK (Bring-up and Diagnostic Kit) brings the link up.

When the link is up, the CPU console shows:

Starting ECI links
CDR lock QLM8:1 QLM9:1 QLM10:1 QLM11:1 QLM12:1 QLM13:1
N0.CCPI Lanes([] is good):[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]
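
When the console output is captured to a file, the lane report can be checked automatically. The following is a minimal sketch, not part of the Enzian tooling. It assumes that all 24 lanes (0 to 23, as in the sample output above) must be bracketed and that a lane printed in any other form is bad; the function names are hypothetical.

```python
import re

# Assumption: 24 lanes (0..23), taken from the sample output above.
EXPECTED_LANES = 24

def good_lanes(console_text):
    """Return the set of lane numbers that the BDK printed in brackets."""
    for line in console_text.splitlines():
        if "CCPI Lanes" in line:
            # Skip the legend "([] is good):" and parse only the lane list.
            _, _, lanes = line.partition("):")
            return {int(n) for n in re.findall(r"\[(\d+)\]", lanes)}
    return set()

def link_is_up(console_text):
    """True if every expected lane is reported as good."""
    return good_lanes(console_text) == set(range(EXPECTED_LANES))

SAMPLE = (
    "Starting ECI links\n"
    "CDR lock QLM8:1 QLM9:1 QLM10:1 QLM11:1 QLM12:1 QLM13:1\n"
    "N0.CCPI Lanes([] is good):" + "".join(f"[{i}]" for i in range(24)) + "\n"
)

print(link_is_up(SAMPLE))  # True for the sample output above
```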

Linux errors

  • Uncorrected, Read from a disabled CCPI - the link is not up
  • Uncorrected, LFB entry timeout - the CPU didn't receive a response from the FPGA; either it was not sent or it was wrong
  • Synchronous error, SError - most likely caused by a wrong ECI message sent by the FPGA

These are fatal errors that should never occur in normal operation. They indicate a problem with the design. After such an error the FPGA won't work reliably, so it has to be reprogrammed.
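
A kernel log can be scanned for these errors with a few lines of Python. This is a rough sketch: the search strings are copied from the list above, the exact text printed by the kernel may differ, and the function name is hypothetical.

```python
# Substrings copied from the error list above; the exact kernel text is an assumption.
KNOWN_ERRORS = [
    ("Read from a disabled CCPI", "the ECI link is not up"),
    ("LFB entry timeout", "the FPGA sent no response or a wrong one"),
    ("SError", "most likely a wrong ECI message sent by the FPGA"),
]

def diagnose(log_text):
    """Return (log line, likely cause) pairs for lines matching a known error."""
    findings = []
    for line in log_text.splitlines():
        for needle, cause in KNOWN_ERRORS:
            if needle in line:
                findings.append((line.strip(), cause))
                break  # report each log line once
    return findings
```

For example, diagnose() can be fed the saved output of dmesg. An empty result only means that none of the three patterns matched.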

Isolating CPU cores in Linux

You can isolate cores i.e.

The Checklist

  • the implemented FPGA design (the bitstream) has no excessive negative slack (check the timing summary report, the *_timing_summary.rpt file) and no critical warnings
  • the chosen machine configuration (memory modules, PCIe/NVMe devices, SATA drives) matches the bitstream
  • the CPU and the FPGA are powered up
  • all of the expected FPGA modules are working correctly: MIGs are calibrated, PCIe/NVMe/SATA/100G devices have their link up
  • the ECI link is brought up

Internal architecture

From the CPU's point of view, the FPGA is just another CPU with its own memory and registers. The memory and the registers reside in separate physical address spaces and are accessed differently. The register (I/O) space is accessible as bytes or as 2-byte, 4-byte or 8-byte words. This access method is not very fast (about 0.5 GiB/s, ~230 ns latency), with up to two transactions per core. It is used as a configuration space. The memory space is cache coherent, and data is transferred as 128-byte cache lines. The ECI protocol can also transfer parts of a cache line: a cache line consists of four sub-cache lines of 32 bytes each.
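
The cache line geometry can be illustrated with plain address arithmetic. The sketch below only uses the sizes stated above (128-byte lines, 32-byte sub-cache lines); it does not model how ECI encodes the cache line index internally, and the function names are hypothetical.

```python
CACHE_LINE_BYTES = 128  # cache line size, from the text above
SUB_LINE_BYTES = 32     # four sub-cache lines per cache line

def split_address(addr):
    """Split a byte address into (cache line base, sub-line index, offset in sub-line)."""
    line_base = addr & ~(CACHE_LINE_BYTES - 1)
    offset = addr - line_base
    return line_base, offset // SUB_LINE_BYTES, offset % SUB_LINE_BYTES

def sub_lines_touched(addr, length):
    """Return the sub-line indices (0..3) covered by an access inside one cache line."""
    base, first, _ = split_address(addr)
    last_byte = addr + length - 1
    if length <= 0 or (last_byte & ~(CACHE_LINE_BYTES - 1)) != base:
        raise ValueError("access must be non-empty and stay within one cache line")
    return list(range(first, (last_byte - base) // SUB_LINE_BYTES + 1))

print(sub_lines_touched(0x20, 64))  # [1, 2]
```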

The FPGA memory space is seen on the FPGA as ECI frames on the interface to the shell. The ECI gateway module processes the ECI frames, extracts the ECI packets and routes them to the appropriate receiver based on the VC number, the ECI message type and the cache line index. The ECI gateway also handles the transmission of ECI packets.

The FPGA I/O space is seen as AXI transactions on the AXI lite buses to the shell.

CPU physical address map

  • 0x0000_0000_0000 - 0x00FF_FFFF_FFFF - CPU Memory (1TiB)
  • 0x0100_0000_0000 - 0x01FF_FFFF_FFFF - FPGA Memory (ECI interface in the FPGA application) (1TiB) - coherent, ECI requests have to be handled by the application, e.g. by the Directory Controller
  • 0x8000_0000_0000 - 0x8FFF_FFFF_FFFF - CPU I/O (16TiB)
  • 0x9000_0000_0000 - 0x9FFF_FFFF_FFFF - FPGA I/O (AXI lite interface in the FPGA application) (16TiB) - non-coherent, ECI requests are converted by the shell to AXI requests
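
As a quick sanity check when debugging addresses, the map above can be turned into a lookup table. This is an illustrative sketch; the ranges are copied from the list, while the names REGIONS and region_of are hypothetical.

```python
# (first address, last address, name), copied from the address map above
REGIONS = [
    (0x0000_0000_0000, 0x00FF_FFFF_FFFF, "CPU memory"),
    (0x0100_0000_0000, 0x01FF_FFFF_FFFF, "FPGA memory"),
    (0x8000_0000_0000, 0x8FFF_FFFF_FFFF, "CPU I/O"),
    (0x9000_0000_0000, 0x9FFF_FFFF_FFFF, "FPGA I/O"),
]

def region_of(addr):
    """Return (region name, offset inside the region), or None if unmapped."""
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name, addr - first
    return None

print(region_of(0x9000_0000_1FFF))  # ('FPGA I/O', 8191)
```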

Detailed CPU I/O space address map

  • 0x000_0000_0000 - 0x7DF_FFFF_FFFF - NCB (Near-Coprocessor Bus)
  • 0x7E0_0000_0000 - 0x7E0_FFFF_FFFF - RSL
  • 0x7F0_0000_0000 - 0x7FF_FFFF_FFFF - AP
  • 0x800_0000_0000 - 0x8FF_FFFF_FFFF - SLI (Switch Logic Interface)

Linux applications

The FPGA I/O space can be accessed through the /dev/mem device. The FPGA memory space can be accessed through /dev/mem as well, or through the /dev/fpgamem device to achieve full performance.
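
A minimal sketch of reading one word from the FPGA I/O space through /dev/mem follows. It assumes root privileges and a kernel that permits /dev/mem access to this range; the helper names are hypothetical. Python does not guarantee the width of the resulting bus access, so where the access width matters a C program using volatile pointers is the more usual choice.

```python
import mmap
import os
import struct

FPGA_IO_BASE = 0x9000_0000_0000  # from the CPU physical address map

def aligned_window(phys_addr, length):
    """Return (map_base, delta, map_len) with map_base on a page boundary."""
    page = mmap.ALLOCATIONGRANULARITY
    map_base = phys_addr - (phys_addr % page)
    delta = phys_addr - map_base
    map_len = -(-(delta + length) // page) * page  # round up to whole pages
    return map_base, delta, map_len

def read_fpga_io_u64(offset):
    """Read the 8-byte little-endian word at `offset` in the FPGA I/O space."""
    base, delta, size = aligned_window(FPGA_IO_BASE + offset, 8)
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, size, mmap.MAP_SHARED, mmap.PROT_READ, offset=base) as m:
            return struct.unpack_from("<Q", m, delta)[0]
    finally:
        os.close(fd)
```

Calling read_fpga_io_u64(0) would read the first word of the FPGA I/O space, provided the loaded bitstream implements something at that address.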

Linux Kernel Driver

https://gitlab.inf.ethz.ch/project-openenzian/enzian-software/linux-memory-driver

This driver maps the FPGA memory using huge pages and correct page mapping attributes (type is set to memory). The driver also provides access to privileged ThunderX L2$ instructions (like flush or writeback and invalidate).

The driver uses Transparent Huge Pages (THP) on kernels 5.10 and newer. On older kernels such as 5.4, THP doesn't work properly; use this commit instead. The ARM64 architecture currently supports only 2 MiB THP; 1 GiB THP is not supported unless the kernel is patched.

The repo also contains a memory benchmarking program and an example memory access program.
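
Before loading the driver it can help to confirm the kernel version and the THP mode. The sketch below reads the standard Linux sysfs file /sys/kernel/mm/transparent_hugepage/enabled and compares the kernel release against 5.10; the function names are hypothetical and nothing here is part of the driver itself.

```python
import platform
import re

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"

def selected_thp_mode(text):
    """Return the bracketed choice from a line like 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None

def kernel_at_least(release, major, minor):
    """True if a release string such as '5.10.0-28-arm64' is >= major.minor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    return m is not None and (int(m.group(1)), int(m.group(2))) >= (major, minor)

def thp_status():
    """Return (kernel is 5.10 or newer, THP mode or None if unavailable)."""
    try:
        with open(THP_ENABLED) as f:
            mode = selected_thp_mode(f.read())
    except OSError:
        mode = None
    return kernel_at_least(platform.release(), 5, 10), mode
```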

UEFI

If the UEFI settings or the boot configuration get corrupted, run the varclr command in the UEFI shell. It resets all settings.

Building an FPGA project

The FPGA physical layout is split into two regions:

  • static: the shell - it's built once, stays the same and is used with all applications
  • reconfigurable: the application

The Shell

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/static-shell

The Shell provides a low-level ECI interface, AXI lite interfaces and control lines.

The Shell is built once, using either the build_static_shell_stub.tcl script or the build_static_shell_stub_opt.tcl script. The latter produces a better implemented version but takes more time. Once the shell is built, the ENZIAN_SHELL_DIR environment variable should point to the Shell build directory containing static_shell_routed.dcp, which is used to link with applications.
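
A build wrapper can verify this setup before starting a long application build. The following is a small sketch under the assumptions stated in the paragraph above (the variable name and the checkpoint file name come from the text; the function name is hypothetical).

```python
import os

def shell_checkpoint(env=None):
    """Return the path to static_shell_routed.dcp, or raise if the setup is incomplete."""
    env = os.environ if env is None else env
    shell_dir = env.get("ENZIAN_SHELL_DIR")
    if not shell_dir:
        raise RuntimeError("ENZIAN_SHELL_DIR is not set")
    dcp = os.path.join(shell_dir, "static_shell_routed.dcp")
    if not os.path.isfile(dcp):
        raise RuntimeError(f"{dcp} not found")
    return dcp
```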

You don't have to build the shell yourself; you can download the artifacts from here: https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/static-shell/-/releases

Just clone the repo and put the artifacts in the build subdirectory.

Shell/stub bitstream

The shell builds a bitstream that is used for basic testing:

  • handles GlobalSync messages
  • provides phony memory to test reading/writing to the FPGA memory space
  • implements two fully working cache lines; the content of the 2nd cache line is copied from the 1st cache line, and they are used to measure cache-to-cache latency
  • implements an 8 KiB RAM accessible through the FPGA I/O space, 0x9000_0000_0000 - 0x9000_0000_1FFF

ECI Transport

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/eci-transport

Provides low-level support, used by the Shell.

ECI Toolkit

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/eci-toolkit

Provides basic modules, used by the Shell and applications.

Example Stub application

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/dynamic-stub

The stub application provides a memory loopback (for all reads/writes to the FPGA memory, to test the ECI throughput/latency), a BRAM memory accessible through the FPGA I/O space and 2 fully working cache lines used to test cache-to-cache transfers.

Directory Controller Slice

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/directory-controller-slice

Module to handle remote memory accesses (from the CPU to the FPGA) coherently.

Sample Application

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/sample-application

The sample application connects Directory Controller Slices to two memory channels (it expects two 16GB modules, on the 1st and 4th channels).

Sample Top Level

https://gitlab.inf.ethz.ch/project-openenzian/fpga-stack/sample-top-level

A combined project: the shell and the sample application.

CI/CD

BDK, ATF and UEFI are built by a GitLab runner, and the artifacts are placed in /srv/tftp/enzian on the enzian-gateway server.

Common problems

Stuck console

If the console doesn't react to key presses and only prints output, try taking the console down and reopening it: ^Ecd and then ^Eco.

Failed MIG calibration

Each MIG is configured for a specific memory type. A failed calibration means that the hardware configuration doesn't match the MIG configuration.

Errors because of incomplete sources

Most of the projects use submodules. Remember to check them out as well, either by adding --recurse-submodules to git clone or by calling git submodule update --init --recursive after cloning.

Vitis 2023.2 looks different

Launch "Vitis Classic 2023.2" or add the --classic option when launching vitis from the command line.
