White Paper

Introduction to BittWare's Loopback App Note and Example


BittWare’s Loopback example demonstrates several things:

  • How to fully use the Xilinx CMAC in a design. This includes setting SerDes transmit pre-emphasis values based on DAC cable length. It also includes configuring the optional auto-negotiation and link training (AN/LT) functionality and processing interrupts received from active QSFP transceivers.
  • How to use Xilinx HLS/C++ to configure a packet processing pipeline. BittWare recommends HLS/C++ instead of using RTL or P4 for packet processing.
  • How to use Python with the BittWare toolkit to manipulate a PCIe board’s control plane, and how to use Python Scapy (https://scapy.net/) on a remote host to test a packet-oriented bitstream.
  • How to implement global statistical register snapshots to assist debug efforts.


The functionality of the Loopback is not the principal focus of this example. Our focus was to demonstrate the items listed above. However, the Loopback has value in its own right: BittWare uses it to validate DAC cable settings when connected to third-party devices such as NICs and switches.

The Loopback contains an L2 filter that selects frames to process. If those frames contain IPv4 packets, the Loopback swaps source and destination addresses at both the MAC and IP layers. The Loopback can respond to ARP packets. This was added to eliminate any requirement for specialized configuration of third-party devices.
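
The address swap described above can be illustrated with a minimal Python sketch. This is not BittWare's HLS implementation. It assumes untagged Ethernet II frames (no VLAN tag) and assumes that non-IPv4 frames are left untouched; the function name `swap_addresses` is hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_SRC_OFFSET = ETH_HEADER_LEN + 12
IPV4_DST_OFFSET = ETH_HEADER_LEN + 16


def swap_addresses(frame: bytes) -> bytes:
    """Return a copy of `frame` with MAC and IPv4 addresses swapped.

    Assumption: untagged Ethernet II framing. Frames that do not carry
    IPv4 are returned unchanged.
    """
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")

    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4 or len(frame) < IPV4_DST_OFFSET + 4:
        return frame

    out = bytearray(frame)
    # Swap destination and source MAC addresses.
    out[0:6], out[6:12] = frame[6:12], frame[0:6]
    # Swap source and destination IPv4 addresses.
    out[IPV4_SRC_OFFSET:IPV4_SRC_OFFSET + 4] = frame[IPV4_DST_OFFSET:IPV4_DST_OFFSET + 4]
    out[IPV4_DST_OFFSET:IPV4_DST_OFFSET + 4] = frame[IPV4_SRC_OFFSET:IPV4_SRC_OFFSET + 4]
    return bytes(out)
```

Swapping the two IPv4 addresses only reorders 16-bit words within the header, so the IPv4 header checksum remains valid without being recomputed.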

The Loopback operates on a single QSFP cage, looping packets from input to output. Any additional QSFP cages are not used.

Loopback block diagram


This Loopback was designed and tested on a BittWare XUP-P3R board containing a Xilinx VU9P chip, speed grade 2. The Loopback does not use any external memory and should port to any BittWare board with a Xilinx UltraScale+ chip containing a CMAC.

FPGA Bitstream Overview

The Loopback’s FPGA bitstream contains several components. Each component has an AXI4-Stream interface on both input and output collectively used as a data plane. The bitstream’s control plane uses AXI4-Lite interfaces connected to the physical PCIe interface.

Design Flow

The Loopback is supplied as a Xilinx IP Integrator Project. Several of the components are written in Verilog. Three are written in C++ using the Xilinx HLS flow, which emits Verilog.

The current implementation groups the three components written with HLS into a single component from the perspective of IP Integrator. However, those three components are separately documented here. In fact they are documented as four distinct components. This is because the HLS components share a common “Parser Library” which we document separately to avoid repetition.

AXI4 flow through BittWare IP

Reset and Statistics

By design, all components come out of reset enabled, but in a mode that does the “least harm”. Software must then configure the components before the Loopback operates correctly.

Each component also exposes statistical registers to help users debug hardware or software. We include a snapshot signal so that all of the statistical register values are captured at the same instant.
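
To show why a shared snapshot matters, here is a small behavioral model in Python. It is a sketch of the general technique only: the class `StatsBlock`, its methods, and the counter names are hypothetical and do not reflect the Loopback's actual register layout.

```python
class StatsBlock:
    """Behavioral model of one component's statistics registers.

    Live counters keep incrementing with traffic; software only ever
    reads the latched copy taken at the last snapshot.
    """

    def __init__(self, names):
        self._live = {name: 0 for name in names}
        self._latched = dict(self._live)

    def count(self, name, amount=1):
        # Data-plane side: update the free-running counter.
        self._live[name] += amount

    def snapshot(self):
        # Copy every live counter into its latched register at once.
        self._latched = dict(self._live)

    def read(self, name):
        # Control-plane side: return the value frozen at the last snapshot.
        return self._latched[name]


def snapshot_all(blocks):
    """Model of a global snapshot signal fanned out to every component."""
    for block in blocks:
        block.snapshot()
```

Because every block latches on the same signal, a reader can compare, for example, an input frame count in one component against an output frame count in another without traffic arriving between the two reads skewing the comparison.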

AXI Interfaces

The bitstream’s interface widths and clock speed were selected to host 100 Gigabit Ethernet traffic. The data plane’s AXI4-Stream interface is 512 bits wide. Except where it touches the CMAC, the interface is clocked at 300 MHz. Frame metadata travels on a separate bus, the AXI TUSER bits, with valid data when AXI TLAST is high.
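
The following back-of-the-envelope calculation suggests why a 512-bit bus at 300 MHz is sufficient for 100 Gigabit Ethernet. It is our own arithmetic, not a rationale stated by the design. It assumes 20 bytes of per-frame wire overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap), frame sizes from 64 to 1518 bytes as seen on the bus, and that each frame starts on a fresh bus beat.

```python
import math

BUS_BYTES = 512 // 8           # bytes carried per AXI4-Stream beat
CLOCK_HZ = 300e6               # data-plane clock
LINE_RATE_BPS = 100e9          # 100 Gigabit Ethernet
WIRE_OVERHEAD_BYTES = 8 + 12   # preamble/SFD + minimum inter-frame gap (assumed)


def beats_per_second(frame_bytes: int) -> float:
    """Bus beats per second needed to carry back-to-back frames of one size."""
    frames_per_second = LINE_RATE_BPS / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)
    beats_per_frame = math.ceil(frame_bytes / BUS_BYTES)
    return frames_per_second * beats_per_frame


# Raw bus capacity in Gb/s when every beat is completely full.
raw_capacity_gbps = BUS_BYTES * 8 * CLOCK_HZ / 1e9

# Frame size that demands the most beats per second at line rate.
worst_case_frame = max(range(64, 1519), key=beats_per_second)
worst_case_beats = beats_per_second(worst_case_frame)
```

Under these assumptions the raw capacity is 153.6 Gb/s, and the worst case is a 65-byte frame, which occupies two beats but carries only one byte in the second. That case needs roughly 294 million beats per second, just under the 300 million available.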

Metadata is not consistent across the bitstream. Thus the documentation associated with each of the components describes the metadata that component expects on input and the metadata it forwards on output.

Note: In this Loopback implementation, metadata follows frame data. This is opposite to the design decision made by the NetFPGA.org academic team. They optimized for building routers where it is helpful to know each frame’s destination in advance. In contrast, the Loopback was optimized around bitstream size and latency. Trailing metadata requires less buffering within each component.
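
A minimal Python model of trailing metadata on a 512-bit AXI4-Stream bus is sketched below. The `Beat` class and `frame_to_beats` function are hypothetical names used for illustration, and the metadata is treated as an opaque integer because its contents vary by component.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

BUS_BYTES = 64  # 512-bit TDATA


@dataclass
class Beat:
    tdata: bytes            # always BUS_BYTES long, zero padded
    tkeep: int              # one bit per valid byte in tdata
    tlast: bool             # high on the final beat of a frame
    tuser: Optional[int]    # metadata, meaningful only when tlast is high


def frame_to_beats(frame: bytes, metadata: int) -> Iterator[Beat]:
    """Split a frame into bus beats, attaching metadata to the last beat only."""
    if not frame:
        raise ValueError("empty frame")
    for offset in range(0, len(frame), BUS_BYTES):
        chunk = frame[offset:offset + BUS_BYTES]
        is_last = offset + BUS_BYTES >= len(frame)
        yield Beat(
            tdata=chunk.ljust(BUS_BYTES, b"\x00"),
            tkeep=(1 << len(chunk)) - 1,
            tlast=is_last,
            tuser=metadata if is_last else None,
        )
```

With this arrangement a component can forward each beat as soon as it arrives and fill in values that are only known at the end of a frame, such as a length or a classification result, on the final beat.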

The bitstream control interfaces are AXI4-Lite slaves, 32 bits wide. All reads and writes are 32 bits. In cases where byte order matters, the Loopback expects its control registers to hold data in network (big-endian) byte order.
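
As an example of what network byte order means for register values, the sketch below converts an IPv4 address and a MAC address to integers whose most significant byte is the first byte on the wire. The helper names are hypothetical and are not the toolkit's own functions; how a 48-bit MAC address is split across 32-bit registers is likewise an assumption.

```python
import ipaddress


def ipv4_to_reg(addr: str) -> int:
    """Return the 32-bit big-endian integer form of a dotted-quad address."""
    return int.from_bytes(ipaddress.IPv4Address(addr).packed, "big")


def mac_to_u64(mac: str) -> int:
    """Return a MAC address as an integer, first octet most significant."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    return int.from_bytes(raw, "big")


def u64_to_mac(value: int) -> str:
    """Inverse of mac_to_u64; only the low 48 bits are used."""
    raw = (value & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return ":".join(f"{octet:02x}" for octet in raw)


def split_u64(value: int) -> tuple:
    """Split a 64-bit value into (high, low) 32-bit words (assumed layout)."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF
```

For instance, `ipv4_to_reg("10.0.0.1")` yields `0x0A000001`, so the first octet of the address occupies the top byte of the register value.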

We document the component control plane registers in a single place, separately from the descriptions of the components themselves. Cross references exist to help users navigate between the two locations. The memory map used for the Loopback’s control registers is heavily influenced by the requirements of the AXI4-Lite interface implementation in the Xilinx HLS tool chain.

The formal definition of AXI used is from the Xilinx “AXI Reference Guide” available here: https://www.xilinx.com/support/documentation/ip_documentation/ug761_axi_reference_guide.pdf

Control Plane Software Overview

The BittWare Loopback Example runs on a PCIe card inserted into a host computer. BittWare supplies software for that host computer to control the example’s functionality. The control software uses Python 3 running on the host.

The Example’s software builds on BittWare’s BittWorks II Toolkit. More specifically, it adds Python bindings to the BwHIL and BmcLib libraries. It then uses those bindings inside a collection of Python components created to manipulate the registers that the Example’s bitstream exposes in PCIe address space.

In addition, the Loopback Example bitstream translates some hardware events into PCIe interrupts. To support this, the Loopback’s software translates those interrupts into Python calls.

To illustrate with a very basic interaction with the Loopback Example bitstream:

    $ # First map the PCIe card using the Toolkit's command line or use the GUI
    $ bwconfig --add=usb  # mapped over USB first as device 0
    $ bwconfig --add=pci  # same card mapped over PCIe as device 1
    $ python3  # Invoke python3
    >>> from components.hildev import *
    >>> card = Card(1)
    >>> card.show_stats()  # Shows all stats from all components
    >>> # Show stats from just the first CMAC component with some options
    >>> card.cmac[0].show_stats(showall=False, doTick=False)
    >>> help()
    >>> exit()

All of the Python components support a common collection of low-level methods. Note that our Python implementation does not have the PCIe memory map hard-coded. Instead Python reads a JSON database that defines the available FPGA bitstream components, their registers, and where the registers are located in PCIe address space. That JSON database is automatically generated from the Loopback Example’s documentation.
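
The idea of driving the memory map from a JSON database can be sketched as follows. The schema, component name, register names, and addresses below are all invented for illustration; the actual database format used by the Loopback Example is not shown in this introduction.

```python
import json

# Hypothetical database contents. JSON has no hex literals, so
# addresses are stored as strings in this sketch.
REGISTER_DB = json.loads("""
{
  "components": [
    {
      "name": "cmac0",
      "base": "0x00010000",
      "registers": [
        {"name": "CTRL",      "offset": "0x00"},
        {"name": "RX_FRAMES", "offset": "0x10"}
      ]
    }
  ]
}
""")


def build_address_map(db: dict) -> dict:
    """Map (component, register) names to absolute addresses."""
    addresses = {}
    for component in db["components"]:
        base = int(component["base"], 16)
        for register in component["registers"]:
            key = (component["name"], register["name"])
            addresses[key] = base + int(register["offset"], 16)
    return addresses


ADDRESS_MAP = build_address_map(REGISTER_DB)
```

Keeping addresses in data rather than in code means that a change to the bitstream's register layout only requires regenerating the database, not editing the Python components.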

The full Python API documentation is available on BittWare’s Developer Site.

The low-level methods include:

  • list_regs()
  • find_reg(pattern)
  • list_fields()
  • find_field(pattern)
  • read32(), write32()
  • read64(), write64()
  • reg_get(), reg_set()
  • maton64(macaddr), n64toma(val64)

The higher level methods available depend upon the specific component. However, a few methods are relatively common:

  • enable()
  • show()
  • show_stats()

Interested in Details on Loopback?

We have more details on the Loopback available as a free App Note download; get it today through the form below!

What you see on this page is the introduction to the Loopback example. There’s a lot more detail in the full App Note, and best of all it’s FREE to download! Fill in the form to request access to a PDF version of the full App Note.