Developing IP to run inside an FPGA traditionally uses languages originally created for ASIC development. There is no consensus on what to call these languages. Some people say “RTL,” which stands for register transfer level (sometimes expanded as register transfer language). Others say “HDL,” which stands for hardware description language. The two most popular HDLs are VHDL and SystemVerilog. Internally, BittWare uses both, but our larger projects use SystemVerilog and its full verification feature set.
Verification is an important part of the FPGA design process and a critical part of ASIC design. Rising costs in the ASIC industry drove the need for advanced verification languages and techniques to ensure first-pass silicon success. This need drove the Verilog language to incorporate features of HVLs (hardware verification languages), eventually merging into the current SystemVerilog IEEE 1800 standard. Modern ASIC verification has also moved toward the Universal Verification Methodology (UVM), which provides a standard way of building testbenches.
The different economics of FPGA development as well as the ability to immediately test designs in the lab have slowed the adoption of UVM among FPGA developers. Nevertheless, the growing complexity of high-density FPGAs has pushed many groups to adopt the same verification methodology that is used in ASIC design flows. At BittWare, we have a fluid approach to FPGA verification. We often use simple testbenches based on the SystemVerilog or VHDL features available in inexpensive or free simulators. When appropriate, however, we build modern testbenches based on the full feature set of SystemVerilog and UVM.
HLS, or High Level Synthesis, is the name given to tools and languages that operate at a higher level of abstraction than HDL. In practice, HLS usually refers to specialized versions of C or C++. However, there are other HLS languages. For example, some third-party IP that BittWare distributes is written in Bluespec. All of these HLS tools tend to come with an easy, push-button way to generate a testbench at the module level. UVM is still needed at the system level.
Finally, at the highest level today, is OpenCL. It is a parallel programming language developed for GPU chips and repurposed into the FPGA world. Today OpenCL’s application is almost entirely HPC, or high-performance computing, where it is used to implement computational algorithms that run faster than an Intel-based server can run them.
The HLS approach to FPGA development is to abstract only those portions of the application that can be easily expressed in a C/C++ environment. The HLS tool flow is available for essentially any BittWare board through the use of Xilinx (Vivado) or Intel (Quartus) tools.
To succeed with HLS, it’s important to recognize the portions of your application that will be a good fit. Guidelines include:
The main limitation of HLS today is that its scope is limited to IP blocks. The application team still needs RTL for other components. However, by leveraging something like BittWare’s SmartNIC Shell for the RTL parts, a user may be able to define their unique application entirely in HLS. Note also that HLS is a poor choice for the simplest code and for larger designs that consist mostly of pre-optimized components.
What is RSS? RSS stands for “Receive Side Scaling.” It is a technique that uses a hash to distribute network packets efficiently across multiple CPUs. RSS is a feature of modern Ethernet cards, and generally implements the specific Toeplitz hash defined by Microsoft.
The environment that hosts our RSS application is BittWare’s SmartNIC Shell. The SmartNIC Shell is designed to give users a head start when building an FPGA-based networking application. It provides users with an optimized FPGA-based 100G Ethernet pipeline, including DPDK for host interaction. All the user needs to do is drop in their application as an IP block.
In this case, BittWare was also the user: our application is an FPGA implementation of RSS. Both the team building RSS with the traditional RTL approach and the HLS team were able to use SmartNIC Shell as their FPGA Ethernet framework and concentrate on the RSS application itself.
Our FPGA-based RSS implementation is specifically based on C code found in the DPDK source tree, with the test function for that code also available in the tree. Our RSS application also uses a 64-entry indirection table instead of the more common 128-entry table. What is important for this HLS study is that the function we are moving into the FPGA starts off defined in C. That meets our number one criterion for HLS success: a definition in C or C++.
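To make the role of the 64-entry indirection table concrete, here is a minimal C sketch of a table lookup. It is not BittWare's code. The names `rss_table_init` and `rss_select_queue`, the round-robin fill, and the use of the hash's low-order bits as the table index are all assumptions chosen for illustration; indexing with the low-order bits is the conventional approach in RSS implementations.

```c
#include <stdint.h>

/* 64 entries, as in the text; must be a power of two for the mask below. */
#define RSS_TABLE_ENTRIES 64u

/* Hypothetical helper: fill the table round-robin across the receive queues. */
void rss_table_init(uint8_t table[RSS_TABLE_ENTRIES], unsigned num_queues)
{
    if (num_queues == 0)
        num_queues = 1; /* avoid dividing by zero; everything goes to queue 0 */
    for (unsigned i = 0; i < RSS_TABLE_ENTRIES; i++)
        table[i] = (uint8_t)(i % num_queues);
}

/* Hypothetical helper: the low 6 bits of the 32-bit hash select a table
 * entry, and the entry names the destination queue (assumed convention). */
uint8_t rss_select_queue(const uint8_t table[RSS_TABLE_ENTRIES], uint32_t hash)
{
    return table[hash & (RSS_TABLE_ENTRIES - 1u)];
}
```

Because only the table contents decide which queue a hash value maps to, the host can rebalance traffic by rewriting table entries without touching the hash logic.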
The goal of the RSS function is to distribute packets among CPUs, keeping related streams of packets together. Different Toeplitz key sets provide different distribution patterns. However, no matter the key set, our RSS function uses each packet’s source and destination IP address and source and destination ports as input. These four components combined are called a 4-tuple.
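As a rough illustration of how a Toeplitz hash turns a key set and a 4-tuple into a 32-bit value, the sketch below follows the usual bit-serial definition: for every input bit that is set, XOR in the 32 key bits currently aligned with it, then slide the key window left by one bit. The function name `toeplitz_hash` and its signature are assumptions for this example, not the DPDK API and not BittWare's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial Toeplitz hash (illustrative sketch).
 * key      : hash key bytes, most significant byte first
 * key_len  : should be at least data_len + 4 so the window never runs dry;
 *            missing key bytes are treated as zero here
 * data     : input bytes, e.g. the 12-byte 4-tuple in network byte order */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t data_len)
{
    if (key_len < 4)
        return 0;

    /* 32 key bits aligned with the current input bit. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < data_len; i++) {
        /* Next key byte to shift into the window, one bit at a time. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                hash ^= window;
            window = (window << 1) | ((uint32_t)(next >> b) & 1u);
        }
    }
    return hash;
}
```

Because the hash is a pure XOR of key windows, it is linear: hashing the XOR of two inputs gives the XOR of their hashes. That property makes the function straightforward to unroll or pipeline in hardware, and it is a convenient sanity check when comparing a C model against a synthesized block.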
Note that for our RSS application we are assuming the 4-tuple was already parsed and added to the packet’s metadata. Another SmartNIC Shell module handles this packet classification function. We call that module our “parser,” and it will be the subject of a separate BittWare white paper.
Our RSS implementation currently accepts a 96-bit field for classification, which is enough for the 4-tuple of IPv4 source and destination addresses (32 bits each) and source and destination ports (16 bits each). The parser provides zero for fields not available in the packet; if a packet does not include any IP payload, the full 96-bit tuple field is zero.
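The 96-bit field and its zero-fill rule can be sketched in C as follows. The struct `parsed_fields`, its validity flags, and the function `rss_pack_tuple` are hypothetical stand-ins for whatever the parser really emits. The byte layout (source address, destination address, source port, destination port, each in network byte order) is also an assumption, borrowed from the conventional RSS input ordering.

```c
#include <stdint.h>
#include <string.h>

#define RSS_TUPLE_BYTES 12 /* 96 bits: 32 + 32 + 16 + 16 */

/* Hypothetical parser output; values are held in host byte order. */
struct parsed_fields {
    int      has_ipv4;  /* nonzero if the packet carried an IPv4 header */
    int      has_ports; /* nonzero if L4 ports were found */
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v;
}

/* Build the 96-bit classification field; unavailable fields stay zero. */
void rss_pack_tuple(const struct parsed_fields *f, uint8_t out[RSS_TUPLE_BYTES])
{
    memset(out, 0, RSS_TUPLE_BYTES);
    if (!f->has_ipv4)
        return; /* non-IP packet: the whole field is zero */
    put_be32(out, f->src_ip);
    put_be32(out + 4, f->dst_ip);
    if (f->has_ports) {
        put_be16(out + 8, f->src_port);
        put_be16(out + 10, f->dst_port);
    }
}
```

One consequence of the zero-fill rule is that every non-IP packet presents the same all-zero input to the hash, so all such packets land on the same queue.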
Many RSS implementations use a 5-tuple instead of a 4-tuple. Doing that would require an additional 8 bits to accommodate the protocol number, for a total of 104 bits. HLS users of RSS can easily accommodate that change with minor source code changes. This ability to quickly adapt from 4-tuple to 5-tuple is an example of the number two criterion for HLS success: a requirement for multiple rounds of implementation.
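To give a feel for how small such a source change can be, the sketch below widens the hash input from 12 bytes to 13 with a single build-time switch. The macro `RSS_USE_5TUPLE`, the function `rss_extend_tuple`, and the choice to append the protocol number after the ports are all assumptions made for illustration; the source does not specify where the protocol byte would sit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical build-time switch: 0 = 4-tuple (96 bits), 1 = 5-tuple (104 bits). */
#ifndef RSS_USE_5TUPLE
#define RSS_USE_5TUPLE 1
#endif

enum {
    RSS_TUPLE4_BYTES     = 12,
    RSS_HASH_INPUT_BYTES = RSS_TUPLE4_BYTES + (RSS_USE_5TUPLE ? 1 : 0)
};

/* Copy the 4-tuple and, in 5-tuple builds, append the 8-bit IP protocol
 * number (placement is an assumption). Returns the number of bytes that
 * should be fed to the hash. */
size_t rss_extend_tuple(const uint8_t tuple4[RSS_TUPLE4_BYTES],
                        uint8_t protocol,
                        uint8_t out[RSS_TUPLE4_BYTES + 1])
{
    memcpy(out, tuple4, RSS_TUPLE4_BYTES);
#if RSS_USE_5TUPLE
    out[RSS_TUPLE4_BYTES] = protocol;
#else
    (void)protocol;
    out[RSS_TUPLE4_BYTES] = 0; /* unused in 4-tuple builds */
#endif
    return RSS_HASH_INPUT_BYTES;
}
```

In an HLS flow, a constant such as `RSS_HASH_INPUT_BYTES` would typically also set loop bounds and data-path widths, so rebuilding with the other setting regenerates the hardware without hand edits to RTL.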