Why Use an FPGA for NVMe-oF?
You have three basic choices when you decide to offload NVMe-oF onto a PCIe card:
First, you can use an ASIC implementation, which would be the lowest-cost and lowest-latency choice. However, ASICs don’t allow you to also offload “computational storage” algorithms. ASICs are also generally available only for the most popular network bandwidths, which are rarely the highest bandwidths.
Second, you could use a system-on-a-chip (SoC) with a high core count, which would allow you to add “computational storage” algorithms. However, doing so requires parallel programming skills. The resulting solution generally has the highest latency of the three choices, which works directly against NVMe’s low-latency value proposition. Like ASICs, these MPP SoCs are generally available only for the most popular network bandwidths, which are rarely the highest bandwidths.
Third, you can use an FPGA. This option allows you to add “computational storage” algorithms while maintaining ASIC-like latency. It also enables high-bandwidth networks such as 100 Gb/s or even 400 Gb/s. Although it may be the most expensive of the three options, the cost premium is only slight once you consider the volumes involved in storage markets.
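The trade-offs among the three options can be summarized as a small lookup, sketched here in Python. The ranks are qualitative orderings transcribed from the text above, not measurements. The middle cost rank for the SoC is an inference, since the text only names the cheapest and the most expensive option. The names `OffloadOption`, `OPTIONS`, and `candidates` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffloadOption:
    name: str
    computational_storage: bool  # can custom storage algorithms be added?
    highest_bandwidths: bool     # typically offered at the highest network bandwidths?
    latency_rank: int            # 1 = lowest latency of the three, 3 = highest
    cost_rank: int               # 1 = lowest cost of the three, 3 = highest


# Qualitative ordering only. The SoC cost rank of 2 is an assumption.
OPTIONS = [
    OffloadOption("ASIC", False, False, latency_rank=1, cost_rank=1),
    OffloadOption("MPP SoC", True, False, latency_rank=3, cost_rank=2),
    OffloadOption("FPGA", True, True, latency_rank=2, cost_rank=3),
]


def candidates(need_computational_storage, need_highest_bandwidth,
               max_latency_rank=3):
    """Return the names of the options that satisfy the stated requirements."""
    return [
        o.name
        for o in OPTIONS
        if (o.computational_storage or not need_computational_storage)
        and (o.highest_bandwidths or not need_highest_bandwidth)
        and o.latency_rank <= max_latency_rank
    ]
```

For example, `candidates(True, False, max_latency_rank=2)` leaves only the FPGA, which mirrors the argument above: once computational storage and near-ASIC latency are both required, the other two options drop out.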
Adaptive Storage
By leveraging technologies like FPGAs and SoCs, datacenter architects can further reduce data movement to and from the CPU for data-intensive operations. With hardware-driven acceleration, user applications show higher performance and lower response times. As offloading frees up CPU cycles, the processes that distribute workloads can take advantage of the hybrid system architecture, using both the dedicated hardware and the CPU more efficiently. The FPGA’s fabric architecture, I/O throughput, and programming flexibility facilitate the design of reconfigurable hardware tightly coupled with high-bandwidth NVMe storage. FPGAs are particularly suitable for compression, encryption, RAID and erasure coding, data deduplication, key-value offload, database query offload, video processing, and NVMe virtualization, among other tasks. FPGA hardware offers the performance of a dedicated solution with the added advantage of being reconfigurable, so it can quickly switch purpose as datacenter needs change over time.
Using a Xilinx MPSoC for an NVMe-oF Target
The BittWare 250-SoC features a Xilinx Zynq UltraScale+ ZU19EG MPSoC and can connect to both the network fabric, through two QSFP28 ports, and the PCIe fabric, through a 16-lane host interface or four 8-lane OCuLink connectors. This MPSoC adapter is an excellent platform for driving an NVMe-oF target node, as it combines data stream compute in the FPGA fabric (also called the PL, or Programmable Logic), network I/O, PCIe connectivity, and an onboard ARM processor. Note that the ARM is not in the data plane; it handles control plane work. Placing a dedicated hardware accelerator between the CPU and storage endpoints creates a system optimized to compute closer to the data.
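The control plane / data plane split described above can be pictured with a toy model, sketched here in Python. This is not BittWare's design or the NVMe-oF protocol. The classes `RouteTable`, `ControlPlane`, and `DataPlane` are hypothetical, and a drive is modeled as a plain dictionary of block number to bytes. The only point is that setup work happens in one component (standing in for the ARM) while every I/O is served by another (standing in for the programmable logic) using state the first one installed.

```python
class RouteTable:
    """Shared state: written by the control plane, read by the data plane."""

    def __init__(self):
        self._routes = {}

    def install(self, namespace_id, drive):
        self._routes[namespace_id] = drive

    def lookup(self, namespace_id):
        return self._routes[namespace_id]


class ControlPlane:
    """Stands in for the ARM processor: setup only, never touches I/O data."""

    def __init__(self, table):
        self.table = table
        self.setup_calls = 0

    def attach(self, namespace_id, drive):
        self.table.install(namespace_id, drive)
        self.setup_calls += 1


class DataPlane:
    """Stands in for the programmable logic: serves every I/O from the table."""

    def __init__(self, table):
        self.table = table
        self.ios_handled = 0

    def read(self, namespace_id, block):
        drive = self.table.lookup(namespace_id)  # no call into ControlPlane
        self.ios_handled += 1
        return drive[block]
```

In this sketch a namespace is attached once through `ControlPlane.attach`, after which any number of `DataPlane.read` calls complete without involving the control plane again. That is the property the text attributes to keeping the ARM out of the data path.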