Eideticom’s Query Processing Unit (QPU) targets database users and anyone streaming data (such as network packets) who needs querying, analytics, or format conversion performed in hardware with low latency. Tasks can be parallelized to meet bandwidth requirements and combined with other NoLoad® functions such as compression.
As part of Eideticom’s NoLoad computational storage framework, the QPU can be pipelined with other Computational Storage IP from Eideticom such as Compression, Decompression, Erasure Coding and Deduplication.
Rather than requiring hardware engineers to specify the QPU parameters, users define the functions in software running on an on-chip processor. This gives both the ease of use of high-level tools and offload from the main host CPU.
The QPU includes features for format conversion (text to/from binary) and standards-driven database functions. Users design their mix of functions in software; no hardware-based tools are necessary. The QPU will support native acceleration for a range of database tools as these packages move to support standards-based computational storage.
Perform data queries with software-defined search and filter parameters.
Filter, pattern-match, and run analytics on streaming data such as network packet headers or data on SSD storage.
Efficiently reformat text/binary data or perform other format conversions.
This is Sean Gibb, vice-president of engineering at Eideticom. In this video, we will demonstrate the use of Eideticom’s Query Processing Unit, or QPU, for formatting and filtering stock-ticker data stored in a comma-separated text format.
The embedded processors in Eideticom’s QPU are software programmable using C or C++, allowing you to dynamically program your filtering functions.
In addition to the embedded processors, easy-to-use, high-throughput hardware co-processors that perform common tasks (such as packet capture analysis, conversion from text to binary formats, and simple filtering) are available to your embedded software to accelerate your query workloads.
In this example, we use the text-to-binary formatter to convert CSV to binary data, perform a runtime-configurable hardware filter (to filter out specific stock symbols and low-volume trades), and then perform a software filter to remove all trades where the day closes lower than it opens.
We compile the software using a GCC compiler to produce an executable that we can load through our software stack into the embedded processors. Once the software’s loaded, we run 5GB of CSV data through the Query Engine, filtering for all Microsoft stocks with a volume that exceeds 10 million.
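The software-filter stage in this example is ordinary C++ compiled with GCC for the embedded processors. The sketch below is illustrative only: the TradeRecord layout and the filter_close_below_open() function are assumptions made here for clarity, not Eideticom's actual record format or API. It shows the kind of logic that removes trades where the day closes lower than it opens, after the hardware filter has already dropped the unwanted symbols and low-volume trades.

```cpp
// Illustrative sketch only: the record layout and function name below are
// assumptions for clarity, not Eideticom's actual binary format or API.
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-width record emitted by the text-to-binary co-processor,
// after the hardware filter has already dropped unwanted symbols and
// low-volume trades.
struct TradeRecord {
    char     symbol[8];   // e.g. "MSFT"
    uint64_t volume;      // shares traded
    double   open;        // opening price
    double   close;       // closing price
};

// Software filter stage: remove all trades where the day closes lower than it
// opens. Surviving records are compacted in place; returns the new count.
std::size_t filter_close_below_open(TradeRecord* records, std::size_t count) {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (records[i].close >= records[i].open) {
            records[kept++] = records[i];
        }
    }
    return kept;
}
```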
You can see here that a single Query Engine is capable of sustaining 2 GB/s of text input. We can tile multiple Query Engines and, thanks to Eideticom’s software stack, saturate the PCIe interface to the FPGA card with the same host software.
This is just one example of what Eideticom’s software-programmable, hardware-accelerated QPU can do for you.
The Query Processing Unit is defined in software that runs on the FPGA (on a soft or hard processor), eliminating the need for engineering resources dedicated to low-level configuration.
The QPU is modular, allowing one or more units to be placed to meet a given bandwidth requirement. QPU instances can cooperate, for example in a distributed file conversion spread across eight QPUs, where data spanning two units needs to be coordinated.
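To illustrate the kind of coordination involved (this is not Eideticom's implementation), consider splitting one large CSV file across several QPUs: records can straddle chunk boundaries, so each boundary must land on a record boundary. A minimal sketch, assuming a simple newline-aligned splitting policy:

```cpp
// Illustrative only: a simple newline-aligned split of a CSV buffer into
// per-QPU chunks, so a record straddling a boundary is handled by exactly
// one unit. Chunk and split_on_records are assumptions, not part of
// Eideticom's software stack.
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

struct Chunk {
    std::size_t offset;   // byte offset of the chunk within the file
    std::size_t length;   // chunk length in bytes
};

// Divide `data` into up to `units` roughly equal chunks, moving each boundary
// forward to just past the next newline so no CSV record is split in two.
std::vector<Chunk> split_on_records(std::string_view data, std::size_t units) {
    std::vector<Chunk> chunks;
    std::size_t start = 0;
    for (std::size_t i = 0; i < units && start < data.size(); ++i) {
        std::size_t end = (i + 1 == units)
            ? data.size()
            : start + std::max<std::size_t>(1, (data.size() - start) / (units - i));
        end = std::min(end, data.size());
        while (end < data.size() && data[end - 1] != '\n') {
            ++end;   // extend to a record boundary
        }
        chunks.push_back({start, end - start});
        start = end;
    }
    return chunks;
}
```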
The Query Processing Unit is a component of the NoLoad framework. Components such as Compression, shown in orange, are where users build their particular application using a software-defined approach.
Components like Compression can be added to the QPU to, for example, compress filtered data before moving it to SSD storage.
All the accelerator functions shown are implemented in FPGA hardware, allowing for high bandwidth, low latency and CPU offload.
This real-world example uses the Query Processing Unit as a packet processing machine, plus the Compression Engine (another NoLoad® IP core). Packets are compressed and written to an SSD array using peer-to-peer transfers, while the QPU also extracts header (network tuple) data and computes analytics such as packet count per time period. Analytics are sent to the host as CSV data.
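As a rough illustration of the analytics side of this example, the sketch below uses a hypothetical header layout and plain C++ (not Eideticom's IP) to show what extracting header (network tuple) data and reporting packet counts per time period as CSV might look like:

```cpp
// Illustrative only: hypothetical header layout and CSV output, sketching the
// "network tuple plus packet count per time period" analytics in this example.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>

// Hypothetical extracted header data for one packet.
struct PacketHeader {
    uint64_t timestamp_ns;        // capture timestamp in nanoseconds
    uint32_t src_ip, dst_ip;      // IPv4 addresses
    uint16_t src_port, dst_port;  // L4 ports
    uint8_t  protocol;            // e.g. 6 = TCP, 17 = UDP
};

// Count packets per one-second interval and print the result as CSV lines,
// mirroring the analytics the QPU sends to the host.
void emit_packet_counts_csv(const PacketHeader* pkts, std::size_t n) {
    std::map<uint64_t, uint64_t> counts;  // second index -> packet count
    for (std::size_t i = 0; i < n; ++i) {
        counts[pkts[i].timestamp_ns / 1000000000ULL]++;
    }
    std::puts("interval_s,packets");
    for (const auto& [second, packets] : counts) {
        std::printf("%llu,%llu\n",
                    static_cast<unsigned long long>(second),
                    static_cast<unsigned long long>(packets));
    }
}
```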
Compared to a multi-threaded Intel Xeon CPU, the Query Processing Unit performs as shown in the table below.
| Avg. Packet Size | CPU | 1× Query Engine (QE) | 2× QE | 4× QE |
|---|---|---|---|---|
| 256B | 0.2 GB/s | 1.8 GB/s | 3.6 GB/s | 7.2 GB/s |
| 1024B | 0.7 GB/s | 2.0 GB/s | 4.0 GB/s | 8.0 GB/s |
| 4096B | 1.9 GB/s | 2.0 GB/s | 4.0 GB/s | 8.0 GB/s |
| 9216B | 2.7 GB/s | 2.0 GB/s | 4.0 GB/s | 8.0 GB/s |
In a database acceleration configuration, the Query Processing Unit can perform a range of functions from CPU offload to data type format conversions.
The Query Processing Unit targets BittWare’s cards with Intel Agilex FPGAs.
Our technical sales team is ready to provide availability and configuration information, or answer your technical questions.
"*" indicates required fields