OpenCL for Intel FPGA Software Development
BSPs for our Arria 10 and Stratix 10 FPGA cards supporting the Intel OpenCL SDK
OpenCL-based FPGA development is well suited to teams with little or no FPGA experience. It also suits any team that needs faster turnaround than a traditional HDL workflow can provide. OpenCL on BittWare FPGA cards lets a larger pool of developers take advantage of the advanced hardware our products offer.
BittWare provides a range of Intel FPGA-based cards featuring Arria 10 and Stratix 10 devices that support the Intel OpenCL SDK through a BittWare-tuned board support package (BSP).
Wondering if your existing CPU- or GPU-based application would benefit from FPGA acceleration? We can perform a benchmark to estimate potential performance improvements. Ask about our Application Optimization Services to get started.
Tool Flow Flexibility
For Software or Hardware-Based Development
- OpenCL support for software-oriented customers
- Abstraction for faster development
- Push-button flow for FPGA executable, driver, and API
- Add optimized HDL IP cores to OpenCL designs as libraries
What is OpenCL?
The OpenCL Software Language
The Open Computing Language (OpenCL) standard is the first open, royalty-free, unified programming model for accelerating algorithms on heterogeneous systems. OpenCL allows the use of a C-based programming language for developing code across different platforms such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), and field-programmable gate arrays (FPGAs).
The OpenCL industry standard enables engineering teams to target FPGA-based products without descending to the level of detail that hardware and firmware engineers programming in HDL have had to manage. Existing CPU/GPU C or OpenCL code can be recompiled with the Intel OpenCL Software Development Kit to make use of the FPGA hardware resources.
Whether porting existing code or developing new algorithms, OpenCL is the new standard for reducing time to market for FPGA-based accelerator products.
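As a rough illustration of the programming model described above, the sketch below emulates in plain Python how an OpenCL-style kernel body runs once per work-item. It is not OpenCL code and does not use the Intel SDK; `vector_add_kernel` and `enqueue_nd_range` are hypothetical names chosen for this example only.

```python
from typing import Callable, List


def vector_add_kernel(gid: int, a: List[float], b: List[float], out: List[float]) -> None:
    """Body of one work-item: it handles exactly one element of the data."""
    out[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical stand-in for a runtime launch: run the kernel once per work-item.

    A real device may execute work-items in parallel or as a pipeline. This loop
    is sequential and models only the result, not the timing.
    """
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(a)
enqueue_nd_range(vector_add_kernel, len(a), a, b, result)
print(result)  # [11.0, 22.0, 33.0, 44.0]
```

The point of the model is that the kernel describes the work for a single element, while the platform decides how many elements are processed at once.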
FPGA Programming with OpenCL
OpenCL allows the programmer to construct a dedicated FPGA accelerator, with hardware-level optimizations applied automatically to the OpenCL code by the compiler. The key FPGA features and benefits are abstracted in the syntax, and the programmer uses the compiler to create highly parallel applications. The reconfigurable FPGA logic allows the generation of dedicated, optimized blocks for hardware-specific functions.
Historically, FPGAs have been used as integer arithmetic accelerators. The Arria 10 FPGA family now also features dedicated floating-point resources (up to 1.5 TFLOPS), which OpenCL software leverages seamlessly, allowing an entirely new range of applications to benefit from FPGAs.
Previous generations of FPGA accelerators were limited by their IO throughput or memory bandwidth. The OpenCL Software Development Kit helps balance the computing power of the FPGA logic with the speed of the IOs, enabling high-speed kernel-to-kernel and kernel-to-IO data transfers through the OpenCL channel extension.
The channel feature, combined with a highly flexible memory configuration in which internal and on-board memory can be customized to fit the application’s needs in a way that GPUs do not allow, provides the platform for deploying BittWare FPGA accelerators as optimized stream computing nodes in customers’ infrastructures.
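The kernel-to-kernel streaming idea described above can be pictured with a small software model. The sketch below is a minimal Python illustration, assuming a channel behaves like a bounded FIFO between a producer stage and a consumer stage; the `Channel` class and `run_pipeline` function are hypothetical and are not part of the OpenCL SDK.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class Channel:
    """Hypothetical bounded FIFO standing in for a kernel-to-kernel channel."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("channel depth must be at least 1")
        self.depth = depth
        self.fifo: Deque[int] = deque()

    def try_write(self, value: int) -> bool:
        """Non-blocking write: fails when the FIFO is full (back-pressure)."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(value)
        return True

    def try_read(self) -> Tuple[bool, Optional[int]]:
        """Non-blocking read: fails when the FIFO is empty."""
        if not self.fifo:
            return False, None
        return True, self.fifo.popleft()


def run_pipeline(samples: List[int], depth: int = 4) -> List[int]:
    """Stream samples through two stages linked only by the channel."""
    channel = Channel(depth)
    pending = deque(samples)
    results: List[int] = []
    # Each loop iteration gives both stages one chance to make progress.
    while pending or channel.fifo:
        # Producer stage: doubles a sample and pushes it into the channel.
        if pending and channel.try_write(pending[0] * 2):
            pending.popleft()
        # Consumer stage: pops a value and adds one, without touching
        # any shared "global memory" buffer.
        ok, value = channel.try_read()
        if ok and value is not None:
            results.append(value + 1)
    return results


print(run_pipeline([1, 2, 3]))  # [3, 5, 7]
```

In this model the intermediate values never leave the FIFO, which is the property that lets streaming designs avoid a round trip through external memory between stages.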
The OpenCL Software Development Kit enables:
- Thousands of parallel kernel executions
- Configurable FPGA logic optimized for integer arithmetic
- New dedicated floating-point FPGA resources (up to 1.5 TFLOPS)
- Configurable local and global memory
- Kernel-to-kernel / kernel-to-IO high bandwidth channels
- Low Power
Intel Tool Flow
The Intel OpenCL SDK is a development environment for the software programmer; FPGA design considerations are abstracted away and handled automatically by the compiler. The flow is built around a debug-and-optimization cycle in software, so the full FPGA compilation needs to run only a limited number of times, once most of the application has been designed and optimized.
- Emulator to verify functionality
- Optimize OpenCL for FPGA architecture (over 300 optimizations)
a. Increase parallelism
b. Ensure pipelining
c. Use FPGA hardware resources
- Profile kernel performance
- Compile to FPGA hardware target
The Intel SDK for OpenCL is in full production release, enabling programmers to bring OpenCL code to gate-level performance by following simple design guidelines, and to port kernel code from platform to platform with minimal effort. The OpenCL SDK is the most efficient path to production and deployment for FPGA Accelerator solutions.
HDL vs. OpenCL Performance Comparison
CERN published results of a study comparing two algorithms programmed in both HDL and OpenCL on a BittWare 385 board.
Faster Development: 2.5 months (HDL) vs. 2 weeks (OpenCL)
Easier Development: 3,400 lines (HDL) vs. 250 lines (OpenCL)
Similar Performance: 35x (HDL) vs. 26-30x (OpenCL) acceleration
CERN noted that OpenCL offers advantages even for HDL-capable teams, since the smaller code base is easier to update. FPGA logic and DSP resource usage was also comparable between the two approaches.
Source: Reconstruction, Trigger, and Machine Learning for the HL-LHC Workshop at MIT “FPGAs as co-processors for reconstruction” Slide 19.
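To put the figures above side by side, the short calculation below turns them into ratios. It assumes that the first number in each pair refers to HDL and the second to OpenCL (following the order in the section heading), and that a month averages 52/12 weeks; both are assumptions of this sketch, not statements from the CERN slides.

```python
# Assumption: average month length used to compare months with weeks.
WEEKS_PER_MONTH = 52 / 12

# Development time: 2.5 months (assumed HDL) vs. 2 weeks (assumed OpenCL).
hdl_weeks = 2.5 * WEEKS_PER_MONTH
opencl_weeks = 2.0
dev_time_ratio = hdl_weeks / opencl_weeks

# Code size: 3,400 lines vs. 250 lines.
code_size_ratio = 3400 / 250

# Acceleration: 35x vs. 26-30x, expressed as a fraction of the larger figure.
perf_fraction_low = 26 / 35
perf_fraction_high = 30 / 35

print(f"Development time ratio: about {dev_time_ratio:.1f}x")
print(f"Code size ratio: {code_size_ratio:.1f}x")
print(f"Relative acceleration: {perf_fraction_low:.0%} to {perf_fraction_high:.0%}")
```

Under these assumptions the OpenCL version took roughly a fifth of the development time and about a fourteenth of the code, while reaching roughly three quarters or more of the reported acceleration.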
Board Support Packages
What is a BSP?
BittWare’s expertise in FPGA-based hardware and algorithm acceleration is concentrated in the OpenCL Board Support Packages. The BSPs automatically leverage the on-board resources and the FPGA’s low-level resources, allowing the programmer to focus on the algorithm rather than its physical implementation in the FPGA.
BittWare BSP offerings are tailored to specific needs. For COMPUTE intensive applications, HPC BSPs maximize the FPGA’s resource utilization. For data streaming acceleration, the NETWORK Streams enabled MAC BSPs provide a data flow straight to the FPGA fabric for in-stream bit operations.
Intel’s OpenCL SDK combined with BittWare’s BSP enables the use of the newly available OpenCL channel feature. Channels are an OpenCL construct that allows high-bandwidth kernel-to-kernel or IO-to-kernel data transfers. These channels can take advantage of the high bandwidth of the FPGA fabric’s local memory.
Fully Integrated Solutions
BittWare OpenCL-capable FPGA Accelerators are available as fully integrated, production-ready solutions. The BSP can be installed and deployed from a single installer on the development and runtime systems. BittWare also offers BSP Debug Kits, which include the Intel Quartus-II / OpenCL SDK licenses for customers who require them.
BittWare OpenCL BSPs also include several features to facilitate production system deployment:
- Board health status (power consumption & temperature)
- Intel PCIe Hard IP cores (tested across industry standard systems)
- Flash recovery mechanisms
We also provide pre-installed, ready-to-use Integrated Servers with all the software and hardware pieces included.
The High Performance Computing BSP, or HPC BSP, provides the largest share of FPGA resources to the user algorithm.
Use the OpenCL SDK features to maximize the FPGA fabric utilization by replicating multiple parallel instances of your optimized OpenCL kernel code.
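The replication idea above comes down to a simple budget: the number of kernel instances is capped by whichever FPGA resource runs out first. The sketch below is a minimal illustration of that arithmetic; the function name `max_replicas` and all resource figures are hypothetical placeholders, not numbers for any BittWare board or BSP.

```python
from typing import Dict


def max_replicas(available: Dict[str, int], per_instance: Dict[str, int]) -> int:
    """Largest instance count that does not over-subscribe any resource type.

    Assumes at least one resource has a non-zero per-instance cost.
    """
    return min(
        available[name] // cost
        for name, cost in per_instance.items()
        if cost > 0
    )


# Hypothetical figures, for illustration only.
fabric_left_for_kernels = {"logic": 400_000, "dsp": 1_500, "ram_blocks": 2_500}
one_kernel_instance = {"logic": 45_000, "dsp": 120, "ram_blocks": 400}

replicas = max_replicas(fabric_left_for_kernels, one_kernel_instance)
print(replicas)  # 6, limited by ram_blocks in this made-up example
```

In practice the compiler's resource report for a single instance would supply the per-instance figures, and timing closure may reduce the usable count further.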
High Bandwidth Kernel-to-Kernel Channel Support
BittWare can also develop customized Board Support Packages for your specific needs. Multiple I/O protocols are supported by BittWare FPGA Accelerators. Our team of FPGA acceleration experts can work with your organization to develop a customized Board Support Package.
The BittWare FPGA Accelerators compatible with OpenCL are based on three FPGA families: Stratix 10, Arria 10, and Stratix V. When choosing an FPGA Accelerator to fit their system’s requirements, customers must first consider, at the top level, the FPGA resource requirements of their algorithm and the capabilities of the FPGA Accelerator.
BittWare offers multiple FPGA Accelerators and Board Support Packages to target these needs. The following sections describe BittWare’s BSP IP offering.
| Metric | 385 | 395 | 385A | 510T | 520N (L-Tile) (**) | 520N (H-Tile) (**) | 520N-MX (H-Tile) |
|---|---|---|---|---|---|---|---|
| Host to Global Memory Bandwidth | 8-lane PCIe 2.0 | 8-lane PCIe 2.0 | 8-lane PCIe 3.0 | 16-lane PCIe 3.0 (2x 8 lanes) | 16-lane PCIe 3.0 (*) | 16-lane PCIe 3.0 (*) | 16-lane PCIe 3.0 (*) |
| Global Memory Depth | 2x 4GB | 4x 8GB (*) | Up to 2x 4GB | Up to 4x 4GB (*) | 4x 8GB | 4x 8GB | HBM2 (*) |
| IO Channels | Network: 2x 10GbE MAC (MAC BSP) | Network: 4x 10GbE MAC (MAC BSP) | Board-to-board: 2x 40Gbps serial links or Network: 2x 10GbE MAC (MAC BSP) | Board-to-board: 2x 40Gbps serial links | Board-to-board: 4x 40Gbps serial links | Board-to-board: 4x 100Gbps serial links | Board-to-board: 4x 100Gbps serial links (via QSFP28s) |
(*) Inquire for availability and details
(**) Also compatible with 520C, but no board-to-board IO channel support
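The table lists the host interface by PCIe generation and lane count rather than by throughput. As a rough guide, the sketch below converts those into theoretical peak bandwidth per direction using the standard PCIe signaling rates and line encodings. These are link-level ceilings before protocol overhead, not measured figures for any BittWare card, and the function name is a hypothetical one chosen for this example.

```python
# Per-lane signaling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_peak_gbytes_per_s(generation: str, lanes: int) -> float:
    """Theoretical peak payload rate per direction, in GB/s, before protocol overhead."""
    rate_gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_per_s * efficiency * lanes / 8  # divide by 8: bits to bytes


for generation, lanes in [("2.0", 8), ("3.0", 8), ("3.0", 16)]:
    peak = pcie_peak_gbytes_per_s(generation, lanes)
    print(f"{lanes}-lane PCIe {generation}: {peak:.2f} GB/s")
```

This works out to 4.00 GB/s for 8-lane PCIe 2.0, about 7.88 GB/s for 8-lane PCIe 3.0, and about 15.75 GB/s for 16-lane PCIe 3.0; achievable host-to-memory throughput will be lower.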
Our OpenCL Experience
Experience and Innovation
BittWare’s partnership with Intel on supporting the OpenCL SDK is the logical continuation of 20 years of experience promoting high-level language programming of FPGAs. Understanding customers’ system challenges and identifying the best approach to accelerate and optimize a customer’s application is in BittWare’s DNA.
BittWare believes FPGA-based products should be:
- Intuitive & Easy to Use
- Production Ready
- Easy to Deploy
- Integrated in Customer’s System
We believe OpenCL serves these goals and we are excited to see you succeed!
A Team of System Acceleration Experts
At BittWare we have assembled a top-notch design and engineering team that can engage with you to ensure your program’s success. We work best when we are deeply engaged with customers at the early development stage, leveraging our multiple disciplines to deliver a solution on time, on budget, and on spec with minimum risk.
BittWare Design Services Key Value:
- Reduced Risk
- Lower Costs
- Faster Time to Market
BittWare’s R&D Department is constantly providing new solutions to the industry’s challenges.
BittWare provides pre-compiled packages for all the example designs available on the Intel Examples Pages.