Unified Programming Model Across Multiple Architectures
Supporting high-level software tool flows is critical to a growing customer base that wants to take advantage of heterogeneous architectures. Intel's new oneAPI is designed around code re-use while providing performance similar to other high-level tools. Learn about Intel's oneAPI programming model and how it solves programming challenges by enabling easier development for acceleration across multiple architectures.
For Multiple Architectures
In today’s HPC landscape, several hardware architectures are available for running workloads – CPUs, GPUs, FPGAs, and specialized accelerators. No single architecture is best for every workload, so using a mix of architectures leads to the best performance across the most scenarios. However, this architecture diversity leads to some challenges:
Each architecture requires separate programming models and toolchains:
- Separate training and licensing requirements – compiler, IDE, debugger, analysis/monitoring tools, deployment tools – for each architecture
- Challenging to debug, monitor, and maintain cross-architecture source code
- Difficult integration across proprietary IPs and architectures, with no code re-use
Software development complexity limits freedom of architectural choice:
- Isolated investments in technical expertise are required to overcome each architecture's barrier to entry
How oneAPI Can Help
oneAPI delivers a unified programming model that simplifies development across diverse architectures. With the oneAPI programming model, developers can target different hardware platforms with the same language and libraries, and can develop and optimize code on different platforms using the same set of debug and performance analysis tools – for instance, gathering run-time data across host and accelerators through the VTune™ Profiler.
Using the same language across platforms and hardware architectures makes source code easier to re-use: even if platform-specific optimization is still required when code moves to a different hardware architecture, no code translation is needed. Using a common language and set of tools also means faster training for new developers, faster debugging, and higher productivity.
- Performance tuning and timing closure through emulation and reports
- Runtime analysis via VTune™ Profiler
- Complex hardware patterns implemented through built-in language features: macros, pragmas, headers
- Code re-use across architectures and vendors
- Compatible with existing high-performance languages
- Leverage familiar sequential programming languages: improved ramp-up and debug time
- IDE Integration: Eclipse, VS, VS Code
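In practice, "the same language across platforms" means a single SYCL/DPC++ source file whose execution target is chosen through a device selector. The following minimal sketch (hypothetical vector size and file, assuming a oneAPI toolchain with SYCL 2020 support) runs the same vector-add kernel on whichever device the selector picks:

```cpp
// Hypothetical sketch: one SYCL/DPC++ source, retargeted by swapping the
// device selector. Requires the oneAPI DPC++ compiler (icpx -fsycl).
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024, 0.0f);

  // default_selector_v picks the "best" available device; replacing it with
  // cpu_selector_v, gpu_selector_v, or an FPGA selector retargets the same code.
  sycl::queue q{sycl::default_selector_v};

  {
    sycl::buffer bufA{a}, bufB{b}, bufC{c};
    q.submit([&](sycl::handler& h) {
      sycl::accessor A{bufA, h, sycl::read_only};
      sycl::accessor B{bufB, h, sycl::read_only};
      sycl::accessor C{bufC, h, sycl::write_only};
      h.parallel_for(1024, [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
    });
  } // buffers go out of scope; results are copied back to the host vectors

  std::cout << "c[0] = " << c[0] << "\n";
}
```

Only the selector changes between targets; the kernel, buffers, and queue code stay identical, which is what enables re-use across architectures.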
2D FFT Demo Using oneAPI
Develop faster and reuse code in this software-oriented tool flow
Explore using oneAPI with our 2D FFT demo on the 520N-MX card featuring HBM2. Be sure to request the code download at the bottom of the page!
Data Parallel C++
Standards-Based Cross-Architecture Language
The oneAPI language is Data Parallel C++ (DPC++), a high-level language designed for parallel programming productivity and based on the C++ language for broad compatibility. DPC++ is not a proprietary language; its development is driven by an open, cross-industry initiative.
Language to deliver uncompromised parallel programming productivity and performance across CPUs and accelerators:
- Allows code reuse across hardware targets, while permitting custom tuning for a specific accelerator
- Open, cross-industry alternative to single-architecture proprietary languages
Based on C++:
- Delivers C++ productivity benefits, using common and familiar C and C++ constructs
- Incorporates SYCL* from the Khronos Group to support data parallelism and heterogeneous programming
Community Project to drive language enhancements:
- Extensions to simplify data parallel programming
- Open and cooperative development for continued evolution
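One example of those extensions is Unified Shared Memory (USM), a DPC++ feature adopted into SYCL 2020 that lets ordinary pointer-based C++ code run on an accelerator without buffers and accessors. A minimal sketch, assuming the oneAPI DPC++ compiler (array size and doubling kernel are illustrative):

```cpp
// Sketch of Unified Shared Memory (USM): pointer-based code shared between
// host and device. Requires the oneAPI DPC++ compiler (icpx -fsycl).
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  sycl::queue q;

  // Shared allocation: accessible from both host and device,
  // no buffers or accessors required.
  const size_t n = 256;
  int* data = sycl::malloc_shared<int>(n, q);
  for (size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i);

  // Familiar pointer-based kernel code, as in ordinary C++.
  q.parallel_for(n, [=](sycl::id<1> i) { data[i] *= 2; }).wait();

  std::cout << "data[10] = " << data[10] << "\n";
  sycl::free(data, q);
}
```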
FPGA Development Flow for oneAPI
One of the main problems when compiling code for FPGA is compile time – the backend compile process required to translate DPC++ code into a timing-closed FPGA design implementing the hardware architecture specified by that code can take hours to complete. The FPGA development flow has therefore been tailored to minimize full compile runs.
- The first step is functional validation, where code is checked for correctness against a test bench. This is done using emulation on the development platform: the code targeting the FPGA is compiled for and executed on the CPU, allowing a much faster turnaround when a bug is found and needs to be fixed. A standard CPU debugger (such as the Intel® Distribution for GDB) can be used for this purpose.
- Once functional validation is complete, static performance analysis is performed using compiler-generated reports. The reports include the information required to identify memory, performance, and data-flow bottlenecks in the design, along with suggestions for optimization techniques to resolve them. They also provide area and timing estimates of the design for the target FPGA.
- Once the results of static analysis are satisfactory, a full compile takes place. On request, the compiler can insert profiling logic into the generated hardware; this logic collects dynamic profiling data for memory and pipe accesses that can later be used by the VTune™ performance analyzer to identify data-pattern-dependent bottlenecks that cannot be spotted any other way.
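The three steps above map onto three compiler invocations. The flags below follow Intel's published oneAPI FPGA samples (recent toolkits use `icpx -fsycl`; older releases used `dpcpp`); exact options vary by toolkit version and board support package, and the source file name is illustrative:

```shell
# Sketch of the three-stage FPGA flow; flags follow Intel's oneAPI FPGA
# samples and may differ across toolkit versions and BSPs.

# 1. Functional validation: compile for the FPGA emulator and run on the CPU.
icpx -fsycl -fintelfpga -DFPGA_EMULATOR kernel.cpp -o kernel.fpga_emu
./kernel.fpga_emu

# 2. Static analysis: stop after the early device image to generate the
#    optimization reports (no hours-long hardware compile).
icpx -fsycl -fintelfpga -Xshardware -fsycl-link=early kernel.cpp -o kernel_report.a

# 3. Full compile (can take hours): generate hardware, optionally with
#    profiling counters for VTune analysis (-Xsprofile).
icpx -fsycl -fintelfpga -Xshardware -Xsprofile kernel.cpp -o kernel.fpga
```

Keeping most iteration in stages 1 and 2 is what makes the flow practical despite the long backend compile in stage 3.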
Learn More About oneAPI
What is oneAPI?
oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures—for faster application performance, more productivity, and greater innovation. The oneAPI industry initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.
oneAPI provides libraries for compute- and data-intensive domains, including deep learning, scientific computing, video analytics, and media processing.
The Hardware Abstraction Layer
The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through a language, a set of library APIs, and a low-level hardware interface that supports cross-architecture programming. To promote compatibility and enable developer productivity and innovation, the oneAPI specification builds upon industry standards and provides an open, cross-platform developer stack.
You can get started right away using oneAPI on the 520N-MX
Get Started with oneAPI on an FPGA
You’ll need three components to start developing with oneAPI. The oneAPI Base Toolkit and FPGA Add-On are both available from Intel. The BSP for your BittWare FPGA card is available on BittWare’s developer site.
Intel’s oneAPI Base Toolkit
Intel FPGA Add-On for oneAPI Base Toolkit
FPGA card BSP from BittWare
Craig Petrie talks about BittWare's support for oneAPI
Interested in Pricing or More Information?
Our technical sales team is ready to provide availability and configuration information, or answer your technical questions.