Intel® oneAPI

Unified Programming Model Across Multiple Architectures


Supporting high-level software tool flows is critical to a growing customer base that wants to take advantage of heterogeneous architectures. The new oneAPI from Intel is designed around code re-use while providing performance similar to that of other high-level tools. Learn about Intel’s oneAPI programming model and how it addresses programming challenges by making acceleration easier to develop across multiple architectures.

Supported today on select BittWare products

Programming Challenges

For Multiple Architectures

In today’s HPC landscape, several hardware architectures are available for running workloads – CPUs, GPUs, FPGAs, and specialized accelerators. No single architecture is best for every workload, so using a mix of architectures leads to the best performance across the most scenarios. However, this architecture diversity leads to some challenges:

Each architecture requires separate programming models and toolchains:

  • Training and licensing are required per architecture for every tool: compiler, IDE, debugger, analytics/monitoring tool, deployment tool
  • Cross-architectural source code is challenging to debug, monitor, and maintain
  • Integration across proprietary IP and architectures is difficult, with no code re-use

Software development complexity limits freedom of architectural choice:

  • Isolated investments required for technical expertise to overcome the barrier-to-entry

How oneAPI Can Help

oneAPI delivers a unified programming model that simplifies development across diverse architectures. With the oneAPI programming model, developers can target different hardware platforms with the same language and libraries. They can also develop and optimize code on different platforms using the same set of debug and performance analysis tools; for instance, the VTune™ Profiler collects run-time data across the host and its accelerators.

Using the same language across platforms and hardware architectures makes source code easier to re-use. Platform-specific optimization may still be required when code is moved to a different hardware architecture, but the code no longer has to be translated into another language. A common language and set of tools also means faster training for new developers, faster debugging, and higher productivity.

  • Performance tuning and timing closure through emulation and reports
  • Runtime analysis via VTune™ Profiler
  • Complex hardware patterns implemented through built-in language features: macros, pragmas, headers
  • Code re-use across architectures and vendors
  • Compatible with existing high-performance languages
  • Leverage familiar sequential programming languages: improved ramp-up and debug time
  • IDE integration: Eclipse, Visual Studio, VS Code

2D FFT Demo Using oneAPI

Develop faster + reuse code in this software-orientated tool flow

Explore using oneAPI with our 2D FFT demo on the 520N-MX card featuring HBM2. Be sure to request the code download at the bottom of the page!

Data Parallel C++

Standards-Based Cross-Architecture Language

The oneAPI language is Data Parallel C++, a high-level language designed for parallel programming productivity and based on the C++ language for broad compatibility. DPC++ is not a proprietary language; its development is driven by an open cross-industry initiative.

Language to deliver uncompromised parallel programming productivity and performance across CPUs and accelerators:

  • Allows code reuse across hardware targets, while permitting custom tuning for a specific accelerator
  • Open, cross-industry alternative to single-architecture proprietary languages

Based on C++:

  • Delivers C++ productivity benefits, using common and familiar C and C++ constructs
  • Incorporates SYCL* from the Khronos Group to support data parallelism and heterogeneous programming

Community Project to drive language enhancements:

  • Extensions to simplify data parallel programming
  • Open and cooperative development for continued evolution 
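
To make the language description above more concrete, here is a minimal sketch of an element-wise vector addition written in SYCL 2020 style, which is the basis of DPC++. The helper name `vector_add` and the `HAVE_SYCL` macro are assumptions made for this illustration and are not part of oneAPI. The plain C++ fallback is only there so the file still builds with an ordinary compiler when no SYCL header is installed.

```cpp
// Sketch only: vector addition in SYCL 2020 style with a host-only fallback.
#include <cassert>
#include <cstddef>
#include <vector>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define HAVE_SYCL 1  // hypothetical macro, defined here for the example
#else
#define HAVE_SYCL 0
#endif

// vector_add is a hypothetical helper name; both inputs must be equal length.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> c(a.size());
  if (a.empty()) return c;  // avoid creating zero-sized buffers
#if HAVE_SYCL
  sycl::queue q;  // the runtime picks a default device
  {
    // Buffers wrap host memory for the lifetime of this scope.
    sycl::buffer<float, 1> buf_a(a.data(), sycl::range<1>(a.size()));
    sycl::buffer<float, 1> buf_b(b.data(), sycl::range<1>(b.size()));
    sycl::buffer<float, 1> buf_c(c.data(), sycl::range<1>(c.size()));
    q.submit([&](sycl::handler& h) {
      sycl::accessor in_a(buf_a, h, sycl::read_only);
      sycl::accessor in_b(buf_b, h, sycl::read_only);
      sycl::accessor out_c(buf_c, h, sycl::write_only, sycl::no_init);
      // One work-item per element; the lambda body is the device kernel.
      h.parallel_for(sycl::range<1>(a.size()),
                     [=](sycl::id<1> i) { out_c[i] = in_a[i] + in_b[i]; });
    });
  }  // buffers are destroyed here, so results are written back into c
#else
  // Host-only fallback so the sketch builds without a SYCL toolchain.
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
#endif
  return c;
}
```

The kernel body is ordinary C++. Only the queue, buffer, and accessor objects are SYCL-specific, and in principle the same source can be rebuilt for a different device, with any device-specific tuning added afterwards.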

FPGA Development Flow for oneAPI

One of the main problems when compiling code for an FPGA is compile time: the backend compile that translates DPC++ code into a timing-closed FPGA design implementing the hardware architecture specified by that code can take hours to complete. The FPGA development flow has therefore been tailored to minimize full compile runs.

  1. The first step is functional validation, where the code is checked for correctness using a test bench. This is done through emulation on the development platform, where the code targeting the FPGA is compiled and executed on the CPU. Emulation allows a much faster turnaround when a bug is found and needs to be fixed. A standard CPU debugger (such as the Intel® Distribution for GDB) can be used for that purpose.
  2. Once functional validation is complete, static performance analysis is performed using compiler-generated reports. The reports include the information needed to identify memory, performance, and data-flow bottlenecks in the design, along with suggested optimization techniques to resolve them. They also provide area and timing estimates of the design for the target FPGA.
  3. When the results of static analysis are satisfactory, a full compile takes place. On request, the compiler can insert profiling logic into the generated hardware. This logic generates dynamic profiling data for memory and pipe accesses, which the VTune™ Profiler can later use to identify data-pattern-dependent bottlenecks that cannot be spotted in any other way.
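
The steps above rely on building the same source for either the emulator or the real FPGA. One common way to arrange this, sketched below, is a compile-time switch that selects the device. The macro `FPGA_HARDWARE` and the helpers `build_target` and `make_fpga_queue` are assumptions made for this illustration. The selector names come from Intel's FPGA extension header as found in recent oneAPI releases; older releases may use different names, so check your toolkit version.

```cpp
// Sketch only: one source file, emulator by default, hardware on request.
#include <string>

#if __has_include(<sycl/sycl.hpp>) && \
    __has_include(<sycl/ext/intel/fpga_extensions.hpp>)
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#define HAVE_SYCL_FPGA 1  // hypothetical macro, defined here for the example
#else
#define HAVE_SYCL_FPGA 0
#endif

// FPGA_HARDWARE is a user-defined macro (for example, passed to the compiler
// as -DFPGA_HARDWARE); it is an assumption of this sketch, not a built-in.
// build_target() reports which target this translation unit was built for.
std::string build_target() {
#if defined(FPGA_HARDWARE)
  return "hardware";
#else
  return "emulator";
#endif
}

#if HAVE_SYCL_FPGA
// make_fpga_queue() is a hypothetical helper: it returns a queue bound to
// the FPGA emulator (fast CPU-based validation) unless FPGA_HARDWARE is set.
sycl::queue make_fpga_queue() {
#if defined(FPGA_HARDWARE)
  return sycl::queue(sycl::ext::intel::fpga_selector_v);
#else
  return sycl::queue(sycl::ext::intel::fpga_emulator_selector_v);
#endif
}
#endif
```

Defaulting to the emulator keeps everyday builds fast, so the hours-long hardware compile only runs when it is explicitly requested. The compiler options that produce the static reports or insert profiling logic depend on the toolkit version and the card's BSP, so they are left out here.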

FREE On-Demand Webinar

Using Intel® oneAPI™ to Achieve High-Performance Compute Acceleration with FPGAs

Watch immediately after registering!

Learn More About oneAPI

What is oneAPI?

oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures—for faster application performance, more productivity, and greater innovation. The oneAPI industry initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.

The Libraries

oneAPI provides libraries for compute- and data-intensive domains, including deep learning, scientific computing, video analytics, and media processing.

The Hardware Abstraction Layer

The low-level hardware interface defines a set of capabilities and services that allow a language runtime to utilize a hardware accelerator.

The Specification

The oneAPI specification extends existing developer programming models to support a diverse set of hardware through a language, a set of library APIs, and a low-level hardware interface for cross-architecture programming. To promote compatibility and enable developer productivity and innovation, the oneAPI specification builds upon industry standards and provides an open, cross-platform developer stack.

You can get started right away using oneAPI on the 520N-MX

Get Started with oneAPI on an FPGA

You’ll need three components to start developing with oneAPI. The oneAPI Base Toolkit and FPGA Add-On are both available from Intel. The BSP for your BittWare FPGA card is available on BittWare’s developer site.

Intel’s oneAPI Base Toolkit

Intel FPGA Add-On for oneAPI Base Toolkit

FPGA card BSP from BittWare


Craig Petrie talks about BittWare's support for oneAPI


Interested in Pricing or More Information?

Our technical sales team is ready to provide availability and configuration information, or answer your technical questions.