Anemone104™ Co-Processor for FPGAs [AN104]
- 16 independent floating point cores
- 19.2 GFLOPS of floating point processing
- 1 Watt core power
- ANSI C-programmable
- IEEE Floating Point
- Shared memory architecture
- External I/O via memory-mapped links
- Scale multiple chips up to 4.9 TFLOPS
- High throughput mesh network
- Standard GNU/Eclipse Development Tools
- Available from BittWare on standard board formats
A New Approach to Floating Point DSP
Traditional floating point DSPs excel at complex processing tasks, but limitations in their use of chip real estate and in power efficiency have made them an endangered species. FPGAs, while superior in versatility and configurability, can be difficult to use for complex and evolving applications. The BittWare Anemone, featuring the Epiphany architecture from Adapteva, combines the best assets of both, offering a completely new approach to floating point digital signal processing. This hybrid solution pairs a standard processor software development environment with a world-class FPGA platform, allowing users to optimally partition their algorithms into hardware and software. The result is superior development productivity and unmatched system size, weight, and power.
Focus on power & efficiency
Anemone is a truly C-programmable floating point compute engine. It is unique in that it achieves superior power efficiency and processing performance because it is designed to work alongside an FPGA as a co-processor. The FPGA handles all the memory, I/O interfacing, protocol processing, and special functions in addition to any computational tasks it may perform, leaving the Anemone free to efficiently perform the complex processing tasks that DSPs are ideal for. This makes Anemone an extremely efficient chip compared with traditional floating point DSPs, which may use only 5% of their silicon area for processing.
Simple, elegantly designed floating point cores
The Anemone is a fully scalable multicore processor running at up to 600 MHz, with 16 eCores that provide a total sustained performance of 19.2 GFLOPS while consuming only 1 Watt of core power. Each eCore features a compact, general-purpose instruction set that requires no instruction level parallelism and provides high program efficiency. All floating point computations are performed as single-precision IEEE 754; hardware looping is also supported. Anemone offers distributed and segmented memory, and large uniform register files. On-chip distributed shared memory is 4 Mbit (32 KBytes per eCore, 512 KBytes total) with 19.2 GBytes/sec of sustained memory bandwidth within each eCore. The cache-less shared memory architecture is extended off-chip via I/O links.
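The performance figures quoted above follow from simple arithmetic: each eCore completes one multiply-accumulate (2 FLOP) per cycle at 600 MHz. The following is a minimal C sketch of that calculation; the macro and function names are illustrative only and are not part of any BittWare or Adapteva API.

```c
#include <assert.h>

/* Figures quoted in the text; all names here are illustrative. */
#define FLOPS_PER_CYCLE 2    /* one multiply-accumulate counts as 2 FLOP */
#define CLOCK_MHZ       600  /* maximum eCore clock */

/* Rate of one eCore in MFLOPS: FLOP per cycle times clock in MHz. */
long ecore_peak_mflops(void)
{
    return (long)FLOPS_PER_CYCLE * CLOCK_MHZ;
}

/* Rate of a mesh of `ecores` cores in MFLOPS. */
long mesh_peak_mflops(long ecores)
{
    assert(ecores > 0);
    return ecores * ecore_peak_mflops();
}
```

With these definitions, `mesh_peak_mflops(16)` gives 19200 MFLOPS (19.2 GFLOPS) for a single chip, and `mesh_peak_mflops(4096)` gives 4915200 MFLOPS, roughly 4.9 TFLOPS, for the largest supported mesh.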
High-throughput eMesh network
The Anemone features an internal high-throughput mesh network, with separate data paths for on-chip and off-chip communications. Each eCore has a multi-channel DMA engine to support background data movement over the mesh. Total on-chip, inter-core bandwidth is 76.8 GBytes/sec full duplex, with an additional 4.8 GBytes/sec of off-chip bandwidth. Each router node can simultaneously sustain full-duplex transfers on all ports, with automatic routing based on global addressing.
I/O via memory-mapped high-speed links
The Anemone provides a flexible low-overhead external interconnect scheme that supports memory-mapped direct connection of multiple Anemones and is compatible with any LVDS capable FPGA. This is achieved via four links that are full-duplex 8-bit LVDS data ports @ 300 MHz DDR, each simultaneously providing 600 MByte/sec in each direction for a total off-chip bandwidth of 4.8 GBytes/sec. Its FPGA co-processor use model provides the ultimate flexibility: since all external I/O goes through an FPGA, system designers can customize the I/O to their application’s specific requirements.
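The link bandwidth numbers can be checked the same way: an 8-bit port moves one byte per transfer, and DDR signaling gives two transfers per 300 MHz clock. The sketch below works through that arithmetic in C; the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Per-direction rate of one link in MByte/sec.
   width_bits data lanes, clocked at clock_mhz, two transfers per clock (DDR). */
long link_mbytes_per_sec(long width_bits, long clock_mhz)
{
    assert(width_bits > 0 && width_bits % 8 == 0);
    assert(clock_mhz > 0);
    return (width_bits / 8) * clock_mhz * 2;
}

/* Aggregate off-chip rate: every link carries traffic in both directions. */
long offchip_mbytes_per_sec(long links, long width_bits, long clock_mhz)
{
    assert(links > 0);
    return links * 2 * link_mbytes_per_sec(width_bits, clock_mhz);
}
```

For the Anemone's four 8-bit links at 300 MHz DDR, `link_mbytes_per_sec(8, 300)` yields 600 MByte/sec per direction and `offchip_mbytes_per_sec(4, 8, 300)` yields 4800 MByte/sec, matching the 4.8 GBytes/sec total.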
Anemone Development Tools
The Anemone reduces system development cost by enabling out-of-the-box execution of applications written in regular ANSI-C. It does not require any C-subset, language extensions, or SIMD. BittWare offers a software development kit and libraries to support development for the Anemone co-processor.
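As an illustration of the kind of code this refers to, the following is a direct-form FIR filter written in plain C with no language extensions or intrinsics. It is a generic sketch, not the FIR example shipped with the development kit, and the function name and argument layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].
   Samples before x[0] are treated as zero. All names are illustrative. */
void fir_filter(const float *x, size_t n_samples,
                const float *h, size_t n_taps,
                float *y)
{
    size_t n, k;

    assert(x != NULL && h != NULL && y != NULL);

    for (n = 0; n < n_samples; n++) {
        float acc = 0.0f;
        /* One multiply-accumulate per tap; stop at the start of the input. */
        for (k = 0; k < n_taps && k <= n; k++) {
            acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

The inner loop is a chain of single-precision multiply-accumulates, which is the operation the eCore floating point unit is described as performing once per cycle.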
Anemone Development Kit
BittWare’s Anemone Development Kit (ADK) provides tools for software development on BittWare’s Anemone-based hardware. The ADK includes the Adapteva Epiphany Software Development Kit (SDK), which is based on standard GNU development tools: an optimized GNU C compiler with binutils, a simulator, the standard GNU GDB debugger, and an Eclipse multi-core IDE. In addition to the Epiphany SDK, the ADK also includes a command line development environment, a platform library, and FIR and DMA examples for the Anemone co-processor.
Insight™ Libraries and Profiler for Anemone
Insight™ (in partnership with Paralant, Ltd.) is a set of optimized numerical signal processing functions for the Anemone floating point co-processor. It includes the most commonly used signal processing algorithms: FFTs, FIR filters, matrix multiplication (real and complex), and many others. Insight also includes a profiler, enabling developers to visually analyze the Anemone FPGA co-processor’s performance.
Available Hardware Platforms
The Anemone is available from BittWare on standard board form factors, including FMC (VITA 57), AdvancedMC (AMC), VPX (VITA 46/48/65), and PCI Express (PCIe) slot card. Development boards and systems are also available.
Anemone Development Platforms
Anemone Development Platforms are available for evaluating the Anemone104 co-processor, as well as for designing and debugging applications that run on it.
Anemone Development System
The BittWare Anemone Development System provides the tools you need to design and debug applications for the Anemone co-processor for FPGAs. The development system includes a VITA-57 FMC with four Anemone processors, a BittWare VPX or AMC carrier board based on the Altera Stratix family of FPGAs, and a VPX or AMC rapid development platform. The system also includes all necessary development tools: the BittWorks II Toolkit for the BittWare hardware; BittWare’s ATLANTiS FrameWork for the Altera FPGAs; and the BittWare Anemone Development Kit, which includes Adapteva’s Epiphany SDK. The Insight™ libraries for Anemone104 can also be included as an ordering option.
Anemone Evaluation Kit
The BittWare Anemone Evaluation Kit provides a cost-effective way to begin evaluating the Anemone co-processor for FPGAs. The evaluation kit features an evaluation board based on an Anemone104 and an Altera Stratix III FPGA, and arrives ready to use out of the box, with the Ubuntu Linux OS and Anemone development tools installed on an included laptop. Anemone development tools include the Adapteva Epiphany™ Software Development Kit (SDK); the Insight™ libraries for Anemone104 can also be included as an ordering option.
- 19.2 GFLOPS
- 1 Watt core power
Anemone has 16 independent eCores, each with:
FPU: Floating-Point Unit
- 2 FLOP (1 MAC) per cycle
- 1.2 GFLOPS @ 600 MHz
- Single precision IEEE 754 floating point
- Shared memory multiprocessor architecture
- C-friendly instruction set
- Hardware looping
- Flat, uniform register file of 64 entries
- Single load/store model
Network Interface & DMA
- Shared memory-mapped, transparent to eCore
- Full duplex 4.8 GBytes/sec bandwidth per eCore
- 2 DMA channels per eCore @ 600 MHz each
- Supports background I/O
IALU: Integer Arithmetic Unit
- Address generation
- 32 KBytes SRAM per eCore, organized as 4 banks of 8 KBytes (512 KBytes total across 16 eCores)
- 19.2 GBytes/sec memory bandwidth per eCore
- Shared memory architecture
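The memory figures above are mutually consistent if each of the four banks can deliver 8 bytes per clock cycle. That access width is an assumption made for this sketch and is not stated in this document; the helper names are likewise illustrative.

```c
#include <assert.h>

#define BANKS_PER_ECORE      4    /* from the text */
#define BANK_KBYTES          8    /* from the text */
#define CLOCK_MHZ            600  /* from the text */
#define BANK_BYTES_PER_CYCLE 8    /* ASSUMPTION: 64-bit access per bank per cycle */

/* Local SRAM of one eCore in KBytes. */
long ecore_sram_kbytes(void)
{
    return (long)BANKS_PER_ECORE * BANK_KBYTES;
}

/* Total on-chip SRAM in KBytes for a chip with `ecores` cores. */
long chip_sram_kbytes(long ecores)
{
    assert(ecores > 0);
    return ecores * ecore_sram_kbytes();
}

/* Local memory bandwidth of one eCore in MByte/sec, under the assumption above. */
long ecore_mem_mbytes_per_sec(void)
{
    return (long)BANKS_PER_ECORE * BANK_BYTES_PER_CYCLE * CLOCK_MHZ;
}
```

Under that assumption, `ecore_mem_mbytes_per_sec()` gives 19200 MByte/sec (19.2 GBytes/sec), and `chip_sram_kbytes(16)` gives 512 KBytes, which is 4 Mbit.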
Epiphany eMesh Network
3 independent, full-duplex mesh networks:
- cMesh for core writes (on-chip): 4.8 GBytes/sec each direction, per segment; total cross-sectional bandwidth of 115.2 GBytes/sec
- xMesh for external writes (off-chip): 600 MByte/sec each direction, per segment
- rMesh for read requests: 600 MByte/sec each direction, per segment
Transparently Shared Memory
- Routing operates independently of cores
- Coordinate based routing
- Extends off-chip, supporting up to 4096 eCores in a single 2D mesh
- 600 MHz full clock rate
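Coordinate-based routing over a flat address space can be pictured as follows: the upper bits of a 32-bit address name the destination core by row and column, and the lower bits are an offset into that core's memory. The sketch below assumes a 6-bit row, 6-bit column, and 20-bit offset, which is one split consistent with the 4096-eCore limit (64 x 64 = 4096). The field widths and all function names are assumptions for illustration, not values taken from this document.

```c
#include <assert.h>

#define ROW_BITS    6u   /* ASSUMPTION: up to 64 mesh rows */
#define COL_BITS    6u   /* ASSUMPTION: up to 64 mesh columns */
#define OFFSET_BITS 20u  /* ASSUMPTION: 1 MByte address window per core */

/* Build a global address from mesh coordinates and a local offset. */
unsigned long make_global_addr(unsigned row, unsigned col, unsigned long offset)
{
    assert(row < (1u << ROW_BITS));
    assert(col < (1u << COL_BITS));
    assert(offset < (1ul << OFFSET_BITS));
    return ((unsigned long)row << (COL_BITS + OFFSET_BITS))
         | ((unsigned long)col << OFFSET_BITS)
         | offset;
}

/* Extract the destination row from a global address. */
unsigned addr_row(unsigned long addr)
{
    return (unsigned)((addr >> (COL_BITS + OFFSET_BITS)) & ((1u << ROW_BITS) - 1u));
}

/* Extract the destination column from a global address. */
unsigned addr_col(unsigned long addr)
{
    return (unsigned)((addr >> OFFSET_BITS) & ((1u << COL_BITS) - 1u));
}

/* Router hops between two cores on a 2D mesh (Manhattan distance). */
unsigned mesh_hops(unsigned long src, unsigned long dst)
{
    unsigned sr = addr_row(src), sc = addr_col(src);
    unsigned dr = addr_row(dst), dc = addr_col(dst);
    unsigned dy = sr > dr ? sr - dr : dr - sr;
    unsigned dx = sc > dc ? sc - dc : dc - sc;
    return dx + dy;
}
```

Because the destination coordinates are carried in the address itself, a router only needs to compare them with its own position to pick an output port, which is why routing can proceed without involving the cores.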
I/O and Debug Ports
External I/O Link Ports
- 4.8 GBytes/sec off-chip bandwidth
- 4 memory-mapped link ports
- Full-duplex 8-bit LVDS @ 300 MHz DDR
- Allows debug, command, & control
- Access to every core, register, & memory
BittWare Anemone Software Development Kit
- Installer script for Ubuntu 10.04
- Anemone platform library
- Command line environment
- FIR and DMA examples
- Adapteva Epiphany SDK
Adapteva Epiphany SDK
- Optimized GNU C compiler w/ binutils
- Standard GNU GDB debugger
- Eclipse multi-core IDE
Insight™ Libraries and Profiler for Anemone
- Optimized numerical signal processing functions
- BLAS library
- FFT library
- Performance profiler
Available Board Form Factors
- FMC (VITA 57) FPGA Mezzanine Card
- VPX (VITA 46/48/65)
- AMC (AdvancedMC) for MicroTCA & ATCA
- PCI Express (PCIe) slot card
AN104 Chip Block Diagram
AN104 eCore Block Diagram
Quad Anemone™ Floating Point Co-Processor VITA-57 FMC Block Diagram
For pricing information on BittWare’s Anemone104 products, please contact BittWare.