Anemone104™ Co-Processor for FPGAs [AN104]

Low-Power, C-Programmable, Floating Point

  • 16 independent floating point cores
  • 19.2 GFLOPS of floating point processing
  • 1 Watt core power
  • ANSI C-programmable
  • IEEE Floating Point
  • Shared memory architecture
  • External I/O via memory-mapped links
  • Scale multiple chips up to 4.9 TFLOPS
  • High throughput mesh network
  • Standard GNU/Eclipse Development Tools
  • Available from BittWare on standard board formats

Overview

A New Approach to Floating Point DSP

Traditional floating point DSPs excel at complex processing tasks, but their limited chip real estate and power efficiency have made them an endangered species. FPGAs offer superior versatility and configurability, but can be difficult to use for complex and evolving applications. The BittWare Anemone, featuring the Epiphany architecture from Adapteva, combines the best assets of both and offers a completely new approach to floating point digital signal processing. This hybrid solution pairs a standard processor software development environment with a world-class FPGA platform, allowing users to optimally partition their algorithms between hardware and software. The result is superior development productivity and unmatched system size, weight, and power.

Focus on power & efficiency

Anemone is a truly C-programmable floating point compute engine. It achieves superior power efficiency and processing performance because it is designed to work alongside an FPGA as a co-processor. The FPGA handles all the memory, I/O interfacing, protocol processing, and special functions in addition to any computational tasks it may perform, leaving the Anemone free to efficiently perform the complex processing tasks that DSPs are ideal for. This makes Anemone an extremely efficient chip compared with traditional floating point DSPs, which may use only 5% of their silicon area for processing.

Simple, elegantly designed floating point cores

The Anemone is a fully scalable multicore processor that runs at up to 600 MHz. Its 16 eCores provide a total sustained performance of 19.2 GFLOPS while consuming only 1 Watt of core power. Each eCore features a compact, general-purpose instruction set that requires no instruction level parallelism and provides high program efficiency. All floating point computations are performed as single-precision IEEE 754; hardware looping is also supported. Anemone offers distributed and segmented memory, and large uniform register files. On-chip distributed shared memory is 4 Mbit (512 KBytes in total, 32 KBytes per eCore) with 19.2 GBytes/sec of sustained memory bandwidth within each eCore. The cache-less shared memory architecture is extended off-chip via I/O links.
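The headline figures in this paragraph follow from a few per-core numbers. The sketch below is only a back-of-the-envelope check: it assumes the 2 FLOP (1 MAC) per cycle rate listed in the eCore specifications and treats the quoted GFLOPS as cores × FLOPs per cycle × clock. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted chip-level figures.
# All names here are illustrative; the numbers come from the product text.
NUM_ECORES = 16
CLOCK_MHZ = 600             # maximum eCore clock rate
FLOPS_PER_CYCLE = 2         # one multiply-accumulate (MAC) counts as 2 FLOPs
SRAM_PER_ECORE_KBYTES = 32


def peak_gflops(cores=NUM_ECORES, clock_mhz=CLOCK_MHZ,
                flops_per_cycle=FLOPS_PER_CYCLE):
    """Throughput in GFLOPS if every core issues one MAC per cycle."""
    return cores * flops_per_cycle * clock_mhz / 1000


def total_sram_kbytes(cores=NUM_ECORES, per_core=SRAM_PER_ECORE_KBYTES):
    """Total on-chip SRAM in KBytes."""
    return cores * per_core


def total_sram_mbits(cores=NUM_ECORES, per_core=SRAM_PER_ECORE_KBYTES):
    """Total on-chip SRAM in Mbit (1 Mbit = 1024 Kbit)."""
    return total_sram_kbytes(cores, per_core) * 8 / 1024


if __name__ == "__main__":
    print("GFLOPS per chip:", peak_gflops())
    print("SRAM per chip (KBytes):", total_sram_kbytes())
    print("SRAM per chip (Mbit):", total_sram_mbits())
```

Calling `peak_gflops(cores=4096)` gives about 4915 GFLOPS, which matches the 4.9 TFLOPS multi-chip figure in the feature list.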

High-throughput eMesh network

The Anemone features an internal high-throughput mesh network, with separate data paths for on-chip and off-chip communications. Each eCore has a multi-channel DMA engine to support background data movement over the mesh. Total on-chip, inter-core bandwidth is 76.8 GBytes/sec full duplex, with an additional 4.8 GBytes/sec of off-chip bandwidth. Each router node can simultaneously sustain full-duplex transfers on all ports, with automatic routing based on global addressing.

I/O via memory-mapped high-speed links

The Anemone provides a flexible low-overhead external interconnect scheme that supports memory-mapped direct connection of multiple Anemones and is compatible with any LVDS capable FPGA. Each of its four links is a full-duplex 8-bit LVDS data port running at 300 MHz DDR and carries 600 MBytes/sec in each direction simultaneously, for a total off-chip bandwidth of 4.8 GBytes/sec. Its FPGA co-processor use model provides the ultimate flexibility: since all external I/O goes through an FPGA, system designers can customize the I/O to their application’s specific requirements.
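The link numbers can be reproduced from the port width and clock. This is a minimal sketch assuming that DDR means two 8-bit transfers per 300 MHz clock cycle and that the 4.8 GBytes/sec total counts both directions of all four links. The names are hypothetical and the units are decimal (1 GByte = 1000 MBytes).

```python
# Rough arithmetic behind the quoted off-chip bandwidth.
# Names are illustrative; the figures come from the product text.
NUM_LINKS = 4
LINK_WIDTH_BITS = 8
LINK_CLOCK_MHZ = 300
TRANSFERS_PER_CYCLE = 2     # DDR: data moves on both clock edges
DIRECTIONS = 2              # full duplex


def link_mbytes_per_sec():
    """One-way bandwidth of a single link in MBytes/sec."""
    bytes_per_transfer = LINK_WIDTH_BITS // 8
    return bytes_per_transfer * LINK_CLOCK_MHZ * TRANSFERS_PER_CYCLE


def total_off_chip_gbytes_per_sec():
    """Aggregate bandwidth over all links and both directions in GBytes/sec."""
    return NUM_LINKS * DIRECTIONS * link_mbytes_per_sec() / 1000


if __name__ == "__main__":
    print("Per link, per direction (MBytes/sec):", link_mbytes_per_sec())
    print("Total off-chip (GBytes/sec):", total_off_chip_gbytes_per_sec())
```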

Anemone Development Tools

The Anemone reduces system development cost by enabling out-of-the-box execution of applications written in regular ANSI-C. It does not require any C-subset, language extensions, or SIMD. BittWare offers a software development kit and libraries to support development for the Anemone co-processor.

Anemone Development Kit

BittWare’s Anemone Development Kit (ADK) provides tools for software development on BittWare’s Anemone-based hardware. The ADK includes the Adapteva Epiphany Software Development Kit (SDK), which is based on standard GNU development tools: optimized GNU C compiler w/ binutils, simulator, standard GNU GDB debugger, and Eclipse multi-core IDE. In addition to the Epiphany SDK, the ADK also includes a command line development environment, platform library, and FIR and DMA examples for the Anemone co-processor.

Insight™ Libraries and Profiler for Anemone

Insight™ (in partnership with Paralant, Ltd.) is a set of optimized numerical signal processing functions for the Anemone floating point co-processor. It includes the most commonly used signal processing algorithms: FFTs, FIR filters, matrix multiplication (real and complex), and many others. Insight also includes a profiler, enabling developers to visually analyze the Anemone FPGA co-processor’s performance.

Available Hardware Platforms

The Anemone is available from BittWare on standard board form factors, including FMC (VITA 57), AdvancedMC (AMC), VPX (VITA 46/48/65), and PCI Express (PCIe) slot card. Development boards and systems are also available.

Anemone Development Platforms

Anemone Development Platforms are available for evaluating the Anemone104 co-processor, as well as designing and debugging applications for the Anemone104.

Anemone Development System

The BittWare Anemone Development System provides the tools you need to design and debug applications for the Anemone co-processor for FPGAs. The development system includes a VITA-57 FMC with four Anemone processors, a BittWare VPX or AMC carrier board based on the Altera Stratix family of FPGAs, and a VPX or AMC rapid development platform. The system also includes all necessary development tools: the BittWorks II Toolkit for the BittWare hardware; BittWare’s ATLANTiS FrameWork for the Altera FPGAs; and the BittWare Anemone Development Kit, which includes Adapteva’s Epiphany SDK. The Insight™ libraries for Anemone104 can also be included as an ordering option.

Anemone Evaluation Kit

The BittWare Anemone Evaluation Kit provides a cost-effective way to begin evaluating the Anemone co-processor for FPGAs. The evaluation kit features an evaluation board based on an Anemone104 and an Altera Stratix III FPGA. It arrives ready to use out of the box, with the Ubuntu Linux OS and Anemone development tools installed on an included laptop. Anemone development tools include the Adapteva Epiphany™ Software Development Kit (SDK); the Insight™ libraries for Anemone104 can also be included as an ordering option.

Specs

Performance/Power

      • 19.2 GFLOPS
      • 1 Watt core power

Epiphany eCore

Anemone has 16 independent eCores, each with:

FPU: Floating-Point Unit

      • 2 FLOP (1 MAC) per cycle
      • 1.2 GFLOPS @ 600 MHz
      • Single precision IEEE 754 floating point
      • Shared memory multiprocessor architecture
      • C-friendly instruction set
      • Hardware looping
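To make the "2 FLOP (1 MAC) per cycle" figure concrete, the sketch below counts the multiply-accumulates in a direct-form FIR filter and converts that count into an idealized execution time on one eCore. It is not the FIR example shipped with the development kit. The time is only a lower bound that assumes one MAC retires every cycle at 600 MHz and ignores loads, stores, and loop overhead. All names are hypothetical.

```python
# Illustrative only: count the MACs in a direct-form FIR filter and
# estimate a best-case run time at one MAC per cycle.

def fir(samples, taps):
    """Direct-form FIR over fully overlapping windows.

    Returns (outputs, mac_count). Each output costs one
    multiply-accumulate per tap.
    """
    outputs = []
    macs = 0
    for n in range(len(taps) - 1, len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            acc += h * samples[n - k]   # one MAC = 2 FLOPs
            macs += 1
        outputs.append(acc)
    return outputs, macs


def ideal_seconds(macs, clock_hz=600e6, macs_per_cycle=1):
    """Best-case time for `macs` multiply-accumulates on one eCore."""
    return macs / (macs_per_cycle * clock_hz)


if __name__ == "__main__":
    out, macs = fir([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
    print(out, macs, ideal_seconds(macs))
```

A real kernel would also need to keep the taps and samples in the eCore's local memory and overlap data movement with compute, so measured times will be higher than this bound.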

Register File

      • Flat, uniform file of 64 entries
      • Single load/store model

Network Interface & DMA

      • Shared memory-mapped, transparent to eCore
      • Full duplex 4.8 GBytes/sec bandwidth per eCore
      • 2 DMA channels per eCore @ 600 MHz each
      • Supports background I/O

IALU: Integer Arithmetic Unit

      • Address generation

Segmented Memory

      • 32 KBytes SRAM per eCore, organized as 4 banks of 8 KBytes (512 KBytes total on chip)
      • 19.2 GBytes/sec memory bandwidth per eCore
      • Cache-less
      • Shared memory architecture
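The bank organization can be pictured with a small address-to-bank mapping. Two things in this sketch are assumptions, not statements from this page: that the four 8 KByte banks occupy consecutive address ranges within an eCore's 32 KBytes, and that each bank can move 8 bytes per cycle. The second assumption is simply one decomposition that reproduces the quoted 19.2 GBytes/sec at 600 MHz.

```python
# Sketch of a per-eCore memory layout. Contiguous banks and an 8-byte
# access width per bank are assumptions made for illustration.
NUM_BANKS = 4
BANK_BYTES = 8 * 1024
LOCAL_BYTES = NUM_BANKS * BANK_BYTES    # 32 KBytes per eCore


def bank_of(offset):
    """Bank index for a byte offset into an eCore's local memory."""
    if not 0 <= offset < LOCAL_BYTES:
        raise ValueError("offset outside the 32 KByte local memory")
    return offset // BANK_BYTES


def per_core_bandwidth_gbytes(bytes_per_bank_per_cycle=8, clock_mhz=600):
    """Bandwidth if every bank is accessed on every cycle (assumed width)."""
    return NUM_BANKS * bytes_per_bank_per_cycle * clock_mhz / 1000


if __name__ == "__main__":
    for offset in (0x0000, 0x2000, 0x4000, 0x7FFF):
        print(hex(offset), "-> bank", bank_of(offset))
    print("GBytes/sec per eCore:", per_core_bandwidth_gbytes())
```

Under these assumptions, placing code and data buffers in different banks would let instruction fetch, load/store, and DMA traffic proceed without contending for the same bank.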

Epiphany eMesh Network

3 independent, full-duplex mesh networks:

Networks

      • cMesh for core writes (on-chip): 4.8 GBytes/sec each direction, per segment; total cross-sectional bandwidth of 115.2 GBytes/sec
      • xMesh for external writes (off-chip): 600 MByte/sec each direction, per segment
      • rMesh for read requests: 600 MByte/sec each direction, per segment

Transparently Shared Memory

      • Routing operates independently of cores
      • Coordinate based routing
      • Extends off-chip, supporting up to 4096 eCores in a single 2D mesh
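Coordinate-based routing over a global address space can be sketched as follows. The address layout here is an assumption that is not stated on this page: a 32-bit address whose upper 12 bits hold a 6-bit row and a 6-bit column, with a 20-bit local offset below them. That layout is consistent with the 4096-eCore limit (64 × 64). The column-then-row hop order is likewise just one simple dimension-ordered scheme chosen for illustration.

```python
# Illustrative model of coordinate-based routing on a 2D mesh.
# The bit layout and the hop order are assumptions.
ROW_BITS = 6
COL_BITS = 6
LOCAL_BITS = 20
MAX_ECORES = (1 << ROW_BITS) * (1 << COL_BITS)     # 64 x 64 mesh


def make_address(row, col, offset):
    """Pack (row, col, local offset) into a 32-bit global address."""
    if not (0 <= row < 1 << ROW_BITS and 0 <= col < 1 << COL_BITS):
        raise ValueError("core coordinate out of range")
    if not 0 <= offset < 1 << LOCAL_BITS:
        raise ValueError("local offset out of range")
    return (row << (COL_BITS + LOCAL_BITS)) | (col << LOCAL_BITS) | offset


def decode(address):
    """Split a global address into (row, col, local offset)."""
    offset = address & ((1 << LOCAL_BITS) - 1)
    col = (address >> LOCAL_BITS) & ((1 << COL_BITS) - 1)
    row = (address >> (COL_BITS + LOCAL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, offset


def route(src, dst):
    """Mesh nodes visited from src to dst, columns first and then rows."""
    row, col = src
    hops = []
    while col != dst[1]:
        col += 1 if dst[1] > col else -1
        hops.append((row, col))
    while row != dst[0]:
        row += 1 if dst[0] > row else -1
        hops.append((row, col))
    return hops


if __name__ == "__main__":
    addr = make_address(3, 5, 0x100)
    print(hex(addr), decode(addr))
    print(route((0, 0), (1, 2)))
```

Because the destination coordinate is carried in the address itself, each router only needs to compare it with its own position, which is how routing can operate independently of the cores.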

Clock

      • 600 MHz full clock rate

I/O and Debug Ports

External I/O Link Ports

      • 4.8 GBytes/sec off-chip bandwidth
      • 4 memory-mapped link ports
      • Full-duplex 8-bit LVDS @ 300 MHz DDR

SPI Port

      • Allows debug, command, & control
      • Access to every core, register, & memory

Development Tools

BittWare Anemone Development Kit (ADK)

      • Installer script for Ubuntu 10.04
      • Anemone platform library
      • Command line environment
      • FIR and DMA examples
      • Adapteva Epiphany SDK

Adapteva Epiphany SDK

      • Optimized GNU C compiler w/ binutils
      • Simulator
      • Standard GNU GDB debugger
      • Eclipse multi-core IDE

Insight™ Libraries and Profiler for Anemone

      • Optimized numerical signal processing functions
      • BLAS library
      • FFT library
      • Performance profiler

Available Board Form Factors

      • FMC (VITA 57) FPGA Mezzanine Card
      • VPX (VITA 46/48/65)
      • AMC (AdvancedMC) for MicroTCA & ATCA
      • PCI Express (PCIe) slot card

Block Diagram

AN104 Chip Block Diagram

AN104 eCore Block Diagram

Quad Anemone™ Floating Point Co-Processor VITA-57 FMC Block Diagram

Ordering

For pricing information on BittWare’s Anemone104 products, please contact BittWare.
