Real-time AI Acceleration
A deterministic ML inference chip designed for ultra-low-latency applications.
Fully deterministic processor
Predictable and repeatable performance with no run-to-run variation
230 MB of on-die memory
Up to 80 TB/s of on-die memory bandwidth
Massive concurrency and data parallelism for bandwidth-sensitive applications
9 RealScale™ chip-to-chip connectors
Near-linear multi-server and multi-rack scalability without the need for external switches
End-to-end on-chip protection
Improves uptime and reliability with error-correction code (ECC) protection throughout the entire GroqChip™ data path
PCIe Gen4 x16 interface
Up to 31.5 GB/s of bi-directional bandwidth over an industry-standard interface for fast device and network connections
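The 31.5 GB/s figure follows directly from the PCIe Gen4 link parameters (16 lanes at 16 GT/s each, with 128b/130b line encoding); a quick back-of-the-envelope check:

```python
# Theoretical PCIe Gen4 x16 bandwidth, per direction.
lanes = 16
raw_rate = 16e9          # 16 GT/s per lane, 1 bit per transfer
encoding = 128 / 130     # 128b/130b line-encoding efficiency
bandwidth_gbytes = lanes * raw_rate * encoding / 8 / 1e9
print(f"{bandwidth_gbytes:.1f} GB/s")  # prints "31.5 GB/s"
```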
Ready to Get Started?
BittWare offers several paths to get started with Groq AI inference, whether you are in development or ready for deployment:
Ready, set, done. Guaranteed low latency.
For machine learning inference, GPUs suffer from inefficiencies that lead to high latency, low silicon utilization, and unpredictable performance. Groq designed its AI deep learning chip specifically to provide predictable, efficient, low-latency inference that's easy to bring into your current workflow.
Ease of Integration and Built to Scale
The GroqCard is a double-width PCIe form factor ML accelerator that’s hassle-free to integrate. The GroqWare suite follows a software-defined hardware approach, giving easy deployment paths for your PyTorch, TensorFlow, and ONNX-trained deep learning models.
Scalability is a core feature of the GroqCard, with 9 RealScale chip-to-chip connections that make deploying multiple cards as efficient as deploying one. An internal software-defined network provides predictable, repeatable performance with no run-to-run variation.
Simplify programming with GroqWare™ Suite
GroqWare™ Suite is a comprehensive, versatile software stack designed to accelerate a variety of HPC and ML workloads. Composed of the Groq Compiler, Groq API, and utilities, it eases deployment with an open-source driver/runtime and support for industry-standard AI/ML frameworks.
The GroqFlow™ tool chain (included in the GroqWare Suite) lets a single line of PyTorch or TensorFlow code import an existing model and transform it, through a fully automated tool chain, to run on Groq hardware.
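As a sketch, that single-line flow looks roughly like the following (assumes the open-source `groqflow` package and a Groq-equipped host; the model and input names here are illustrative, not from this page):

```python
import torch
from groqflow import groqit  # GroqFlow entry point

# Any trained PyTorch module works; this small MLP is a placeholder.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
inputs = {"x": torch.rand(1, 64)}

gmodel = groqit(model, inputs)  # the "single line": compile for Groq hardware
outputs = gmodel(**inputs)      # run inference on the GroqCard
```

Compilation and execution require Groq hardware and the Groq runtime, so treat this as an outline of the workflow rather than a drop-in script.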
See your program as it travels across GroqChip
What is GroqChip™ Processor?
A scalable processor built from the ground up to accelerate AI, ML, and HPC workloads.
The revolutionary, fully deterministic GroqChip processor is the core of scalable performance. Built from the ground up to accelerate AI, ML, and HPC workloads, GroqChip reduces data movement for predictable, bottleneck-free, low-latency performance. This standalone chip integrates flexibly into compute-intensive applications.
The architecture is much simpler than a GPU and is designed with a software-first focus, making it easier to program and providing predictable performance with lower latency.
Science & Government
Oil & Gas
PCIe Card Specifications
Dual width, full height, ¾ length PCI Express Gen4 x16 adapter
Up to 750 TOPs, 188 TFLOPs (INT8, FP16 @900 MHz)
230 MB SRAM per chip
Up to 80 TB/s on-die memory bandwidth
Up to 9 RealScale™ chip-to-chip connectors
INT8, INT16, INT32 & TruePoint™ technology (MXM: FP32; VXM: FP16, FP32)
Max: 375 W; TDP: 275 W; Typical: 240 W
Ready for More Info?
Fill out the form to get in touch for details on Groq Real-time AI Acceleration.