With the high-speed capabilities of our FPGA products, timestamping can be a challenge. BittWare provides a range of solutions including dedicated add-on modules such as the A10PL4 timing kit (pictured), coaxial inputs on the base card and support for IP such as Atomic Rules TimeServo.
Many BittWare FPGA cards provide two coaxial inputs. One input is intended for a 1 PPS time synchronization signal. The other is for a 10 MHz reference clock. This combination makes it possible to attach very accurate timestamps to input data. BittWare also offers an example implementation of 100 GbE packet timestamps controlled by the IEEE 1588 Precision Time Protocol (PTP). BittWare’s SmartNIC Shell reference design uses a time servo inside the FPGA that we licensed from Atomic Rules.
The two coaxial inputs are all card users need for many timestamping system configurations. The paragraphs that follow detail what we offer and how accurate it can be.
Xilinx’s Ethernet MACs can apply timestamps. However, the Xilinx IP expects users to provide an accurate timestamp, which the MAC then associates with each packet that flows through. Creating and maintaining accurate timestamps is the responsibility of user-supplied IP.
In its simplest form, this can be a counter clocked at 250 MHz that increments by 4 nanoseconds on every tick. Such a counter is only as accurate as the crystal on the board, and it is hard to synchronize with time-of-day. One can improve both accuracy and synchronization by allowing an external 1 PPS signal to reset some of the counter’s least significant bits every second. However, if the counter happens to run faster than real-world time, that reset causes timestamps to flow backwards. Backwards timestamps are known to break some popular application software. The better solution is to use the 1 PPS signal to slow down or speed up the FPGA’s timestamp clock. That is what a time servo does.
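The backwards step described above can be reproduced with a few lines of arithmetic. The following is a toy Python model, not FPGA code: the function names are hypothetical, the crystal is assumed to run 20 ppm fast, and the 1 PPS reset is modeled as snapping the counter to the nearest whole second.

```python
"""Toy model: free-running 250 MHz timestamp counter corrected by 1 PPS."""

CLOCK_HZ = 250_000_000       # timestamp counter clock from the text
TICK_NS = 4                  # nominal increment per tick (one 250 MHz period)
NS_PER_SEC = 1_000_000_000


def counter_after_one_second(ppm_error: float) -> int:
    """Counter value (ns) after one true second, for a crystal off by ppm_error."""
    ticks = round(CLOCK_HZ * (1 + ppm_error / 1e6))
    return ticks * TICK_NS


def hard_reset(counter_ns: int) -> int:
    """ASSUMPTION: the 1 PPS edge snaps the counter to the nearest whole second."""
    return round(counter_ns / NS_PER_SEC) * NS_PER_SEC


def slewed_increment(ppm_error: float) -> float:
    """Per-tick increment (ns) that cancels a known rate error, as a servo would."""
    return TICK_NS / (1 + ppm_error / 1e6)


fast = counter_after_one_second(+20.0)   # crystal runs 20 ppm fast
ahead_ns = fast - NS_PER_SEC             # how far the counter is ahead of true time
step_ns = hard_reset(fast) - fast        # negative: the timestamp jumps backwards
print(ahead_ns, step_ns)
```

With a 20 ppm fast crystal the hard reset moves the timestamp back by the same amount the counter ran ahead. Adjusting the per-tick increment instead, as `slewed_increment` does, keeps the counter monotonic because every tick still adds a positive amount.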
A time servo implementation can use external hardware. The Precision Timing Module option for some of BittWare’s low profile cards contains a chip-based time servo. However, BittWare generally recommends users consider the Atomic Rules Time Servo IP that goes inside the FPGA.
The functionality of any time servo, whether hardware or software based, is effectively defined by the clock APIs in the Linux kernel. The kernel exposes a user-level API for manipulating time servos, which is explained at Kernel.org. Users can set the time, shift the time by a given offset, or tweak the clock rate up or down. These are the basic functions any time servo should support. The Linux implementation of PTP uses this clock API. Unfortunately, BittWare’s SmartNIC reference design does not have an Ethernet driver and thus cannot be used with the Linux PTP implementation. Please use the Linux API as documentation of the functionality expected in any time servo implementation.
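The three basic operations (set the time, shift it by an offset, trim the rate) can be illustrated with a small software model. This is a hypothetical sketch for explanation only: the class and method names are invented here and are neither the Linux API nor the Atomic Rules IP. The 400 MHz default and the parts-per-billion rate unit are assumptions.

```python
class ToyTimeServo:
    """Hypothetical software model of a time servo (illustration only)."""

    def __init__(self, nominal_hz: float = 400e6):
        self.period_ns = 1e9 / nominal_hz   # nominal increment per clock tick
        self.freq_ppb = 0.0                 # rate trim, in parts per billion
        self.now_ns = 0.0

    def set_time(self, ns: float) -> None:
        """Set the time outright."""
        self.now_ns = float(ns)

    def adjust_time(self, delta_ns: float) -> None:
        """Shift the time by a signed offset in one step."""
        self.now_ns += delta_ns

    def adjust_freq(self, ppb: float) -> None:
        """Speed up (positive) or slow down (negative) the clock rate."""
        self.freq_ppb = ppb

    def tick(self, n: int = 1) -> None:
        """Advance n clock ticks using the trimmed per-tick increment."""
        self.now_ns += n * self.period_ns * (1 + self.freq_ppb / 1e9)


servo = ToyTimeServo()
servo.set_time(1_000_000_000)    # start at exactly one second
servo.adjust_time(-250)          # step back 250 ns
servo.adjust_freq(-20_000)       # run 20 ppm slower than nominal
servo.tick(400_000_000)          # one nominal second of 400 MHz ticks
print(round(servo.now_ns))
```

Only the offset step can move time backwards. The rate trim changes how much each tick adds, which is why a servo prefers it for routine corrections.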
BittWare’s SmartNIC supports DPDK, which has a user-level API for manipulating time servos. Look for the functions rte_eth_timesync_adjust_time, rte_eth_timesync_read_time and rte_eth_timesync_write_time. Atomic Rules offers an implementation of this DPDK API for its time servo. However, users must also acquire the Atomic Rules Arkville PCIe DMA block in order to use it. This DPDK API is fully implemented in BittWare’s SmartNIC reference design for customers willing to license both Atomic Rules blocks. Finally, Atomic Rules has a complete implementation of PTP that can run inside the FPGA, without any need for Ethernet or DPDK on the host.
We have seen users dedicate an Ethernet port on the host computer for PTP and then attempt to synchronize that time with the FPGA board. This works if you also have a 1 PPS signal into the FPGA card. Otherwise, doing this results in very inaccurate values. The problem is that PTP is adjusting a timestamp clock inside the NIC ASIC. PTP is not directly adjusting the motherboard clock. Without that 1 PPS, there is no methodology that can put all three clocks (NIC, motherboard, FPGA) into close synchronization. For timestamp accuracy without 1 PPS, the PTP packets need to flow through the FPGA card. Thus, users need one of the FPGA’s QSFP network ports to timestamp PTP packets flowing to any PTP implementation (host via Ethernet driver, host via DPDK driver, or the Atomic Rules PTP running inside the FPGA).
The 10 MHz input found on many BittWare cards allows users to supply a timing reference signal of whatever quality their application needs. Without that signal, FPGA cards from BittWare use a commodity crystal oscillator with roughly 20-30 ppm of potential error (combining stability, jitter, and tolerance errors). Thus, the FPGA board’s timestamp clock can be off by 20 microseconds or more, up or down, every second. If left uncorrected by an external signal (1 PPS or IEEE 1588), a 20 ppm error adds up to about 1.7 seconds per day. Users should therefore connect an external 10 MHz timing reference if holdover (operation through loss of the PPS signal) is important to their application.
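The drift figures follow directly from the ppm rating. A short Python check of the arithmetic, assuming a constant rate error over the whole day (the helper names are illustrative):

```python
SECONDS_PER_DAY = 86_400


def drift_us_per_second(ppm: float) -> float:
    """1 ppm of one second is 1 microsecond, so the number carries over."""
    return ppm


def drift_seconds_per_day(ppm: float) -> float:
    """Accumulated error over one day for a constant rate error in ppm."""
    return ppm * 1e-6 * SECONDS_PER_DAY


for ppm in (20, 30):
    print(ppm, drift_us_per_second(ppm), round(drift_seconds_per_day(ppm), 3))
```

At 20 ppm this gives about 1.7 seconds per day, and at 30 ppm about 2.6 seconds per day.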
That 20-30 ppm of potential error is associated with wide changes in temperature and voltage that won’t be experienced in a computer room. A good implementation of PTP time synchronization protocols will hide the crystal’s tolerance and its longer-term error components, leaving us mostly concerned with the crystal’s short-term jitter. Some vendors claim that a good PTP implementation in a computer room, without a supplemental 1 PPS, with a commodity crystal, can achieve +/- 1 microsecond synchronization to the grandmaster. Others are skeptical this degree of accuracy is achievable without more specialized hardware.
Whatever the actual accuracy of that combination, it is likely good enough to meet the only legal accuracy requirement that we are aware of. The ESMA requirement in Europe for timestamping financial transactions triggered by automated trades is synchronization to UTC with an uncertainty no greater than 100 microseconds.
If your application requires synchronization with time-of-day, the usual approach is to leverage the atomic clocks in the GPS satellite constellation. Each satellite contains a rubidium atomic clock accurate to better than half a nanosecond per year. However, there is lots of circuitry and atmosphere between that clock and our FPGA board. Because of that, over a period of hours, GPS time signals are accurate to about 14 nanoseconds. Unfortunately, atmospheric effects introduce short-term jitter into a GPS signal of between 50 and 300 nanoseconds. An expensive GPS receiver can eliminate that jitter by mixing the GPS signal with a local, very accurate clock (double oven crystal or atomic). Such a receiver needs to be turned on for many hours before it can deliver anything close to the 14 ns potential in the GPS time signal.
To get high accuracy you must connect a synchronization signal between the GPS receiver and the FPGA board. With most receivers this is the receiver’s 1 PPS output. You need a separate connection to receive the rough time-of-day. This other connection is called a “time code”. The most common time code is a proprietary ASCII protocol flowing over an old-fashioned serial port. However, some receivers also provide standard time code over a BNC connector with an AM or DC Level Shift (DCLS) transport of protocols IRIG A, B, G or NASA36. Current BittWare cards do not offer the hardware needed to directly receive any time code. Users must flow the time code through the board’s host computer.
In an IEEE-1588 installation, typically only the grandmaster is directly connected to the GPS receiver. It is very rare to distribute that receiver’s 1 PPS signal to all PTP consumers and even rarer to distribute the GPS 10 MHz clock to PTP consumers. Thus, it isn’t cost-effective to spend lots of money on a double oven stabilized GPS receiver in this style of installation. Who cares if the grandmaster keeps time to within 14 ns when the consumers of PTP are likely synchronizing to within only a few microseconds?
There are applications where users seek to synchronize timestamps among multiple FPGA boards co-located in a small area, like an equipment rack. In this case it is practical to distribute the same 1 PPS correction signal and 10 MHz time reference to every board. How close will the timestamps be?
Let us assume we are using the Atomic Rules time servo, which is nominally clocked at 400 MHz. That means each clock tick is 2.5 ns long. However, at 100 GbE, the time servo feeds its output asynchronously into the Xilinx CMAC, which runs at roughly 322 MHz (a 3.1 ns period). How those asynchronous signals mix isn’t well documented. However, we have run experiments and found that timestamp values in this configuration tend to fall within 3-4 ns of each other across cards.
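The clock periods quoted above, and the staleness that comes from reading a 400 MHz counter on a roughly 322 MHz clock edge, can be checked with exact arithmetic. This is a simplified Python sketch with labeled assumptions: the CMAC clock is taken as exactly 322.265625 MHz, both clocks start phase-aligned, and real clock-domain-crossing latency is ignored. It does not model the actual hardware and is not meant to reproduce the measured 3-4 ns figure.

```python
from fractions import Fraction

SERVO_HZ = 400_000_000     # nominal time servo clock, from the text
CMAC_HZ = 322_265_625      # ASSUMPTION: exact value behind "roughly 322 MHz"


def period_ns(hz: int) -> Fraction:
    """Exact clock period in nanoseconds."""
    return Fraction(1_000_000_000, hz)


def worst_sampling_lag_ns(n_edges: int = 2000) -> Fraction:
    """Largest age of the servo value seen at any of the first n CMAC edges."""
    servo_t = period_ns(SERVO_HZ)
    cmac_t = period_ns(CMAC_HZ)
    worst = Fraction(0)
    for k in range(n_edges):
        edge = k * cmac_t                    # time of the k-th CMAC clock edge
        held = (edge // servo_t) * servo_t   # last servo update at or before it
        worst = max(worst, edge - held)
    return worst


print(float(period_ns(SERVO_HZ)))            # servo tick length in ns
print(round(float(period_ns(CMAC_HZ)), 2))   # CMAC tick length in ns
print(round(float(worst_sampling_lag_ns()), 3))
```

Under these assumptions the sampled value can be stale by almost one full servo tick, just under 2.5 ns. That is the same order of magnitude as the card-to-card spread reported above, though the model says nothing about where the remainder comes from.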
BittWare’s support for a 10 MHz clock input is unusual. Dedicated packet capture cards generally do not have any 10 MHz reference input. Instead they contain a higher-quality crystal oscillator. However, when multi-board synchronization is needed, using a common reference clock helps. So, the BittWare approach is a little less accurate (lesser crystal) but can become much more accurate (use a lab quality reference clock) and will synchronize time between multiple cards much more accurately.
BittWare’s SmartNIC 100 GbE reference design includes support for timestamps. It uses the Atomic Rules time servo. It uses DPDK’s PTP Client Sample Application to process the PTP protocol and control the time servo. https://doc.dpdk.org/guides/sample_app_ug/ptpclient.html
A key filter in the SmartNIC pipeline identifies PTP packets. Currently, that filter does not contain a parser and thus can only identify “Annex F, Transport of PTP over IEEE 802.3 /Ethernet” packets. BittWare has since created a parser that could be deployed to support PTP over UDP and TCP.
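PTP carried directly over Ethernet can be recognized without a parser because it has its own EtherType, 0x88F7, at a fixed offset in the frame. The Python sketch below illustrates that idea only. It is not BittWare’s filter logic, the function name is hypothetical, and it assumes untagged frames (no VLAN header).

```python
import struct

ETHERTYPE_PTP = 0x88F7    # PTP over IEEE 802.3 (layer 2 transport)
ETHERTYPE_IPV4 = 0x0800


def is_ptp_l2(frame: bytes) -> bool:
    """True if an untagged Ethernet frame carries PTP directly over layer 2."""
    if len(frame) < 14:                      # shorter than an Ethernet header
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)   # bytes 12-13
    return ethertype == ETHERTYPE_PTP


# Example frames: destination MAC, source MAC, EtherType, dummy payload.
dst = bytes.fromhex("011b19000000")          # PTP layer 2 multicast address
src = bytes.fromhex("020000000001")          # made-up locally administered MAC
l2_ptp = dst + src + struct.pack("!H", ETHERTYPE_PTP) + bytes(34)
ipv4 = dst + src + struct.pack("!H", ETHERTYPE_IPV4) + bytes(34)

print(is_ptp_l2(l2_ptp), is_ptp_l2(ipv4))
```

A PTP message inside a UDP datagram arrives with the IPv4 EtherType, so a fixed-offset check like this cannot see it. Recognizing it means walking the IP and UDP headers, which is the job of a parser.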
For timestamping users, possibly the most useful block in the SmartNIC reference design is BittWare’s CMAC LBus to AXI4-Streaming gasket. Our version handles the required timestamp math.
With the existing features on many BittWare FPGA cards, you have everything you need to maintain timestamps to whatever accuracy your application requires—from “good enough” all the way up to “time laboratory reference.” Our partnership with Atomic Rules provides all the timestamp plumbing your application requires, allowing card users to focus on their own company’s unique added value.