The first presentation at the ITSF 2021 conference in Brighton discussed the implementation of Precision Time Protocol (PTP) in datacenters. It was the first of many presentations that focused on the benefits and challenges of improved synchronization in this market.
Improve Sync in datacenters and the benefits are shared
Throughout the conference, speakers pointed out that better synchronization in datacenters promises to improve the end-user experience in sectors such as gaming and streaming. In the highly regulated finance sector, MiFID II regulations require that clocks be synchronized to within 100 µs of UTC, so a highly accurate synchronization protocol is vital for compliance.
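To make the 100 µs figure concrete, here is a minimal sketch of a compliance check. The server names and offset values are hypothetical and chosen only to contrast a millisecond-level offset with a sub-microsecond one. They are not measurements from any presentation.

```python
# Limit quoted in the text: clocks within 100 µs of UTC.
# Integer nanoseconds are used to avoid floating-point rounding.
LIMIT_NS = 100_000

def within_limit(offset_ns: int, limit_ns: int = LIMIT_NS) -> bool:
    """Return True if a clock's offset from UTC is inside the limit."""
    return abs(offset_ns) <= limit_ns

# Hypothetical offsets from UTC (assumed values, for illustration only).
offsets_ns = {
    "server_on_ntp": 2_500_000,  # 2.5 ms
    "server_on_ptp": -400,       # 0.4 µs, clock running slightly behind
}

results = {name: within_limit(offset) for name, offset in offsets_ns.items()}

for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'outside'} the 100 µs limit")
```

In this sketch, the server with a millisecond-level offset falls outside the limit, while the server with a sub-microsecond offset sits comfortably inside it.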
For broadcasters, highly accurate synchronization streamlines the production process by eliminating the need to bring all equipment and resources to an event. Media feeds can be sent from the venue and then combined with production effects and commentator video/audio created in another country. If time is known accurately at each location, no perceptible artefacts will appear in the final broadcast.
Mobile network operators are already familiar with the need for highly accurate timing and synchronization. As these operators disaggregate and virtualize functionality, for example when implementing O-RAN in 5G NR, the need for microsecond-level timing accuracy moves into the datacenter.
Better synchronization in datacenters improves efficiencies
The relatively high timing uncertainty in networks that use legacy protocols such as NTP can reduce throughput. Servers and database software have become so powerful that dealing with milliseconds of timing uncertainty, which is common when NTP is used, represents a huge overhead. It has been reported that, in trials, removing the timing uncertainty in datacenters improved throughput by up to 100 times.
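One way to see how timing uncertainty can turn into lost throughput is a simple "wait out the uncertainty" model, in which a database delays each strictly ordered commit by twice the clock uncertainty so that timestamps can never appear out of order. The sketch below uses that model purely as an illustration. The model, the function name, and the uncertainty values are assumptions, and this is not necessarily how the reported trials were measured.

```python
def max_ordered_commits_per_second(uncertainty_us: int) -> float:
    """Upper bound on back-to-back ordered commits per second.

    Assumed model: every commit waits 2 * uncertainty before it is
    acknowledged, so at most one commit fits in each such window.
    """
    wait_us = 2 * uncertainty_us
    return 1_000_000 / wait_us

# Assumed uncertainties, for illustration only.
ntp_uncertainty_us = 1_000  # 1 ms, the millisecond scale mentioned for NTP
ptp_uncertainty_us = 10     # 10 µs, a hypothetical PTP-class figure

ntp_rate = max_ordered_commits_per_second(ntp_uncertainty_us)
ptp_rate = max_ordered_commits_per_second(ptp_uncertainty_us)
speedup = ptp_rate / ntp_rate

print(f"NTP-class bound: {ntp_rate:.0f} commits/s")
print(f"PTP-class bound: {ptp_rate:.0f} commits/s")
print(f"Ratio: {speedup:.0f}x")
```

Under these assumptions, the bound scales inversely with the uncertainty, so shrinking the uncertainty by a factor of 100 raises the bound by the same factor. Real workloads overlap and batch operations, so the actual gain depends heavily on the application.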
Our appetite for more and more data shows no sign of abating, and this drives the building of more datacenter infrastructure to support the demand. Datacenter operators are, therefore, spending billions of dollars every year just to keep up. Anything they can do to improve the efficient use of existing infrastructure will benefit them, and anything they can do to reduce the energy needed to drive their infrastructure will benefit us all.
Another benefit of better timing accuracy is the simplification of operations. Accurate timing can simplify the operation of distributed databases and allow engineers to start treating the datacenter like one big computer.
So why don't all datacenters roll out PTP across all their infrastructure?
The challenge is a chicken-and-egg problem: it is hard to justify the cost of the infrastructure until the benefits are proven, but it is hard to prove the benefits until the infrastructure is in place. Part of the problem is that widespread deployment of PTP traditionally requires large quantities of expensive off-the-shelf time appliances and, as Meta recently pointed out in their engineering forum, these appliances come with drawbacks.
While they are well proven and generally very well performing and stable devices, their drawbacks include:
- Technology is often older and vulnerable to software security concerns.
- Devices come with closed source software, making configuration and monitoring problematic.
- Proprietary hardware is included that is not user-serviceable.
- They can become very costly to purchase and operate.
Making the business case with Open Compute Project (OCP) and Time Appliance Project (TAP)
Several of the ITSF presentations dealt specifically with Meta and the work they have been doing in datacenter timing and synchronization as part of the Open Compute Project (OCP) and Time Appliance Project (TAP).
The OCP TAP is making the business case for widespread PTP deployment just a bit easier. Through this project, Meta has developed a versatile and economical time engine that enables microsecond-accurate PTP synchronization of any server with hardware timestamping. The time card, as it is called, can support hundreds of thousands of servers, enabling deployment at the scale present in modern datacenters. Meta has made the time card hardware and software open source and available to all. The time card removes much of the cost and other drawbacks of reference clocks currently on the market.
Calnex is pleased to also contribute to the OCP TAP via the Instrumentation and Measurement group which aims to provide a path to Open Sourced implementations of Test & Measurement equipment and systems.
While the implementation of better timing can vary between datacenter operators and may involve boundary clocks or transparent clocks, PTP, distribution of 1PPS or even White Rabbit, the need for and benefits of better timing accuracy are well established. The type of timing appliance traditionally used in other networks may not be ideal at the scale of hyperscalers, but new technologies such as the OCP TAP time card, along with good network design and proper monitoring, will allow operators to economically implement and reap the benefits of the widespread deployment of better synchronization.
Background literature: Timing and Synchronization Library