The Hot News Heard Around the Chillers: Optimizing Your AI Cooling Investments

AI workloads continue to scale to unprecedented levels, and underpinning that growth is the need to cool AI infrastructure. Performance must be maximized to feed this insatiable demand, and cooling is the key. Market estimates of the cost vary widely, from $10 billion to $80 billion per year, but the conclusion is the same: a huge amount of investment is needed to keep AI workloads humming. Our view: there is a ready opportunity to optimize your AI cooling investments and maximize performance at the same time. By embracing targeted, precision cooling, a new level of energy efficiency, performance and TCO is readily achievable.

The Hidden Costs of Traditional Data Center Cooling

There are two key drivers behind the industry’s push for innovation in this space. The first is the upfront CapEx required to build massive facility-level cooling plants. The second is the day-to-day energy consumption of today’s and tomorrow’s cooling infrastructure. We also can’t forget the opportunity cost: energy spent on cooling is energy that could otherwise go to the GPUs, networking, memory, and storage that drive AI workloads and results. There is a clear opportunity to optimize your cooling and maximize your performance. With an optimized approach, you can reach nearly 90% ROI, a clear step function in gains.

Breaking the Thermal Wall: The Move to 1MW Racks

Whether it’s announcements such as Project Deschutes from OCP, Jensen’s CES announcement, or other novel liquid cooling approaches, the data center cooling industry continues to advance. Many of these are still traditional bulk cooling, which has its place as a proven and dependable approach.

There is an alternate approach. Why overprovision for a “worst-case” scenario when you can dial in your cooling efficiency? There is no room for wasted investments and no time for wasted energy. The physics of today’s high-density racks creates a unique challenge. In such an environment, many of today’s cooling approaches struggle, whether because of a form factor constraint, as in optical transceivers, or because of a bulk cooling approach for GPUs. Concentrated heat loads can exceed the tolerances needed for performance and function. To overcome this, every ounce of energy needs to give a clear payback. That payback is best delivered through precision: cooling where you need it and, even more so, when you need it. Precision = pinpointing the where combined with predictive and punctual actions. Only then can you unleash every watt and unlock every degree.

This is the Phononic approach. Solid-state, high-quality, energy-efficient technology can deliver this level of precision. In fact, you can reach nearly 90% ROI by right-sizing your infrastructure and improving energy efficiency through this approach. When performance gains and optimizations are factored in, ROI rises further, to nearly 3X, an unbeatable combination.

The "Hot Water" Revolution: How Solid State Cooling Makes 45°C Loops Possible

At CES earlier this year, Jensen’s mention of 45-degree liquid temperatures, along with his statement that “we’re cooling with hot water,” was a key development for the industry. It is also consistent with the path that Phononic sees for cooling AI infrastructure. But there is a specific engineering challenge: if we let the water get hot (45°C), how do we stop the most critical parts of a GPU from overheating and throttling the entire system? This is where solid state cooling comes in. A thermoelectric cooler (TEC) acts as a solid state heat pump. It actively moves heat away from the hotspot, allowing GPUs to operate at optimal temperatures even when the facility water loop is much hotter. With this targeted approach to managing GPU hotspot temperatures, system designers can raise liquid temperatures safely. Our analysis shows that a 45°C liquid temperature, a 10°C increase over current standards, is a useful estimate for identifying current infrastructure opportunities.
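
To make the decoupling idea concrete, here is a minimal first-order sketch, not a Phononic model. It assumes the hotspot temperature without a TEC is simply the loop temperature plus heat load times a lumped thermal resistance, and it asks how much temperature lift a TEC would have to supply to hold the hotspot at a limit. The heat load, thermal resistance, and temperature limit are hypothetical illustration values, and the sketch ignores the TEC's own input power, which a real design must also reject into the loop.

```python
def passive_hotspot_temp_c(coolant_c: float, heat_w: float, r_th_k_per_w: float) -> float:
    """Steady-state hotspot temperature with no TEC (lumped-resistance model)."""
    return coolant_c + heat_w * r_th_k_per_w


def required_tec_lift_k(coolant_c: float, heat_w: float,
                        r_th_k_per_w: float, limit_c: float) -> float:
    """Temperature difference a TEC must supply to keep the hotspot at or below limit_c.

    Returns 0.0 when passive cooling alone already meets the limit.
    Simplification: the TEC's own electrical input power is ignored.
    """
    overshoot = passive_hotspot_temp_c(coolant_c, heat_w, r_th_k_per_w) - limit_c
    return max(0.0, overshoot)


# Hypothetical values chosen only for illustration.
HOTSPOT_HEAT_W = 100.0   # heat concentrated at the hotspot
R_TH_K_PER_W = 0.4       # lumped hotspot-to-coolant thermal resistance
HOTSPOT_LIMIT_C = 80.0   # temperature above which the part would throttle

for coolant_c in (35.0, 45.0):
    passive = passive_hotspot_temp_c(coolant_c, HOTSPOT_HEAT_W, R_TH_K_PER_W)
    lift = required_tec_lift_k(coolant_c, HOTSPOT_HEAT_W, R_TH_K_PER_W, HOTSPOT_LIMIT_C)
    print(f"Loop at {coolant_c:.0f} C: passive hotspot {passive:.1f} C, TEC lift needed {lift:.1f} K")
```

Under these assumed numbers, the 35°C loop stays below the limit on its own, while the 45°C loop overshoots it by a few degrees. That overshoot is the gap a targeted TEC would need to close at the hotspot, rather than lowering the temperature of the whole loop.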

Improving Data Center Efficiency and ROI

Today’s liquid-cooled data centers already boast much better Power Usage Effectiveness (PUE) than traditional air-cooled facilities. However, “better” is no longer good enough for the scale of AI. By seamlessly integrating a TEC at the node level, we decouple the component temperature from the facility water temperature. With the 10°C increase in liquid temperature, PUE can improve by roughly 0.05 to 0.18. That is a massive leap for efficiency and a straight shot to better data center ROI. Data center operators can now cut their overall cooling costs. Phononic’s world-class devices underpin this opportunity, providing cooling not only where it is needed but also when, enabling continued cost savings while maintaining performance.
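
To put the PUE range in energy terms, the short sketch below uses the standard definition of PUE (total facility power divided by IT power). At a constant IT load, a PUE reduction translates directly into facility power saved. The 10 MW IT load is a hypothetical figure for illustration only, and the function name is ours, not part of any published tool.

```python
HOURS_PER_YEAR = 8760


def annual_energy_saved_mwh(it_load_mw: float, pue_improvement: float) -> float:
    """Facility energy saved per year (MWh) at a constant IT load.

    PUE = total facility power / IT power, so lowering PUE by
    `pue_improvement` cuts facility power by it_load_mw * pue_improvement.
    """
    if it_load_mw < 0 or pue_improvement < 0:
        raise ValueError("IT load and PUE improvement must be non-negative")
    return it_load_mw * pue_improvement * HOURS_PER_YEAR


# Hypothetical site size; the PUE deltas are the range quoted in the text.
IT_LOAD_MW = 10.0

for delta in (0.05, 0.18):
    saved = annual_energy_saved_mwh(IT_LOAD_MW, delta)
    print(f"PUE improvement of {delta:.2f}: about {saved:,.0f} MWh saved per year")
```

Multiplying the result by a site's own electricity price gives a rough annual cost figure. The sketch assumes the IT load runs at a constant level all year, which real sites will only approximate.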

Related Content

Thermal Kit: Cooling to Unthrottle AI Performance
Delivering Advanced Cooling Solutions for Pioneers like CERN Pus...
Responsive CDUs for AI Factory Energy Efficiency

Take Your Compute Performance to the Next Level with Phononic.
