Covering Scientific & Technical AI | Monday, October 7, 2024

VP of Engineering at Colder Products Company Discusses Liquid Cooling for AI 

In this insightful interview, I spoke with Rick Kirchner, Vice President of Engineering at Colder Products Company (CPC), to explore the critical role of liquid cooling systems in modern computing architectures, particularly for AI applications.

As part of our Executive Debrief series, we delve into the necessity of liquid cooling for high-performance computing and AI, driven by the increasing power density of GPUs. Kirchner elaborates on CPC's valve technology, including their Everest Quick Disconnect couplings, which are designed to protect expensive AI equipment from leaks during maintenance operations.

This discussion sheds light on how CPC is positioned at the forefront of cooling technology for AI and high-performance computing, offering valuable insights into the challenges and innovations in this rapidly evolving field.

Kevin Jackson: Hi Rick, thanks for sitting down with me today, and thanks to everyone who's joining in on the Executive Debrief. Today we're talking with Rick Kirchner, vice president of engineering at Colder Products Company. Rick, I'm happy to have you here. I'm excited to hear more about your company's work with liquid cooling systems.

Rick Kirchner: Oh, thanks for having me. This is super exciting for us as well.

Kevin Jackson: All right. So, starting with the basics: what unique advantages and challenges do liquid cooling systems have for computing architectures compared to other cooling methods?

Rick Kirchner: Well, in the case of AI, among other similar high performance computing applications, it's really not even an advantage anymore. It's turned into a necessity, because the power density of the GPUs coming out these days has gotten so high that managing all of that thermal energy and getting the excess heat out of the chip requires liquid as opposed to air. So again, it's beyond an advantage; now it's a necessity. And as these chips get more and more complicated, they're going to get more and more dense, and the energy density is going to become much, much higher as well. Computers and the digital world have changed so much over the last five years, and specifically over the last 18 months with AI, those temperatures have gone up even more. You can see that even companies like Nvidia are mandating that their customers use liquid cooling in order to really make the most of their GPUs.

Kevin Jackson: And can you expand on your company's valve technology and how it protects expensive AI equipment from leaks? And maybe more specifically, can you explain how your company's Everest Quick Disconnect couplings are designed for liquid cooling applications?

Rick Kirchner: For sure. These products were originally designed with this end in mind. Almost 20 years ago, we introduced couplers for the liquid cooling of power rectifiers in the communications market. So the valve technology we have today, while it has of course been expanded upon and refined well beyond what we had 20 years ago, was originally built for this purpose. The reason it's critical to our customers is this: when you need to disconnect a blade from a server rack to make a change, what we call a hot swap, you're able to disconnect the fluid couplings, the in and the out for each blade, and the dual valves on both connectors mean you don't drip or leak any liquid once you pull the blade out. So that dual valve technology is critical to the success of the product. I would also say that this builds on technology CPC developed well before that period. Our original products have had dual valve interfaces from the very beginning, so this is something that's really core to our capability and expertise at CPC.

Kevin Jackson: Gotcha. What role do your products play in Nvidia's GB200 NVL72 and NVL36? Can you expand on this partnership, how it came about, and what your hopes are for the future?

Rick Kirchner: Sure. So, we've had a relationship with Nvidia specifically for quite some time, as well as with other companies, but Nvidia right now is the centerpiece of the world when it comes to AI, right? We've engaged with them on an application engineering level as well as on a straight-up technology development level. They've been to our facility multiple times, not only for business reviews but for tactical interfacing and exchanging of ideas and concepts. We also work very closely with their supply chain infrastructure. Nvidia has a very rich supply chain of companies, such as Supermicro, that put together subsystems for them, which ultimately end up in products run by companies like Amazon. Our relationship with Nvidia extends to all of those partners as well. So we have ongoing conversations with Supermicro, Amazon, and Nvidia, for example, all for the greater good of these systems that are coming together. As a result, we have a very clear picture of what's required. And even further than that, we pride ourselves on knowing not only the coupler world but also, more specifically, the cooling world within this industry. We understand the fluid flow dynamics and the cooling capacity of any given system: what pumps are being specified, what tubing materials are being used. We work hand in glove with the design team on the other side. So while couplers are, at the end of the day, what we sell our customers, the service we provide is much more than that.

Kevin Jackson: Can you elaborate on the collaboration between your company and HPE? What is exciting you about that work?

Rick Kirchner: Oh, that's a great story. It's a fun story from our history that goes back to about 2015 or 2016, when our team started to work with Cray over in Chippewa Falls, Wisconsin. This was before HPE acquired Cray. Our Everest line was a perfect fit for what they needed at the time. Because we were only an hour and a half's drive away, we were able to work closely with them, be invited into their lab, and really help develop the rest of the system, even beyond our couplers, as I was mentioning earlier. It was a really good partnership. They taught us a lot, and I think we taught them a lot. That relationship continues today. And I like to say that the first computer we worked on ultimately became the Aurora installation down at Argonne National Laboratory.

Kevin Jackson: Now, I didn't know that. That's very interesting. I actually got a chance to check out Aurora in person. It's an impressive machine.

Rick Kirchner: Very cool.

Kevin Jackson: Right on. So I also saw that your engineers provide technical education on topics like ambient conditions and material compatibility considerations. Why is this education important?

Rick Kirchner: Well, I think it really helps our customers, and it helps the industry. This is an engineering corporation; this is what we do. Helping our customers succeed and helping the industry succeed is in everybody's best interest. When we all win, we all win. So when we learn things elsewhere within the industry, through problem solving and helping customers with issues, whether it's corrosion control or flow rates or what have you (material compatibility is another big one), we want to share that with the industry in a way that helps everyone, not hurts everyone, right? We want to lift everybody up, again, to help us all succeed. So it's very important to help people avoid pitfalls such as a corrosion problem or a material compatibility problem.

Kevin Jackson: More generally, what are you excited to see in the future of liquid cooling for AI architectures? Like what are you looking forward to?

Rick Kirchner: Well, the next generation of cooling is going to be very interesting. As I said earlier, air-cooled systems are just not good enough for the current world. And I think eventually what we're calling single-phase, or just straight-up liquid transfer, may not be good enough either. As the power density goes up even more, I think this is where you're going to start to see more refrigerant-style systems that are two-phase. The coolant goes from liquid to vapor and back again, just like in an air conditioner, right up against the chip to keep it cool. Some companies may even want to refrigerate down to much lower temperatures to increase throughput. That's happening in labs right now, but it might actually become more commonplace, even the standard. So the couplers, the tubing, the pipes, and the cooling loop itself are going to need to be adapted to two-phase systems that can withstand that kind of pressure and temperature. It's a very different world. We're already doing a lot of R&D around this, and I think this is really going to be the future of cooling for high performance computing.

Kevin Jackson: Fantastic. Well, this was a great talk. Thanks so much for speaking with me today, Rick. I learned a lot and I hope our audience did too.

Rick Kirchner: All right. Thank you for the opportunity.

AIwire