Thermal simulation of IT systems


By Matt Evans, Future Facilities

To design and simulate any piece of electronic equipment, you must first understand the environment in which it will operate. For IT equipment, particularly servers, this is a datacentre, which people often assume to be a carefully controlled, homogeneous environment. However, anyone who has set foot in a datacentre knows this is not strictly the case. While every effort is made to control the temperature of the air supplied to equipment, and in some cases the pressure difference between aisles, a number of factors can affect conditions at both room and cabinet level.

Room Level Variations

A typical datacentre is laid out in a cold-/hot-aisle configuration. In the cold aisle, cold air is supplied to the IT equipment through grilles in a raised floor; in the hot aisle, air is exhausted from the equipment and returned to the air conditioning units (ACUs). Older, less efficient facilities don’t use containment, so nothing segregates the hot air from the cold, and the two streams recirculate and mix.

Figure 1: Temperature distribution across an uncontained datacentre. Recirculation can be seen at the ends of each aisle (modelled in 6Sigma DCX)

Newer, more efficient facilities, however, use containment, with panelling that segregates either the cold or the hot aisle from the rest of the facility, making the temperature in each aisle more uniform. A side effect of this approach is that the pressure in the contained aisle is typically higher. Although this difference is usually small, in the region of 2-10 Pa, it can still affect the flowrate through cooling fans.

Figure 2: Temperature distribution across a hot aisle contained datacentre, where there’s no mixing of air between aisles (modelled in 6Sigma DCX)

Cabinet-Level Variations

In a datacentre, IT equipment sits in cabinets, each with mounting rails that provide numerous standard-sized U-slots. A cabinet is rarely full to capacity, and it is good practice to fill the empty slots with blanking plates. Missing blanking plates, or gaps in the blanking, can allow hot air to recirculate back through the cabinet. In this situation, some servers in the cabinet receive cold air at their inlet and others receive hot air. Even if empty U-slots are blanked, increased pressure in the hot aisle can force hot air back into the cold aisle through IT equipment that is switched off or has a low flowrate.

Figure 3: Streamlines showing cabinet-level recirculation through unblanked U-slots (modelled in 6Sigma DCX)

So, how cold is cold? Not all datacentres supply air at the same temperature. Legacy facilities often supply it at around 15-18°C, and more efficient facilities supply it at higher temperatures, up to 27°C. The design should offer sufficient cooling for hotter environments, including worst-case scenarios.

A Typical Server

Although there are different types of server, each designed for a different purpose, they all follow the same basic formula. They typically consist of a rectangular chassis with dimensions designed to fit the mounting rails of a cabinet. The chassis is tightly packed with PCBs, most of which carry high-power components, and with power supply units. High power density can lead to high temperatures, which can cause sensitive components to fail. Reliability is a must, however, especially in mission-critical facilities where server failure can have severe consequences.

To manage temperature, each end of the chassis has vents to allow air flow, driven by fans. Fan speed is controlled to keep sensitive components at a safe operating temperature for a given server load, whilst also striving to make the server as energy efficient as possible.
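The link between airflow and temperature can be illustrated with a simple energy balance: the bulk temperature rise of air passing through the chassis is the heat load divided by the mass flowrate times the specific heat of air. The sketch below is only a back-of-envelope check, not a substitute for CFD; the air properties are approximate room-condition values and the function names are illustrative assumptions, not taken from any tool.

```python
# Back-of-envelope energy balance for air passing through a server chassis.
# Assumed values (not from the article): approximate air properties near 20 °C.
AIR_DENSITY = 1.2          # kg/m^3
AIR_SPECIFIC_HEAT = 1005   # J/(kg*K)


def air_temperature_rise(power_w: float, flow_m3_per_s: float) -> float:
    """Bulk inlet-to-outlet air temperature rise (K) for a heat load and flowrate."""
    if flow_m3_per_s <= 0:
        raise ValueError("flowrate must be positive")
    mass_flow = AIR_DENSITY * flow_m3_per_s  # kg/s
    return power_w / (mass_flow * AIR_SPECIFIC_HEAT)


def required_flow(power_w: float, max_rise_k: float) -> float:
    """Volumetric flowrate (m^3/s) needed to keep the air temperature rise below a limit."""
    if max_rise_k <= 0:
        raise ValueError("allowed temperature rise must be positive")
    return power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * max_rise_k)
```

A calculation like this only gives the average exhaust temperature; it says nothing about local hot spots on individual components, which is where detailed simulation is needed.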

Figure 4: Storage server model built in 6SigmaET

Five Things to Consider When Modelling IT Equipment

Given the subtle ways that small temperature changes can impact datacentres and their associated IT equipment, it’s important that developers understand the thermal implications of their designs. With this in mind, here are five key things to consider when thermally modelling IT equipment:

1. Request Component Models

A component can be represented in a model at different levels of detail. At the simplest level, it could be a single block of material that dissipates a specified amount of power; at the most detailed level, individual subcomponents, leads or solder balls could each be modelled with their own material properties and power dissipations. The construction of a component determines how effectively it transfers heat to the board and surrounding air, which ultimately determines the temperature it reaches.

Creating accurate models of components, especially high-powered ones such as CPUs, is essential to electronics thermal simulation but is not without its challenges. Datasheets typically provide a maximum power value and little detail on the construction of the component; however, don’t be put off by this. Component manufacturers often have readily available models of their components, supplied on request. Files are typically provided in formats native to individual simulation tools, but increasingly, vendors such as Future Facilities are supporting the idea of neutral file formats, such as ECXML. This allows a component model to be imported into a design regardless of the simulation suite in which it was originally created. In the case of 6SigmaET, ECXML components can be added directly to your simulation from the import ribbon.

Figure 5: Detailed component models can be imported into 6SigmaET from ECXML files

2. Fan Curves

At any given fan speed, the flowrate depends on the pressure difference across the fan, which varies from datacentre to datacentre due to factors such as aisle containment. Higher pressure on the outlet side results in a lower flowrate, and higher pressure on the inlet side gives a higher flowrate. Manufacturers typically provide a pressure-volume curve that defines a fan’s flowrate for a given pressure at maximum fan speed. In 6SigmaET this curve can be added to a fan modelling object. When the fan runs at a speed below the maximum, the curve is intelligently de-rated.
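To illustrate how an operating point follows from a fan curve, here is a minimal sketch that intersects a de-rated fan curve with a quadratic system resistance curve plus an external back pressure (for example, a pressurised hot aisle). Everything in it is an assumption for illustration: the curve points and resistance coefficient are invented, and the de-rating uses the classic fan affinity laws (flow proportional to speed, pressure proportional to speed squared), which may differ from how any particular solver de-rates a curve.

```python
from bisect import bisect_right

# Hypothetical fan curve at maximum speed: (flowrate in m^3/s, static pressure in Pa).
FAN_CURVE_MAX = [(0.00, 120.0), (0.01, 105.0), (0.02, 80.0), (0.03, 45.0), (0.04, 0.0)]


def interpolate(curve, x):
    """Piecewise-linear interpolation; extrapolates using the end segments."""
    xs = [point[0] for point in curve]
    i = min(max(bisect_right(xs, x) - 1, 0), len(curve) - 2)
    (x0, y0), (x1, y1) = curve[i], curve[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fan_pressure(flow, speed_fraction=1.0, curve=FAN_CURVE_MAX):
    """Fan pressure rise (Pa) at a flowrate, de-rated with the affinity laws."""
    # Affinity laws: flow scales with speed, pressure with speed squared.
    return speed_fraction ** 2 * interpolate(curve, flow / speed_fraction)


def system_pressure(flow, resistance, back_pressure):
    """Pressure the fan must overcome: chassis resistance plus outlet-minus-inlet pressure."""
    return resistance * flow ** 2 + back_pressure


def operating_flow(resistance, back_pressure, speed_fraction=1.0, curve=FAN_CURVE_MAX):
    """Flowrate (m^3/s) where the fan curve meets the system curve, found by bisection."""
    def surplus(q):
        return fan_pressure(q, speed_fraction, curve) - system_pressure(q, resistance, back_pressure)

    if surplus(0.0) <= 0.0:
        return 0.0  # fan cannot overcome the back pressure; reverse flow is not modelled here
    low, high = 0.0, curve[-1][0] * speed_fraction
    while surplus(high) > 0.0:  # widen the bracket if inlet pressure assists the fan
        high *= 2.0
    for _ in range(60):
        mid = 0.5 * (low + high)
        if surplus(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
```

Calling `operating_flow` with a small positive `back_pressure` of a few pascals shows the effect described above: the flowrate drops relative to the zero-back-pressure case, and the drop is proportionally larger at low fan speeds where the fan develops less pressure.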

Figure 6: Curve applied to a fan in 6SigmaET showing the operating point determined by the solver

3. Control

Inside a server, fan speed is typically controlled to maintain a set temperature at specific points. For maximum accuracy, this control needs to be built into the simulation model. In 6SigmaET, fan speed can be controlled using a proportional-integral (PI) controller that takes input from one or more sensors, which can be located anywhere in the model. More complex control can be modelled via an exchange system between the CFD solver and an external .dll (dynamic link library) that defines the control system. Similarly, component power can be controlled to represent throttling.
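For readers unfamiliar with PI control, the following is a generic discrete-time sketch of a controller that maps a sensor temperature to a fan speed fraction. It is not the 6SigmaET implementation or any vendor's firmware; the class name, gains, speed limits and the simple anti-windup rule are all assumptions chosen for illustration.

```python
class PIFanController:
    """Generic discrete PI controller: sensor temperature in, fan speed fraction out."""

    def __init__(self, setpoint_c, kp, ki, min_speed=0.2, max_speed=1.0):
        self.setpoint_c = setpoint_c  # target sensor temperature (°C)
        self.kp = kp                  # proportional gain (speed fraction per K)
        self.ki = ki                  # integral gain (speed fraction per K*s)
        self.min_speed = min_speed
        self.max_speed = max_speed
        self.integral = 0.0           # accumulated error (K*s)

    def update(self, measured_c, dt_s):
        """Return the fan speed fraction for the latest sensor reading."""
        error = measured_c - self.setpoint_c  # positive when the sensor is too hot
        candidate = self.integral + error * dt_s
        raw = self.kp * error + self.ki * candidate
        speed = min(max(raw, self.min_speed), self.max_speed)

        # Anti-windup: keep the new integral only if the output is not saturated,
        # or if the error is pulling the output back towards the allowed range.
        saturated_high = raw > self.max_speed
        saturated_low = raw < self.min_speed
        if (not saturated_high and not saturated_low) \
                or (saturated_high and error < 0) \
                or (saturated_low and error > 0):
            self.integral = candidate
        return speed
```

In a coupled simulation, `update` would be called once per control interval with the current sensor temperature, and the returned speed fraction would be used to de-rate the fan curve for the next solver step.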

Figure 7: In 6SigmaET, fan speed can be controlled in the same way it would be in a real server

4. Boundary Conditions

Once you have built a model, boundary conditions need to be chosen carefully to best represent a real-world scenario. It’s pointless simulating a server sitting in open space only to find your prototype fails when placed inside a cabinet. In 6SigmaET you can replicate cabinet conditions by placing the server model in a test chamber. The chamber’s inlet and outlet sides should be set to ‘open’, with a temperature and pressure assigned to represent both hot and cold aisle conditions. The remaining sides should be set to ‘symmetry’ so that there’s no flow or heat transfer – in less abstract terms, this represents similar servers surrounding the one you are modelling.

Figure 8: Different boundary conditions can be applied to each side of the server to mimic conditions inside the cabinet

5. Transient Analysis

Transient analysis can be used to model the response of a system over time to fixed or time-varying boundary conditions. For IT equipment, it’s useful to understand how a cooling failure in the datacentre could affect component and power supply temperatures. In this scenario, the room will steadily heat up before reaching a new steady temperature. Once the failed ACU is back online, temperatures will fall again. Does the control system increase fan speed quickly enough to respond to the increased inlet temperature, or is hardware at risk of overheating?

You can set up this scenario in 6SigmaET by specifying a temperature vs time curve on the inlet side of the model. The transient analysis should begin from a converged steady-state solution that represents conditions at t = 0s.
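The idea of driving a model with a temperature vs time curve, starting from a steady-state solution, can be sketched with a single lumped thermal mass. This is a deliberately crude stand-in for a CFD model: the inlet curve, power, thermal resistance and heat capacity below are invented values, the function names are hypothetical, and the fixed resistance ignores any fan speed response.

```python
# Hypothetical inlet curve for a cooling failure and recovery: (time in s, inlet °C).
INLET_CURVE = [(0.0, 22.0), (300.0, 22.0), (900.0, 35.0), (1800.0, 35.0), (2400.0, 22.0)]


def inlet_temperature(t_s, curve=INLET_CURVE):
    """Piecewise-linear inlet temperature, held constant outside the curve's time range."""
    if t_s <= curve[0][0]:
        return curve[0][1]
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if t_s <= t1:
            return v0 + (v1 - v0) * (t_s - t0) / (t1 - t0)
    return curve[-1][1]


def simulate(curve, power_w, r_k_per_w, c_j_per_k, dt_s=1.0, t_end_s=3000.0):
    """Explicit-Euler response of one lumped mass: C dT/dt = P - (T - T_inlet) / R."""
    # Start from the steady state at t = 0 s, mirroring the converged initial solution.
    temp = inlet_temperature(0.0, curve) + power_w * r_k_per_w
    history = [(0.0, temp)]
    steps = int(round(t_end_s / dt_s))
    for n in range(1, steps + 1):
        t_in = inlet_temperature((n - 1) * dt_s, curve)
        temp += dt_s * (power_w - (temp - t_in) / r_k_per_w) / c_j_per_k
        history.append((n * dt_s, temp))
    return history


# Example: 100 W component, 0.5 K/W to the inlet air, 200 J/K heat capacity (all assumed).
history = simulate(INLET_CURVE, power_w=100.0, r_k_per_w=0.5, c_j_per_k=200.0)
peak_temperature = max(temp for _, temp in history)
```

Comparing `peak_temperature` with a component's rated limit gives a first indication of whether the hardware would survive the event; with a fan controller in the loop, the effective thermal resistance would fall as fan speed rises.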

Figure 9: Transient inlet conditions and corresponding response curve for an object (modelled in 6SigmaET)

Control is Key

The datacentre environment is more complex than it first seems. A datacentre is a living, breathing thing and each has its own personality. Differences in supply air temperature, and in the design and implementation of containment and blanking, determine the temperature of the inlet air that equipment receives. However, once you understand the phenomena that can cause recirculation in a room, it is possible to design worst-case-scenario conditions to test the robustness of your server.

Using thermal simulation, it is possible to simulate IT equipment under any number of conditions or server configurations. Once you have built a baseline model, it is as simple as changing boundary conditions or reconfiguring objects and clicking ‘solve’.

To produce a CFD model that is fully representative of a real server, however, it is essential to choose a simulation tool that allows you to model control algorithms; this is often not possible using legacy tools. Without this functionality, component powers and fan speeds won’t adapt to different conditions and will need to be adjusted manually based on experiment, experience or assumption. Without modelling control, you will never be able to fully understand the behaviour of a server using simulation.
