Nelworks
Engineering

Data Center Cooling

A finite-difference heat diffusion simulation of a server room. Racks generate heat proportional to load; CRAC units absorb it. Watch thermal gradients form, throttling kick in, and PUE change as you rearrange the room.

Every server produces heat. In a data center, removing that heat is one of the largest operating costs and the dominant design constraint. Where you place racks, how many coolers you run, and how hard you push the load all interact through the same physics: heat diffuses through air, accumulates where cooling can't reach, and eventually forces servers to throttle.

This simulation runs that physics in your browser.

The model is a naive finite-difference approximation — uniform air, no airflow direction, no rack-to-rack radiation. It captures the qualitative physics of conduction and heat accumulation, not the quantitative reality of a CFD model. Think of it as a teaching tool, not a planning tool.


Simulation

Click any cell to add or remove a rack. Green LED = normal, amber = high load, red = throttling. Top and bottom rows are CRAC cooler units and cannot be moved. The temperature colour shifts from teal (cool) through amber to red as heat builds.


What the simulation models

Heat equation

The temperature at each grid cell evolves according to the discrete heat equation:

dT/dt = α · ∇²T + source − sink

where:

  • α = 0.15 is the thermal diffusivity — controls how fast heat spreads through air
  • ∇²T is the discrete Laplacian: the sum of the four neighbouring cells' temperatures minus four times the centre cell's temperature
  • source: rack cells add heat proportional to server load each step
  • sink: CRAC cooler cells remove heat proportional to how far above setpoint they are

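Each time step uses an explicit forward-Euler update with step size Δt = 0.2 on a grid with unit cell spacing. For this scheme in two dimensions, stability requires α · Δt ≤ 1/4, that is, 4 · α · Δt ≤ 1. This diffusion-number limit is often loosely called the Courant–Friedrichs–Lewy (CFL) condition. At these values 4 · α · Δt = 0.12, well within the stable region.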
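The stability bound can be checked by hand. On a unit grid, the most sensitive error pattern is the checkerboard T[i][j] = (−1)^(i+j), for which the five-point Laplacian equals −8T. One forward-Euler step therefore multiplies that pattern by 1 − 8αΔt, and the scheme stays stable only while |1 − 8αΔt| ≤ 1. The sketch below evaluates this for the text's values and for a deliberately oversized step. The function names are illustrative. It ignores the cooler sink, which tightens the bound slightly at cooler cells.

```python
ALPHA = 0.15  # thermal diffusivity (from the text)

def diffusion_number(alpha, dt, dx=1.0):
    """r = alpha * dt / dx**2; the grid is assumed to have unit spacing."""
    return alpha * dt / dx ** 2

def checkerboard_growth(alpha, dt, dx=1.0):
    """Factor by which one forward-Euler step scales T[i][j] = (-1)**(i + j).

    For that pattern the five-point Laplacian is -8 * T / dx**2,
    so the update T + dt * alpha * lap(T) gives (1 - 8 * r) * T.
    """
    return 1.0 - 8.0 * diffusion_number(alpha, dt, dx)

for dt in (0.2, 2.0):
    r = diffusion_number(ALPHA, dt)
    g = checkerboard_growth(ALPHA, dt)
    print(f"dt={dt}: r={r:.3f}, 4r={4 * r:.2f}, growth={g:.2f}, stable={abs(g) <= 1}")
# dt=0.2: r=0.030, 4r=0.12, growth=0.76, stable=True
# dt=2.0: r=0.300, 4r=1.20, growth=-1.40, stable=False
```

A growth factor below −1 means the checkerboard error flips sign and grows every step. This is the characteristic blow-up of an explicit diffusion scheme run with too large a Δt.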

Throttling

When a rack cell exceeds 65 °C, it begins throttling: it reduces its effective heat output (and, by analogy, its compute throughput). Throttling reaches its maximum at 90 °C. This is the simulation's stand-in for Intel/AMD thermal throttling, where the CPU lowers its clock speed to stay within junction temperature limits.
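The text fixes only the two thresholds. It does not specify the shape of the curve between them or how much output remains at maximum throttling. The sketch below therefore assumes a linear ramp down to a hypothetical floor, `MIN_OUTPUT = 0.3`.

```python
THROTTLE_START_C = 65.0  # from the text: throttling begins above this
THROTTLE_FULL_C = 90.0   # from the text: throttling is at its maximum here
MIN_OUTPUT = 0.3         # hypothetical: fraction of output left at full throttle

def throttle_factor(temp_c):
    """Fraction of nominal heat output (and compute) a rack still delivers.

    Assumes a linear ramp between the two thresholds; the simulation's
    actual curve is not specified in the text.
    """
    if temp_c <= THROTTLE_START_C:
        return 1.0
    if temp_c >= THROTTLE_FULL_C:
        return MIN_OUTPUT
    frac = (temp_c - THROTTLE_START_C) / (THROTTLE_FULL_C - THROTTLE_START_C)
    return 1.0 - frac * (1.0 - MIN_OUTPUT)

for temp in (40.0, 65.0, 77.5, 90.0, 95.0):
    print(f"{temp:5.1f} deg C -> {throttle_factor(temp):.2f} of nominal output")
```

Multiplying a rack's heat source by this factor on every step produces the feedback the text describes: the hotter a rack runs, the less heat it adds.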

PUE

Power Usage Effectiveness is the standard data center efficiency metric:

PUE = Total facility power / IT equipment power

A PUE of 1.0 is unattainable in practice: it would mean zero overhead, with no power spent on cooling, lighting, or power-distribution losses. Best-in-class hyperscaler facilities reach about 1.1; typical enterprise data centers run at 1.5–2.0. The simulation approximates total facility power as IT power plus the cooling power drawn by the CRAC units. Its PUE is therefore 1 + (cooling power / compute power drawn by the racks).
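The definition translates directly into code. In this sketch the power figures are illustrative inputs, not measurements. `other_overhead_kw` is a hypothetical parameter standing in for lighting and power-distribution losses, which the simulation, as described, does not model.

```python
def pue(it_power_kw, cooling_power_kw, other_overhead_kw=0.0):
    """Power Usage Effectiveness = total facility power / IT equipment power.

    Total facility power is taken as IT + cooling + any other overhead
    (lighting, power-distribution losses, ...).
    """
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    total_kw = it_power_kw + cooling_power_kw + other_overhead_kw
    return total_kw / it_power_kw

# Illustrative inputs, not measurements.
print(pue(1000.0, 500.0))        # 1.5
print(pue(1000.0, 80.0, 20.0))   # 1.1
```

Because total power always includes IT power, the ratio cannot fall below 1.0. With `other_overhead_kw=0` it reduces to 1 + cooling/IT, which matches the simulation's cooling-only approximation.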


What to look for

Observation | What it means
--- | ---
Middle racks hotter than edge racks | Heat diffuses slowly through air, so distance from the nearest cooler determines steady-state temperature
Throttling appears before you expect it | Racks in thermal "dead zones", surrounded by other racks with no clear path to a cooler, accumulate heat faster than diffusion can clear it
PUE rises as load increases | More rack heat forces coolers to work harder; cooling overhead grows faster than IT power at high load
Removing middle racks cools neighbours | Eliminating a heat source clears a diffusion path for adjacent racks; temperature drops even at constant total load
PUE drops when you remove racks | Fewer racks produce less total heat for the same cooler capacity, so efficiency improves
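The first and fourth observations can be reproduced in one dimension. The sketch below simulates a single row of cells (`C` = cooler, `R` = rack, `.` = empty air). It uses the text's α and Δt, with two assumptions: the ends are insulated, and `HEAT`, `COOL_GAIN` and `SETPOINT` take made-up values. It then runs long enough to reach steady state. It is an illustration under those assumptions, not the page's simulation.

```python
ALPHA = 0.15      # thermal diffusivity (from the text)
DT = 0.2          # forward-Euler step (from the text)
SETPOINT = 20.0   # hypothetical cooler setpoint, deg C
HEAT = 0.5        # hypothetical heat added by each rack per unit time
COOL_GAIN = 0.5   # hypothetical cooler gain

def simulate_row(layout, steps=5000):
    """Run a 1-D row to (near) steady state and return cell temperatures.

    layout: string of 'C' (cooler), 'R' (rack) and '.' (air).
    Both ends are insulated: the missing neighbour mirrors the cell itself.
    """
    n = len(layout)
    T = [SETPOINT] * n
    for _ in range(steps):
        new_T = T[:]
        for i, cell in enumerate(layout):
            left = T[i - 1] if i > 0 else T[i]
            right = T[i + 1] if i < n - 1 else T[i]
            rate = ALPHA * (left + right - 2.0 * T[i])  # 1-D discrete Laplacian
            if cell == "R":
                rate += HEAT
            elif cell == "C":
                rate -= COOL_GAIN * max(0.0, T[i] - SETPOINT)
            new_T[i] = T[i] + DT * rate
        T = new_T
    return T

for layout in ("CRRRRRRC", "CRR..RRC"):
    temps = simulate_row(layout)
    print(layout, [round(t, 1) for t in temps])
```

With these assumed values, the full row `CRRRRRRC` settles at about 33 °C for the racks beside the coolers and about 43 °C for the two middle racks. Emptying the two middle cells (`CRR..RRC`) drops the rack next to each gap from about 39.7 °C to 32 °C. Note that total load also falls in this variant.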

The inventory analogy

A data center is structurally similar to a manufacturing floor with limited storage:

Data center concept | Inventory analogy
--- | ---
Server rack | Workstation processing jobs
Heat generated | Work in progress piling up at the station
CRAC cooler | Downstream buffer that clears the backlog
Thermal throttling | Station slows down when the pile gets too high
Air temperature | Queue depth between station and buffer
PUE | Ratio of buffer capacity consumed to work actually done

The core constraint is the same: throughput is limited not by the workstations, but by how fast the buffer can drain. Adding more workstations without expanding the buffer raises queue depth until the workstations self-limit. The efficient design keeps queue depth low by placing workstations close to buffers — and by not over-filling the floor.


What a real model adds

Feature | This simulation | Production CFD
--- | --- | ---
Airflow direction | None (isotropic diffusion only) | Raised-floor plenum, hot-aisle/cold-aisle containment, directional airflow velocity field
Rack-level detail | One cell per rack | 1U server granularity, inlet/exhaust temperature differential
Humidity | Not modelled | Dew point, condensation risk on cooling coils
Cooling system dynamics | Instantaneous proportional response | Chiller compressor lag, coolant flow rate, economiser switching
Failure modes | None | CRAC unit trip, partial blockage, hot aisle recirculation
Electrical model | None | Power draw per rack, PDU loading, UPS efficiency

The qualitative insight — that rack placement relative to coolers determines temperature, and that PUE is a function of how hard you push the cooling system — survives into real deployments. The numbers do not.