Assignment: Heat diffusion

v1.0, October 6th 2022

As you are aware, microprocessors are prone to heating up. In 2023, the most powerful processors draw more than 200 W and dissipate it as heat. Evacuating this heat is a critical issue: if the processor exceeds a certain temperature Tmax (for example 100 °C), it will be destroyed.

The heat flux density (i.e. the number of watts transferred per square meter) of processors is extremely high. A modern high-end CPU can produce 280 W on 10 cm², which makes 280 kW/m². This is ten times more than a hotplate, and it’s about half as much as a fuel rod in a nuclear reactor.

For this reason, dissipators in the form of metal heatsinks are fitted against the processors. They allow for the efficient transfer of heat from the processor to an external environment. In mainstream machines, a fan is used to evacuate the heat of these metal dissipators by forced convection¹. In computing centers, the heatsink is often cooled by a fluid (water-cooling). Finally, in the case of less powerful processors, the phenomenon of natural convection² may be sufficient to remove heat from the heatsink.

We assume here that the heatsink is a rectangular parallelepiped (a cuboid). Its surface area, its thickness and the material it is made of all affect its thermal characteristics.

This project consists in parallelizing a numerical simulation that determines the temperature reached by a recent CPU (an AMD EPYC “Rome”) on which we place an aluminum plate of 15 cm × 12 cm × 0.8 cm. The z = 0 face of the heatsink is held at T = 20 °C by some cooling device, and we assume that the entire heatsink is initially at this temperature. The ambient air is also assumed to be at T = 20 °C. The general idea is to turn on the processor at time t = 0 and simulate the diffusion of heat in the heatsink until an approximately steady state is reached.

Will the processor overheat? What if we reduce/increase the thickness/area of the heatsink? What if we replace the aluminum with copper, iron or... gold?

¹ To increase the contact surface with the air flow and maximize heat transfer, heat sinks are usually equipped with fins.

² The air in contact with the heatsink absorbs heat, heats up, rises and makes room for fresh air.


Figure 1: A copper heat sink (with fins) placed on a board containing a CPU.

Figure 2: The CPU under the simulated heatsink. Real size. Power dissipation: 280 W. Only the black parts are in contact with the heatsink.

2 Physical principle of the numerical simulation

2.1 General comments

Heat is not to be confused with temperature. Heat is a transfer of thermal energy. When two bodies are brought into contact, an exchange of thermal energy takes place until both bodies are at the same temperature. Temperature is measured in degrees Celsius (°C) or in kelvins (K, where 0 K = −273.15 °C), while quantities of energy (thermal or otherwise) are measured in joules (J).

When heat energy is exchanged, it is convenient to measure the flow rate. Heat flow is the amount of energy transferred per unit time and is expressed in watts (W); it is equivalent to a power. For example, if a constant heat flow Φ is transferred for t seconds, the amount of thermal energy transferred is Q = Φt. In the case of our CPU, the heat flow from the processor to the heatsink is Φ = 280 W.


Figure 3: Result of the numerical simulation in stationary regime.

Figure 4: Schematic illustration of the heat sink, with its x, y and z axes.


When heat is transferred across a surface, we can also examine how the heat flow rate varies along that surface. The heat flux density φ indicates how much heat flux passes through the surface, per unit area, at a given point. If it is constant over a surface of area S, then the total heat flux is Φ = φS. Still in the case of our processor, its contact surface with the heatsink is about 10 cm², which gives φ = 280 kW m⁻².

When a material receives thermal energy, its temperature rises. Its mass thermal capacity (or specific heat capacity), generally denoted by c, represents the amount of energy (measured in J) required to raise the temperature of 1 kg of the material by 1 degree. For example, c = 897 J kg⁻¹ K⁻¹ for aluminum, so if we put a 1 kg block of aluminum on our processor, it will heat up by ≈ 0.31 °C per second (that is, 280/897). Without external cooling, it will go from 20 °C to 100 °C in about 256.3 seconds (that is, 80 × 897/280).
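The arithmetic of this example can be checked with a few lines of C. This is only a sketch: the function names are hypothetical (they do not come from the provided program), and it assumes a constant heat flow with no losses at all.

```c
/* Heating rate (K/s) of a body of mass mass_kg (kg) and specific heat
 * capacity c (J kg^-1 K^-1) that receives a constant heat flow phi_watts (W).
 * Assumption: no energy leaves the body. */
double heating_rate(double phi_watts, double mass_kg, double c)
{
    return phi_watts / (mass_kg * c);
}

/* Time (s) needed to go from t_start to t_end (both in the same unit,
 * degrees Celsius or kelvins) at a constant heating rate (K/s). */
double time_to_reach(double t_start, double t_end, double rate)
{
    return (t_end - t_start) / rate;
}
```

Calling `heating_rate(280.0, 1.0, 897.0)` gives roughly 0.31 K/s, and passing that rate to `time_to_reach(20.0, 100.0, rate)` gives roughly 256.3 s, matching the figures quoted in the text.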

2.2 Thermal transfers

We assume that four different types of heat transfer take place. In each case, the heat flux (the flow rate of the transfer) is directly proportional to the surface area where the transfer occurs. Therefore we only provide the heat flux density.

1. Thermal conduction from the CPU to the heatsink. It occurs solely at points of contact, which all reside on the y = 0 plane. We have already calculated that φ = 2.8×10⁵ W m⁻².

2. Heat conduction within the heatsink: heat propagates in the metal from the hot parts to the cold parts. This transfer obeys Fourier’s law: φ = −λ · grad T (where φ and grad T are vectors), and λ (the thermal conductivity) characterizes the capacity of the medium to conduct heat: the greater λ is, the faster the thermal transfer. Conductivity is expressed in watts per meter per kelvin (W m⁻¹ K⁻¹) and represents the heat flux density that circulates when the temperature gradient is 1 degree per meter.

More concretely, if we have a wall of thickness e whose two sides are at homogeneous temperatures T₁ and T₂, then the heat flux density at any point of the wall (from 1 to 2) is φ = λ(T₁ − T₂)/e.

3. Convection on the edges of the heatsink: the metal heats the ambient air, which starts moving and carries the heat away. If a solid wall of surface S and temperature T₁ is in contact with a fluid (air) of temperature T₂, the heat flux density is φ₁→₂ = h(T₁ − T₂), where h is the convection heat transfer coefficient. We observe experimentally that h ≈ 10 W m⁻² K⁻¹ for air.

4. Radiation emitted on the edges of the heatsink: the metal emits an electromagnetic wave whose frequency depends on the temperature (it can belong to the visible light spectrum: “white-hot” metal). Under reasonable assumptions (black body radiation), the density of thermal flux emitted towards the outside is φ = σT⁴, where σ is the Stefan-Boltzmann constant (its value is 5.6703×10⁻⁸ W m⁻² K⁻⁴; be careful that here the temperature must be in kelvins).
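The three flux density formulas of this section (wall conduction, convection, black body radiation) translate directly into C. The sketch below is illustrative only: the function and macro names are hypothetical, and the conductivity λ must be supplied by the caller because the text does not give a value for it.

```c
/* Stefan-Boltzmann constant in W m^-2 K^-4, value quoted in the text. */
#define STEFAN_BOLTZMANN 5.6703e-8

/* Conduction through a wall of thickness e (m) whose sides are at t1 and t2:
 * phi = lambda * (t1 - t2) / e, in W m^-2, counted from side 1 to side 2. */
double conduction_flux_density(double lambda, double t1, double t2, double e)
{
    return lambda * (t1 - t2) / e;
}

/* Convection from a wall at t_wall to a fluid at t_fluid:
 * phi = h * (t_wall - t_fluid), in W m^-2. */
double convection_flux_density(double h, double t_wall, double t_fluid)
{
    return h * (t_wall - t_fluid);
}

/* Black body radiation: phi = sigma * T^4, in W m^-2.
 * The temperature MUST be given in kelvins. */
double radiation_flux_density(double t_kelvin)
{
    double t2 = t_kelvin * t_kelvin;
    return STEFAN_BOLTZMANN * t2 * t2;
}
```

For instance, with h = 10 W m⁻² K⁻¹, a wall at 100 °C in air at 20 °C loses 800 W m⁻² by convection, and a surface at 293.15 K radiates roughly 419 W m⁻². Both are tiny compared to the 2.8×10⁵ W m⁻² coming from the CPU.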

The proposed numerical simulation claims a certain realism, but it is not perfect: the temperature of the surrounding medium is assumed to be fixed, the black body hypothesis is a simplification, and it is assumed that the heat sink does not receive thermal energy from the surrounding medium by radiation, which is physically impossible.


2.3 Numerical simulation

To simulate all these thermal transfers, we discretize time and space. We slice the heatsink into cells (small cuboids) of size ∆x × ∆y × ∆z, and we suppose that during a sufficiently short time step, the temperature is homogeneous within each cell. The time steps must be small enough that thermal energy propagates slowly relative to the cell size, and in particular that it cannot “cross” several cells during a single step (thus the better the metal conducts heat, the smaller the time steps have to be).
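The remark on time steps can be made quantitative. The assignment does not state a bound, but for an explicit update on cubic cells of side dl, the classical stability condition of the finite-difference heat equation is ∆t ≤ dl²/(6α), where α = λ/(ρc) is the thermal diffusivity. The sketch below computes this bound; the function names are hypothetical, and the caller must supply λ and ρ since the text only gives c.

```c
/* Thermal diffusivity alpha = lambda / (rho * c), in m^2/s.
 * lambda: conductivity (W m^-1 K^-1), rho: density (kg m^-3),
 * c: specific heat capacity (J kg^-1 K^-1). */
double thermal_diffusivity(double lambda, double rho, double c)
{
    return lambda / (rho * c);
}

/* Largest time step (s) allowed by the classical stability condition of the
 * explicit 3D scheme on cubic cells of side dl (m): dt <= dl^2 / (6 alpha).
 * Assumption: this textbook bound applies to the scheme described here. */
double max_stable_dt(double dl, double alpha)
{
    return dl * dl / (6.0 * alpha);
}
```

With textbook values for aluminum (λ ≈ 237 W m⁻¹ K⁻¹ and ρ ≈ 2700 kg m⁻³, both assumptions not taken from the assignment) and c = 897 J kg⁻¹ K⁻¹, cells of 1 mm give a bound of about 1.7 ms. A better conductor has a larger α and therefore a smaller admissible time step.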

We denote by T(i, j, k) the temperature of the cell of coordinates (i, j, k) at time t. The problem consists in computing the temperature of all the cells at time t + ∆t.

To do this, we must determine the heat flux Φ received by each cell, and then we can calculate the variation of its temperature from the mass heat capacity of the metal and the mass of the cell. Fortunately, we know the density ρ of the metal and the volume of the cells. During a time step, the cell receives the energy ∆Q = Φ∆t, and we thus have ∆T = ∆Q/(c∆x∆y∆zρ).
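The temperature update of a single cell follows directly from this formula. Here is a minimal sketch, with a hypothetical function name; the density ρ is a parameter because the text does not give its value.

```c
/* Temperature increase (K) of one cell of size dx * dy * dz (m) that receives
 * a net heat flux phi (W) during a time step dt (s):
 *     dT = phi * dt / (c * rho * dx * dy * dz)
 * c: specific heat capacity (J kg^-1 K^-1), rho: density (kg m^-3). */
double cell_temperature_increment(double phi, double dt, double c, double rho,
                                  double dx, double dy, double dz)
{
    double cell_mass = rho * dx * dy * dz; /* kg */
    double delta_q = phi * dt;             /* J received during the step */
    return delta_q / (c * cell_mass);
}
```

As an order of magnitude, a 1 mm³ aluminum cell (taking ρ ≈ 2700 kg m⁻³, an assumed textbook value) whose 1 mm² face receives 2.8×10⁵ W m⁻², i.e. 0.28 W, heats up by about 0.12 K during a 1 ms step if it exchanges nothing with its neighbors.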

The cells which are on the surface yield energy to the external medium by radiation. For example, on the face z = 1, a cell of temperature T transfers to the outside a heat flux of ∆x∆yσT⁴ watts (indeed, the exchange surface is ∆x∆y).

The cells that are on the surface also give up energy to the outside environment by convection. For example, on the face x = 0, a cell of temperature T (in kelvins) transfers ∆y∆z h(T − 293.15) watts.

Each of the cells in contact with the CPU receives a thermal flux of ∆x∆y × 2.8×10⁵ watts.

Finally, a cell exchanges thermal energy with each of its neighbors. Cell a receives from cell b (its right-hand neighbor on the x axis, say) a heat flux λ∆y∆z(T_b − T_a)/∆x.

2.4 Implementation

To simplify matters, we assume that cells are cubic (∆x = ∆y = ∆z). In the program, we denote by dl the length of the cell sides.

Representation in memory. One (minor) difficulty is that we have to represent the three-dimensional array T in memory. Let us assume that it is of dimension n × m × o (according to the axes x, y, z). We flatten it on one dimension using the following convention (row-major, usual in the C language): the xy planes are represented contiguously in memory, and within these planes the x lines are represented contiguously. In other words, if the temperature of cell (i, j, k) is in T[u], then:

• T[u + 1] contains the temperature of the cell (i + 1, j, k);
• T[u + n] contains the temperature of the cell (i, j + 1, k);
• T[u + nm] contains the temperature of the cell (i, j, k + 1).

The temperature of the cell (i, j, k) is therefore stored in T[i + nj + knm]. The simulation works according to the pseudo-code in figure 5.
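The indexing convention above can be captured in two small helpers. This is a sketch with hypothetical function names, not code taken from the provided program.

```c
#include <stddef.h>

/* Flat index of cell (i, j, k) in an n x m x o grid:
 * x varies fastest, then y, then z, i.e. u = i + n*j + n*m*k. */
size_t cell_index(size_t i, size_t j, size_t k, size_t n, size_t m)
{
    return i + n * j + n * m * k;
}

/* Inverse mapping: recovers (i, j, k) from the flat index u. */
void cell_coords(size_t u, size_t n, size_t m,
                 size_t *i, size_t *j, size_t *k)
{
    *i = u % n;
    *j = (u / n) % m;
    *k = u / (n * m);
}
```

With n = 4 and m = 3, cell (1, 2, 1) is stored at index 1 + 4·2 + 12·1 = 21, and its neighbors along x, y and z sit at indices 22, 25 and 33 respectively. A whole xy plane is a contiguous run of nm values, which is convenient when planes have to be copied or sent as a block.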

3 What you Have to Do

We provide you with a heatsink.c sequential C program that runs the simulation and dumps the end result to the standard output. We also provide a few Python scripts that do visual renderings of the result.


procedure Simulation(...)
    Allocate two arrays R and T of size nmo
    T[:] ← 293.15                           ▷ all cells are initially at 20 °C
    t ← 0
    convergence ← 0
    while convergence = 0 do
        for 0 ≤ k < o do                    ▷ Processes the kth xy-plane
            for 0 ≤ j < m do
                for 0 ≤ i < n do
                    R[knm + jn + i] ← UpdateTemp(i, j, k)    ▷ Accesses up to 6 neighbor cells
                end for
            end for
        end for
        if t is an integer then
            Tmax ← max R[:]
            ε ← sum over 0 ≤ u < omn of (R[u] − T[u])²
            if √ε/∆t < 0.1 then
                convergence ← 1
            end if
            Print t and Tmax to keep the user waiting...
        end if
        t ← t + ∆t
        T ← R
    end while
    Save T in a file for the forthcoming graphical rendering
end procedure

Figure 5: Principle of the simulation
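The convergence test of figure 5 can be sketched in C as follows. The function names are hypothetical and the code is not taken from the provided program. The condition √ε/∆t < 0.1 is evaluated in the equivalent squared form ε < (0.1·∆t)², which avoids a square root.

```c
#include <stddef.h>

/* Returns 1 if sqrt(sum_u (R[u] - T[u])^2) / dt < threshold, 0 otherwise.
 * R holds the new temperatures, T the previous ones, ncells = n * m * o. */
int has_converged(const double *R, const double *T, size_t ncells,
                  double dt, double threshold)
{
    double eps = 0.0;
    for (size_t u = 0; u < ncells; u++) {
        double d = R[u] - T[u];
        eps += d * d;
    }
    double bound = threshold * dt;
    return eps < bound * bound;
}

/* Largest temperature in R (Tmax in figure 5). Requires ncells >= 1. */
double max_temperature(const double *R, size_t ncells)
{
    double t_max = R[0];
    for (size_t u = 1; u < ncells; u++)
        if (R[u] > t_max)
            t_max = R[u];
    return t_max;
}
```

In a parallel version where each process only holds part of the grid, both quantities are partial results: the local sums of squares have to be added up, and the local maxima compared, across all processes before the test is applied.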


Your primary goal is to run the simulation in “CHALLENGE” mode, as fast as possible.

There is only one rule: the use of OpenMP is forbidden. You may use the external libraries that are usually available in computing centers. We expect that you end up with a parallel program capable of running on more than one compute node.

Some things that you can do:

• Try to use non-blocking communications to overlap communications with computation. Does it bring any improvement? (If so, how much? If not, why?)
• Try to experiment with several kinds of domain decomposition and choose the best one (e.g. 1D, 2D, 3D).
• Try to provide a theoretical model of the performance of your code, in terms of processing and network speed; check this model against reality.
• Try to implement some form of checkpointing: the program has to save periodic checkpoints; if it is interrupted, then it can restart from the last checkpoint.
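As a starting point for the simplest (1D) domain decomposition, the o xy-planes can be split into contiguous slabs along z, one slab per process. The helper below is a sketch under that assumption; its name and signature are hypothetical. It only computes which planes each process owns and performs no communication.

```c
/* Splits o xy-planes among nprocs processes as evenly as possible.
 * On return, the process of the given rank (0 <= rank < nprocs) owns
 * planes first, first + 1, ..., first + count - 1. */
void slab_range(int o, int nprocs, int rank, int *first, int *count)
{
    int base = o / nprocs; /* planes that every process gets */
    int rest = o % nprocs; /* the first `rest` processes get one more */
    *count = base + (rank < rest ? 1 : 0);
    *first = rank * base + (rank < rest ? rank : rest);
}
```

With o = 10 and 4 processes, the slabs start at planes 0, 3, 6 and 8 and contain 3, 3, 2 and 2 planes. Because xy planes are contiguous in memory, updating a slab only requires the last plane of the previous slab and the first plane of the next one, i.e. two messages of nm values per process and per time step. With a message-passing library such as MPI (an assumption, since the assignment does not name a library), this is where non-blocking sends and receives would come into play.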

Whatever you do, you must:

• Describe what you have done.
• Measure the performance of your implementation and its scalability (both strong- and weak-scaling are fair game).
• Comment on these results: are they good or not? If not, why? Can you design an experiment that would confirm your opinion? Would it have been possible to tell in advance what has happened?
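The usual scalability metrics are simple ratios of measured running times. The sketch below uses hypothetical function names and standard textbook definitions, which the assignment itself does not spell out.

```c
/* Speedup of a parallel run over the sequential run: S = t_seq / t_par. */
double speedup(double t_seq, double t_par)
{
    return t_seq / t_par;
}

/* Strong-scaling efficiency on p processes (fixed total problem size):
 * E = t_seq / (p * t_par). A value of 1.0 means perfect scaling. */
double strong_efficiency(double t_seq, double t_par, int p)
{
    return t_seq / (p * t_par);
}

/* Weak-scaling efficiency (problem size grows in proportion to p):
 * E = t_one / t_p, where t_one is the time on one process for the base
 * problem and t_p the time on p processes for a problem p times larger. */
double weak_efficiency(double t_one, double t_p)
{
    return t_one / t_p;
}
```

For example, with made-up timings of 100 s sequentially and 30 s on 4 processes, the speedup is about 3.33 and the strong-scaling efficiency about 0.83.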

Your work must be submitted by Sunday, November 12th before 23:59 (using Moodle). You must submit your source code, a Makefile that compiles your code (without errors or warnings, even with -Wall), and a report of about 5 pages in PDF format that explains what you have done.

Please follow our guidelines about writing reports (on Moodle).

You must work in pairs (submit one report with both names).

You MUST respect the grid5000 usage policy. Do big computations at night.

If you believe that you have found an error in our programs (it does happen), or if you and your classmates are facing a common technical problem, don’t hesitate to contact us.
