We used pinpoint, a multiplatform tool that hooks into the CPU’s built-in power sensors (Intel/AMD RAPL) and reports the total energy consumed by the package — CPU cores plus memory — during a process run. This is a hardware-level measurement, not a simulation.
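The measurement procedure:

1. Baseline: generate the chromosomes without evaluating them and record the package energy (PKG, in Joules) consumed. This represents the unavoidable overhead of just having a population.
2. Function run: generate the chromosomes, evaluate the fitness function on them, and record PKG again.
3. Delta: subtract the mean baseline from each function run (ΔE = PKG_function − mean(PKG_baseline)).

We repeated every configuration 30 times on a dedicated AMD Ryzen 9 3950X machine (Ubuntu 20.04, g++ 10.5, -O3 -march=native -flto), varying:

- Chromosome size: 128, 256 and 512
- Precision: float (32-bit) vs double (64-bit)
- Data structure: variable-size std::vector vs fixed-size std::array

Ten BBOB functions were chosen to span all structural categories: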
sphere, rastrigin, rosenbrock,
discus, bent_cigar,
different_powers, sharp_ridge,
schaffers, schwefel, and
katsuura.
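The delta step of the procedure is simple arithmetic, and a minimal sketch may make it concrete. The function names `mean_joules` and `delta_energy` are illustrative assumptions, not taken from the paper's code; the inputs are assumed to be PKG readings in Joules, one per repetition.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean of a set of PKG readings (Joules). Assumes a non-empty input.
double mean_joules(const std::vector<double>& joules) {
    assert(!joules.empty());
    const double total = std::accumulate(joules.begin(), joules.end(), 0.0);
    return total / static_cast<double>(joules.size());
}

// Delta energy per function run: PKG_function - mean(PKG_baseline).
// A delta can come out near zero or negative when the function's cost
// is smaller than the run-to-run noise of the baseline.
std::vector<double> delta_energy(const std::vector<double>& pkg_function,
                                 const std::vector<double>& pkg_baseline) {
    const double baseline = mean_joules(pkg_baseline);
    std::vector<double> delta;
    delta.reserve(pkg_function.size());
    for (double pkg : pkg_function) {
        delta.push_back(pkg - baseline);
    }
    return delta;
}
```

Calling `delta_energy` with the 30 function readings and the 30 baseline readings of one configuration yields 30 ΔE samples, which can then be summarised by their median.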
Figure 1. Median energy consumption (ΔE, Joules, log scale) of each BBOB fitness function for 40 000 evaluations, by chromosome size and floating-point precision. Functions are ranked by their median energy at size 512 with double precision. The y-axis is on a base-10 log scale — each tick is a 10× increase.
The first thing that jumps out from Figure 1 is the
extraordinary spread in energy consumption. Using a
logarithmic scale is not a stylistic choice but a necessity:
katsuura consumes roughly an order of magnitude more energy
than the next most expensive function (schaffers), and over two
orders of magnitude more than the cheapest ones, at the same
chromosome size and precision.
To put that in concrete terms:
| Function | Precision | Median ΔE (J) | 75th pctile ΔE (J) |
|---|---|---|---|
| discus | double (64-bit) | 2.5883 | 2.8358 |
| bent_cigar | double (64-bit) | 2.7683 | 2.9883 |
| rosenbrock | double (64-bit) | 2.8583 | 2.9583 |
| sphere | double (64-bit) | 2.8583 | 3.0658 |
| sharp_ridge | double (64-bit) | 2.9483 | 3.1108 |
| schwefel | double (64-bit) | 15.6433 | 16.0533 |
| different_powers | double (64-bit) | 18.2133 | 18.8208 |
| rastrigin | double (64-bit) | 22.4233 | 24.5783 |
| schaffers | double (64-bit) | 72.6483 | 73.5183 |
| katsuura | double (64-bit) | 737.1733 | 744.5408 |
| rosenbrock | float (32-bit) | 1.5943 | 1.6843 |
| sharp_ridge | float (32-bit) | 1.6793 | 1.7743 |
| discus | float (32-bit) | 1.8843 | 2.0393 |
| sphere | float (32-bit) | 2.1093 | 2.2318 |
| bent_cigar | float (32-bit) | 4.4343 | 4.6943 |
| different_powers | float (32-bit) | 7.7843 | 8.0093 |
| schwefel | float (32-bit) | 8.2243 | 8.5643 |
| rastrigin | float (32-bit) | 23.4393 | 23.7168 |
| schaffers | float (32-bit) | 72.6743 | 73.6918 |
| katsuura | float (32-bit) | 493.1793 | 500.2368 |
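For readers who want to reproduce the two summary columns from raw ΔE samples, here is a hedged sketch. The source does not say which percentile definition was used; this version assumes linear interpolation between the closest ranks (the same convention as NumPy's default), and the names `percentile` and `median` are my own.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Percentile (0..100) using linear interpolation between closest ranks.
// Samples are taken by value so the caller's data is not reordered.
double percentile(std::vector<double> samples, double pct) {
    assert(!samples.empty());
    assert(pct >= 0.0 && pct <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double rank = pct / 100.0 * static_cast<double>(samples.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(rank));
    const std::size_t hi = std::min(lo + 1, samples.size() - 1);
    const double frac = rank - static_cast<double>(lo);
    return samples[lo] + frac * (samples[hi] - samples[lo]);
}

// The median is the 50th percentile under the same convention.
double median(const std::vector<double>& samples) {
    return percentile(samples, 50.0);
}
```

Other percentile conventions give slightly different values for small samples, so figures computed this way may not match the table to the last digit.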
Some functions (bent_cigar, discus,
rosenbrock at small sizes) show zero measurable
delta. This is not a mistake — the compiler’s
aggressive optimiser folds these lightweight functions into the
chromosome generation loop, making them energetically inseparable from
the baseline. That is good news for users of those functions, but it
also means RAPL-based measurement hits a precision floor of roughly 5%
of total PKG consumption.
The conventional wisdom says “use floats to save energy” — halving the byte width should halve the memory traffic and therefore the energy, right?
Not quite.
Figure 2. Head-to-head comparison of float vs double at chromosome size 512 (variable-size vector). A position above the diagonal means double costs more; below means float costs more. Most functions cluster near the diagonal or below it.
In Figure 2, points on the diagonal indicate
identical energy cost for both precisions. Points below
the diagonal mean float is more expensive than
double — the opposite of what intuition suggests.
Why? Modern compilers, when instructed with
-O3 -march=native, generate vectorised SIMD code
that processes doubles very efficiently. Reducing to 32-bit floats does
not automatically unlock a cheaper code path: in many cases much the same
instructions are issued, and float values that meet double constants or
double-only routines may have to be converted to double and back, negating
the theoretical gain. The only function where float consistently
wins is katsuura, where the sheer number of
trigonometric and power operations means fewer bits do translate into
measurably less computation.
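One way a float code path can quietly end up doing double arithmetic is implicit promotion in C++. The sketch below illustrates that language rule only; it is not a claim about what the paper's code does, and the function names and the rastrigin-style term are hypothetical.

```cpp
#include <cmath>
#include <type_traits>

// A float multiplied by a double literal is computed in double...
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "float * double literal promotes to double");
// ...while a float literal keeps the expression in float.
static_assert(std::is_same<decltype(1.0f * 2.0f), float>::value,
              "float * float literal stays float");
// The <cmath> overloads preserve the argument type.
static_assert(std::is_same<decltype(std::sin(1.0f)), float>::value,
              "std::sin(float) returns float");

// Hypothetical term written with double literals: for T = float, x is
// converted to double, the cosine is evaluated in double, and the result
// is converted back to float.
template <typename T>
T term_with_double_literals(T x) {
    return static_cast<T>(x * x - 10.0 * std::cos(6.283185307179586 * x));
}

// Same term with constants of type T: for T = float everything,
// including std::cos, stays in single precision.
template <typename T>
T term_with_typed_constants(T x) {
    const T ten = T(10);
    const T two_pi = T(6.283185307179586);
    return x * x - ten * std::cos(two_pi * x);
}
```

Whether the typed-constant version actually saves energy would have to be measured; the point is only that "switching to float" at the container level does not guarantee float arithmetic in the inner loop.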
Practical takeaway: there is no blanket energy argument for switching your EA from double to float. The theoretical gain rarely materialises in practice, and you lose several digits of precision.
We also tested a fixed-size std::array<T, 128>
against the usual variable-size std::vector<T>.
Figure 3. Energy consumed by chromosome
generation alone (the unavoidable baseline), comparing
fixed-size array vs variable-size vector at
size 128. Both float and double are shown. The array baseline is roughly
8× cheaper.
The array baseline is approximately 8× cheaper than the vector baseline (Figure 3). This is significant: every evolutionary algorithm spends time at initialisation creating chromosomes, and if those chromosomes are fixed-size, the savings at startup are real and large.
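To show where the two data structures differ structurally, here is a minimal sketch of chromosome generation for both. The generator (`std::mt19937`), the gene range of -5 to 5 and the function names are assumptions for illustration; the source's own generation code may differ, and the sketch makes no claim about how much of the 8× gap comes from heap allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fixed-size chromosome: the genes live inside the object itself,
// so creating one involves no heap allocation.
template <typename T, std::size_t N>
std::array<T, N> random_array_chromosome(std::mt19937& rng) {
    std::uniform_real_distribution<T> gene(T(-5), T(5));
    std::array<T, N> chromosome;
    for (T& g : chromosome) g = gene(rng);
    return chromosome;
}

// Variable-size chromosome: each call allocates a heap buffer
// large enough for `size` genes.
template <typename T>
std::vector<T> random_vector_chromosome(std::mt19937& rng, std::size_t size) {
    assert(size > 0);
    std::uniform_real_distribution<T> gene(T(-5), T(5));
    std::vector<T> chromosome(size);
    for (T& g : chromosome) g = gene(rng);
    return chromosome;
}
```

The array version needs its size at compile time (`random_array_chromosome<float, 128>(rng)`), which is the price paid for skipping the allocation.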
However, for function evaluation the picture is murkier:
Figure 4. ΔE for function evaluation at size 128,
comparing array and vector data structures. Double precision shown; the
results for float are analogous. For most functions the two data
structures are equivalent — the katsuura exception aside.
For most functions there is no meaningful difference
in evaluation energy between the two structures. The compiler emits
essentially the same inner loop regardless of whether the data pointer
comes from a vector or an array. The
conclusion is clear:
Use double + vector as your default. It is the combination that requires no compile-time size specialisation and consistently delivers the lowest or tied-lowest evaluation energy across functions. Switch to array only if your algorithm initialises large populations frequently, since the 8× saving at chromosome-generation time can add up over an evolutionary run.
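The observation that evaluation costs the same for both containers is easier to see in code. In this sketch a simplified sphere function (a plain sum of squares, without the shift and offset used by the real BBOB definition) is written once over a pointer and a length, and both containers forward to it. The names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified sphere: sum of squares over a contiguous run of genes.
template <typename T>
T sphere(const T* genes, std::size_t n) {
    assert(genes != nullptr || n == 0);
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) sum += genes[i] * genes[i];
    return sum;
}

// Both containers expose the same pointer + length, so the loop the
// compiler sees is the same in either case.
template <typename T, std::size_t N>
T sphere(const std::array<T, N>& chromosome) {
    return sphere(chromosome.data(), chromosome.size());
}

template <typename T>
T sphere(const std::vector<T>& chromosome) {
    return sphere(chromosome.data(), chromosome.size());
}
```

The one difference left is that the array's length is a compile-time constant, which the optimiser can exploit for unrolling; the results above suggest this rarely matters for evaluation energy.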
The paper’s future work section mentions a need to look beyond raw Joules and consider average power (watts = J/s), which characterises how intensively a function taxes the CPU rather than just how long it runs. We can do this directly with the existing data.
Figure 5. Average power draw (watts = PKG / seconds) for each BBOB function at size 512, double precision. This reveals which functions are computationally intense regardless of how long they take. Functions are ordered by median power.
Figure 5 reveals something the raw-energy view obscures: while
katsuura still tops the ranking, the relative
ordering of the other functions changes. A function that
consumes modest total Joules but finishes quickly may draw a
higher average power than one that consumes more Joules over a
longer run. This matters for thermal management and for hosting
environments that are provisioned or billed by power draw rather than by energy.
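A small sketch shows how the energy ranking and the power ranking can disagree. The `Run` struct, the function names and any numbers fed to it are hypothetical; the source data would supply the PKG and duration of each run.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Run {
    std::string function;  // function name
    double joules;         // PKG energy of the run
    double seconds;        // duration of the run
};

double total_energy_joules(const Run& run) { return run.joules; }

// Average power in watts: energy divided by duration.
double average_power_watts(const Run& run) {
    assert(run.seconds > 0.0);
    return run.joules / run.seconds;
}

// Function names ordered from highest to lowest value of `key`.
std::vector<std::string> rank_by(std::vector<Run> runs,
                                 double (*key)(const Run&)) {
    std::sort(runs.begin(), runs.end(),
              [key](const Run& a, const Run& b) { return key(a) > key(b); });
    std::vector<std::string> names;
    for (const Run& r : runs) names.push_back(r.function);
    return names;
}
```

With two made-up runs, one using 20 J over 2 s and one using 15 J over 1 s, `rank_by(runs, total_energy_joules)` puts the first on top while `rank_by(runs, average_power_watts)` puts the second on top (10 W against 15 W).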
One more analysis the existing data allows us to perform is to see how energy scales as chromosome size grows from 128 to 256 to 512 dimensions. Theory predicts linear scaling for simple element-wise functions, but loop-heavy or transcendental functions might scale super-linearly.
Figure 6. Median ΔE as a function of chromosome size, for each BBOB function (double precision, variable vector). Lines connect medians across sizes. Functions with near-zero measurements at size 128 are excluded from the log scale.
The scaling plot shows that all functions grow roughly linearly in energy with chromosome size on this log-log scale — that is, a doubling of size leads to roughly a doubling of energy. There is no evidence of any function becoming disproportionately expensive at larger sizes; the rank order is stable. This is reassuring: measurements taken at size 128 give a reliable relative picture even for larger problems.
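"Roughly linear" can be quantified as the slope between two points on the log-log plot. The helper below is a sketch with a name of my own choosing; it assumes strictly positive sizes and energies, so configurations with a zero delta have to be left out, as they are in Figure 6.

```cpp
#include <cassert>
#include <cmath>

// Slope between two points on a log-log plot:
//   k = log(E2 / E1) / log(n2 / n1)
// k close to 1 means energy grows linearly with chromosome size;
// k noticeably above 1 would indicate super-linear growth.
double scaling_exponent(double size1, double energy1,
                        double size2, double energy2) {
    assert(size1 > 0.0 && size2 > 0.0 && size1 != size2);
    assert(energy1 > 0.0 && energy2 > 0.0);
    return std::log(energy2 / energy1) / std::log(size2 / size1);
}
```

Applying it to the median ΔE at sizes 128 and 512 for each function gives one number per function that can be compared directly against 1.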
| Decision | Recommendation | Confidence |
|---|---|---|
| float vs double | Use double | High: float rarely saves energy and costs precision |
| vector vs array | Use vector for evaluation; array for init-heavy workloads | High |
| Which functions to benchmark? | Prefer katsuura, rastrigin, schaffers, because they are measurable | High |
| Chromosome size | Measurements at 128 dimensions generalise to larger sizes | Medium |
| Power vs energy | Monitor both; they do not always rank functions the same way | Medium |
The paper identified several open questions that remain fertile ground for further investigation:
- Compiler-fused functions: bent_cigar, discus and rosenbrock (at small sizes) produce zero delta energy because the compiler merges them with the generation loop. Techniques like volatile fencing or separate compilation units (built without -flto, which can otherwise inline across them) could make these measurable.
- Reduced precision beyond float: The transprecision computing literature suggests 8-bit and 16-bit floating-point formats can cut energy by up to 30% in some workloads. Could BBOB functions tolerate such reduced precision? The answer likely varies by function: katsuura with its iterated products might be sensitive, while sphere could be fine.
- Other BBOB functions: 14 of the 24 BBOB functions remain unmeasured. Do any of them approach katsuura’s extreme energy footprint?
- Other architectures: All results here are for an AMD Ryzen on Linux. Intel, ARM and RISC-V CPUs vectorise differently, and the float-vs-double story may be completely different on an M-series Apple chip.
- Algorithm-level impact: The paper measures function calls in isolation. A full evolutionary algorithm adds selection, crossover, mutation, and population management. How does function energy cost translate to total run energy?
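As a starting point for the first item, here is one possible shape for such a fence. It relies on GCC/Clang extensions (`__attribute__((noinline))` and an empty `asm volatile` statement, a trick popularised by micro-benchmarking libraries). The names are hypothetical, the sphere is the simplified sum of squares, and nothing here has been checked against the paper's setup: whether it makes the fused functions measurable is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GCC/Clang-specific: an empty asm statement that claims to read `value`
// and to clobber memory, so the optimiser cannot discard the computation
// that produced it.
template <typename T>
inline void keep_alive(const T& value) {
    asm volatile("" : : "r,m"(value) : "memory");
}

// noinline asks the compiler to keep the fitness function as a real call
// instead of merging it into the caller's loop.
template <typename T>
__attribute__((noinline)) T sphere_isolated(const T* genes, std::size_t n) {
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) sum += genes[i] * genes[i];
    return sum;
}

// Evaluates every chromosome and returns the last fitness value.
template <typename T>
T evaluate_population(const std::vector<std::vector<T>>& population) {
    assert(!population.empty());
    T last = T(0);
    for (const auto& chromosome : population) {
        last = sphere_isolated(chromosome.data(), chromosome.size());
        keep_alive(last);  // force each result to be materialised
    }
    return last;
}
```

The fence itself has a cost, so any energy measured this way would include call overhead that the fused version avoids; that trade-off would need to be reported alongside the numbers.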
This post is based on:
Merelo-Guervós, J. J., Romero López, G., & García-Valdez, M. (2025). Measuring Energy Consumption of BBOB Fitness Functions. In Applications of Evolutionary Computation (EvoStar 2025), Lecture Notes in Computer Science, vol. 15613, pp. 240–254. Springer. https://doi.org/10.1007/978-3-031-90065-5_15
Source code and data: https://github.com/JJ/energy-bbob (GPL licence).