The Hidden Cost of Evaluating Solutions

Every time an evolutionary algorithm evaluates a candidate solution, it runs a fitness function, typically tens of thousands of times per experiment and across thousands of experiments over a research career. We rarely stop to ask: how much energy does that actually cost?

In a world where software is responsible for a growing share of global electricity consumption, understanding and minimising the energy footprint of our algorithms is no longer optional. It is part of good engineering — and good science.

In our paper published at EvoStar 2025, we asked a deceptively simple question:

Among the standard BBOB floating-point fitness functions, how much energy does each one consume — and do the representation choices we make (precision, data structure) actually matter?

The answers were surprising.


How We Measured It

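We used pinpoint, a multiplatform tool that reads the CPU's built-in energy counters (RAPL, Running Average Power Limit, on Intel and AMD) and reports the total energy consumed by the processor package (PKG: the cores plus the rest of the chip, such as caches and the memory controller, but not the DRAM modules themselves) during a process run. This is a hardware-level measurement, not a simulation.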

The measurement procedure:

  1. Baseline: generate 40 000 random floating-point chromosomes and record the energy (PKG, in Joules) consumed. This represents the unavoidable overhead of just having a population.
  2. Function evaluation: apply each BBOB fitness function to the same 40 000 chromosomes and record energy again.
  3. Delta: subtract the baseline mean to isolate the energy cost of the function itself (ΔE = PKG_function − mean(PKG_baseline)).
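Step 3 is simple arithmetic once the per-run PKG readings are available. The following is a minimal sketch, assuming the readings (in Joules) have already been parsed into vectors, for example from pinpoint's output; the names energy_deltas and median are hypothetical and not taken from the paper's repository. The median is the central statistic reported in Table 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Step 3: delta E for each function run = PKG energy of that run minus the
// mean baseline PKG energy. Assumes pkg_baseline is non-empty; values in Joules.
std::vector<double> energy_deltas(const std::vector<double>& pkg_function,
                                  const std::vector<double>& pkg_baseline) {
    const double baseline_mean =
        std::accumulate(pkg_baseline.begin(), pkg_baseline.end(), 0.0) /
        static_cast<double>(pkg_baseline.size());
    std::vector<double> deltas;
    deltas.reserve(pkg_function.size());
    for (double e : pkg_function) deltas.push_back(e - baseline_mean);
    return deltas;
}

// Median of a sample: the middle value, or the mean of the two middle values
// for even sizes. Takes a copy so the caller's data is not reordered.
double median(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```

With 30 repetitions per configuration, pkg_function and pkg_baseline would each hold 30 readings.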

We repeated every configuration 30 times on a dedicated AMD Ryzen 9 3950X machine (Ubuntu 20.04, g++ 10.5, -O3 -march=native -flto), varying:

  • Precision: float (32-bit) vs double (64-bit)
  • Chromosome size: 128, 256 and 512 dimensions
  • Data structure: variable-size std::vector vs fixed-size std::array

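Ten BBOB functions were chosen to cover all five BBOB function groups (separable, low or moderate conditioning, high conditioning, multimodal with adequate global structure, and multimodal with weak global structure): sphere, rastrigin, rosenbrock, discus, bent_cigar, different_powers, sharp_ridge, schaffers, schwefel, and katsuura.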


Finding 1: The Energy Landscape Spans Orders of Magnitude

**Figure 1.** Median energy consumption (ΔE, Joules, log scale) of each BBOB fitness function for 40 000 evaluations, by chromosome size and floating-point precision. Functions are ranked by their median energy at size 512 with double precision. The y-axis is on a base-10 log scale — each tick is a 10× increase.

The first thing that jumps out from Figure 1 is the extraordinary spread in energy consumption. The logarithmic scale is not a stylistic choice but a necessity: with double precision at size 512 (Table 1), katsuura consumes about ten times as much energy as the next most expensive function, schaffers, and nearly 300 times as much as the cheapest, discus.

To put that in concrete terms:

Table 1. Energy cost per batch of 40 000 evaluations at chromosome size 512.

| Function | Precision | Median ΔE (J) | 75th percentile ΔE (J) |
|---|---|---:|---:|
| discus | double (64-bit) | 2.5883 | 2.8358 |
| bent_cigar | double (64-bit) | 2.7683 | 2.9883 |
| rosenbrock | double (64-bit) | 2.8583 | 2.9583 |
| sphere | double (64-bit) | 2.8583 | 3.0658 |
| sharp_ridge | double (64-bit) | 2.9483 | 3.1108 |
| schwefel | double (64-bit) | 15.6433 | 16.0533 |
| different_powers | double (64-bit) | 18.2133 | 18.8208 |
| rastrigin | double (64-bit) | 22.4233 | 24.5783 |
| schaffers | double (64-bit) | 72.6483 | 73.5183 |
| katsuura | double (64-bit) | 737.1733 | 744.5408 |
| rosenbrock | float (32-bit) | 1.5943 | 1.6843 |
| sharp_ridge | float (32-bit) | 1.6793 | 1.7743 |
| discus | float (32-bit) | 1.8843 | 2.0393 |
| sphere | float (32-bit) | 2.1093 | 2.2318 |
| bent_cigar | float (32-bit) | 4.4343 | 4.6943 |
| different_powers | float (32-bit) | 7.7843 | 8.0093 |
| schwefel | float (32-bit) | 8.2243 | 8.5643 |
| rastrigin | float (32-bit) | 23.4393 | 23.7168 |
| schaffers | float (32-bit) | 72.6743 | 73.6918 |
| katsuura | float (32-bit) | 493.1793 | 500.2368 |
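Why is katsuura so far ahead? A simplified sketch of two of the functions makes the difference in work visible. It assumes the textbook definitions without BBOB's shift, rotation and penalty terms, and the function names are illustrative; this is not the implementation measured in the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified sphere: f(x) = sum of x_i^2 (no BBOB shift or offset).
double sphere(const std::vector<double>& x) {
    double sum = 0.0;
    for (double xi : x) sum += xi * xi;  // one multiply-add per coordinate
    return sum;
}

// Simplified Katsuura (no BBOB rotation, shift or penalty):
// f(x) = 10/D^2 * prod_i (1 + i * sum_{j=1..32} |2^j x_i - round(2^j x_i)| / 2^j)^(10/D^1.2) - 10/D^2
double katsuura(const std::vector<double>& x) {
    const double d = static_cast<double>(x.size());
    const double exponent = 10.0 / std::pow(d, 1.2);
    double product = 1.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double inner = 0.0;
        double scale = 1.0;
        for (int j = 1; j <= 32; ++j) {  // 32 inner steps for every coordinate
            scale *= 2.0;                // scale = 2^j, exact in double
            const double t = scale * x[i];
            inner += std::fabs(t - std::round(t)) / scale;
        }
        // BBOB indexes coordinates from 1, hence (i + 1); one pow call per coordinate.
        product *= std::pow(1.0 + static_cast<double>(i + 1) * inner, exponent);
    }
    return 10.0 / (d * d) * product - 10.0 / (d * d);
}
```

For D = 512, sphere performs 512 multiply-adds per evaluation, while katsuura performs 32 × 512 = 16 384 inner steps plus 512 calls to std::pow.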

Some functions (bent_cigar, discus and rosenbrock, at small sizes) show zero measurable delta. This is not a mistake: the compiler's aggressive optimiser folds these lightweight functions into the chromosome-generation loop, making them energetically inseparable from the baseline. That is good news for users of those functions, but it also exposes a limit of the method: RAPL-based measurement has a precision floor of roughly 5% of total PKG consumption, so smaller costs cannot be separated from the baseline.
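When such a function has to be measured anyway, the usual trick is to stop the compiler from merging it into the surrounding loop. A minimal sketch, assuming GCC or Clang (noinline is a compiler extension, not standard C++) and using hypothetical names; note that it measures the function plus call overhead, which is not exactly what the paper measured.

```cpp
#include <vector>

// GCC/Clang extension: never inline this function, so its body cannot be
// fused into the caller's loop (the attribute is also honoured under -flto).
__attribute__((noinline)) double sphere_noinline(const std::vector<double>& x) {
    double sum = 0.0;
    for (double xi : x) sum += xi * xi;
    return sum;
}

// Every write to a volatile object is an observable side effect, so the
// compiler must perform each evaluation and store its result.
volatile double fitness_sink = 0.0;

// Hypothetical harness loop: evaluate every chromosome of a population.
void evaluate_population(const std::vector<std::vector<double>>& population) {
    for (const std::vector<double>& chromosome : population) {
        fitness_sink = sphere_noinline(chromosome);
    }
}
```

The volatile sink alone already prevents the evaluations from being discarded; noinline additionally keeps the function body from being interleaved with chromosome generation.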


Finding 2: Double Precision Is Not the Enemy

The conventional wisdom says “use floats to save energy” — halving the byte width should halve the memory traffic and therefore the energy, right?

Not quite.

**Figure 2.** Head-to-head comparison of float vs double at chromosome size 512 (variable-size vector). A position above the diagonal means double costs more; below means float costs more. Most functions cluster near the diagonal or below it.

In Figure 2, points on the diagonal indicate identical energy cost for both precisions. Points below the diagonal mean float is more expensive than double — the opposite of what intuition suggests.

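Why? One plausible explanation is that a narrower type only saves work when the compiler can exploit it. Most BBOB functions reduce the chromosome to a single sum or product, and with the -O3 -march=native flags used here, but without -ffast-math, GCC may not reorder floating-point additions, so such reductions typically stay largely scalar at either precision: float halves the memory footprint but executes roughly the same number of instructions. Float code can also lose ground when float data is mixed with double literals or double-precision library calls, which forces conversions to double and back. The only function where float consistently wins is katsuura, where the sheer number of power and rounding operations means fewer bits do translate to measurably less computation; at size 512 its median ΔE is 737 J with double against 493 J with float.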
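One way float code ends up no cheaper than double is silent promotion: in C++, multiplying a float by a double literal such as 1e6 converts the float to double first. The sketch below uses a simplified bent_cigar (no BBOB shift or rotation) with hypothetical function names; it illustrates the general language rule and is not a claim about the paper's code.

```cpp
#include <cstddef>
#include <vector>

// Simplified bent_cigar: x_1^2 + 10^6 * sum_{i>=2} x_i^2 (no BBOB shift or rotation).
// Both versions assume a non-empty chromosome.

// Pitfall: 1e6 is a double literal, so each x[i] is converted to double and the
// accumulator is double: the loop runs in double precision despite float inputs.
float bent_cigar_promoted(const std::vector<float>& x) {
    double sum = x[0] * x[0];
    for (std::size_t i = 1; i < x.size(); ++i) sum += 1e6 * x[i] * x[i];
    return static_cast<float>(sum);
}

// Float-only: the 1e6f literal and the float accumulator keep every operation
// in single precision.
float bent_cigar_float(const std::vector<float>& x) {
    float sum = x[0] * x[0];
    for (std::size_t i = 1; i < x.size(); ++i) sum += 1e6f * x[i] * x[i];
    return sum;
}
```

Both return the same value for small inputs, but the first loop pays for conversions and double arithmetic without gaining anything from the narrower storage.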

Practical takeaway: there is no blanket energy argument for switching your EA from double to float. The theoretical gain does not materialise reliably in practice, and you lose several digits of precision.


Finding 3: Where Arrays Really Shine — and Where They Don’t

We also tested a fixed-size std::array<T, 128> against the usual variable-size std::vector<T>.

**Figure 3.** Energy consumed by chromosome *generation* alone (the unavoidable baseline), comparing fixed-size `array` vs variable-size `vector` at size 128. Both float and double are shown. The array baseline is roughly 8× cheaper.

The array baseline is approximately 8× cheaper than the vector baseline (Figure 3). This is significant: every evolutionary algorithm spends time at initialisation creating chromosomes, and if those chromosomes are fixed-size, the savings at startup are real and large.
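One likely contributor to the gap is allocation: a std::array<double, 128> keeps its elements inside the object, while every std::vector allocates a heap buffer. The sketch below shows the two generators side by side, with hypothetical names and, as an assumption, a uniform distribution over BBOB's usual [-5, 5] search domain; on its own it cannot tell how much of the measured 8× gap comes from allocation.

```cpp
#include <array>
#include <cstddef>
#include <random>
#include <vector>

constexpr std::size_t kSize = 128;

// Fixed-size chromosome: the size is a compile-time constant and the storage
// lives inside the returned object, so no heap allocation is needed.
std::array<double, kSize> random_array_chromosome(std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-5.0, 5.0);
    std::array<double, kSize> c{};
    for (double& gene : c) gene = dist(rng);
    return c;
}

// Variable-size chromosome: each call allocates its buffer on the heap.
std::vector<double> random_vector_chromosome(std::mt19937& rng, std::size_t size) {
    std::uniform_real_distribution<double> dist(-5.0, 5.0);
    std::vector<double> c(size);
    for (double& gene : c) gene = dist(rng);
    return c;
}
```

Generating a population of 40 000 chromosomes with the vector version means 40 000 heap allocations; the array version needs none beyond the container that holds the population.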

However, for function evaluation the picture is murkier:

**Figure 4.** ΔE for function evaluation at size 128, comparing array and vector data structures. Double precision shown; the results for float are analogous. For most functions the two data structures are equivalent — the `katsuura` exception aside.

For most functions there is no meaningful difference in evaluation energy between the two structures. The compiler emits essentially the same inner loop regardless of whether the data pointer comes from a vector or an array. The conclusion is clear:

Use double + vector as your default. It requires no compile-time size specialisation, and for most functions its evaluation energy is on a par with array. Switch to array only if your algorithm initialises large populations frequently: the 8× saving at chromosome-generation time can add up over an evolutionary run.
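Why the evaluation loop barely depends on the container can be seen in a single template. In this hypothetical sketch (not the paper's code), sphere accepts any contiguous container of float or double and reads it through data() and size(), so std::vector and std::array feed exactly the same loop body.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical generic sphere for any contiguous container of float or double.
// std::vector and std::array both expose data() and size(), so the loop is the
// same pointer-and-count loop whichever container is passed in.
template <typename Container>
typename Container::value_type sphere(const Container& x) {
    using T = typename Container::value_type;
    const T* p = x.data();
    T sum = T(0);
    for (std::size_t i = 0; i < x.size(); ++i) sum += p[i] * p[i];
    return sum;
}
```

For std::array the size is also known at compile time, which gives the optimiser extra freedom (for example, full unrolling), but the arithmetic per element is identical.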


Going Further: Power Profile Across Functions

The paper’s future work section mentions a need to look beyond raw Joules and consider average power (watts = J/s), which characterises how intensively a function taxes the CPU rather than just how long it runs. We can do this directly with the existing data.

**Figure 5.** Average power draw (watts = PKG / seconds) for each BBOB function at size 512, double precision. This reveals which functions are *computationally intense* regardless of how long they take. Functions are ordered by median power.

Figure 5 reveals something the raw-energy view obscures: while katsuura still tops the ranking, the relative ordering of the other functions changes. A function that consumes modest total energy but finishes quickly may draw more average power than one that consumes more energy over a longer run. This matters for thermal management and for environments where peak power draw, rather than total energy, is the binding constraint.
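The computation behind a power ranking is one division per run. A minimal sketch, assuming each run records both its PKG energy and its wall-clock duration; the Run struct and helper names are hypothetical. Power is computed per run and then summarised with the median, the same statistic used for energy in Table 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One measured batch; the field names are hypothetical.
struct Run {
    std::string function;  // fitness function label
    double joules;         // PKG energy for the batch
    double seconds;        // wall-clock duration of the batch
};

// Average power over a run, in watts (J/s).
double average_power(const Run& r) { return r.joules / r.seconds; }

// Median of the per-run average power; runs must be non-empty.
double median_power(const std::vector<Run>& runs) {
    std::vector<double> watts;
    watts.reserve(runs.size());
    for (const Run& r : runs) watts.push_back(average_power(r));
    std::sort(watts.begin(), watts.end());
    const std::size_t n = watts.size();
    return n % 2 == 1 ? watts[n / 2] : (watts[n / 2 - 1] + watts[n / 2]) / 2.0;
}
```

With made-up numbers, a batch that uses 10 J in 2 s averages 5 W, while one that uses 20 J in 8 s averages 2.5 W: the second costs more energy but draws less power.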


Energy Scaling with Chromosome Size

One more analysis the existing data allows us to perform is to see how energy scales as chromosome size grows from 128 to 256 to 512 dimensions. Theory predicts linear scaling for simple element-wise functions, but loop-heavy or transcendental functions might scale super-linearly.

**Figure 6.** Median ΔE as a function of chromosome size, for each BBOB function (double precision, variable vector). Lines connect medians across sizes. Functions with near-zero measurements at size 128 are excluded from the log scale.

The scaling plot shows every function following a roughly straight line of slope 1 on this log-log plot; in other words, energy grows linearly with chromosome size, and doubling the size roughly doubles the energy. There is no evidence of any function becoming disproportionately expensive at larger sizes, and the rank order is stable. This is reassuring: measurements taken at size 128 give a reliable relative picture even for larger problems.
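The slope can be checked numerically rather than by eye. This sketch (a hypothetical helper using ordinary least squares on log-transformed values) estimates the exponent b in ΔE ≈ c · n^b from the medians at sizes 128, 256 and 512; b close to 1 means linear scaling. All energies must be positive for the logarithm, which is another reason near-zero measurements have to be excluded.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares slope of log(energy) against log(size), i.e. the exponent b in
// energy ~ c * size^b. Assumes equal-length inputs, at least two distinct
// sizes, and strictly positive energies.
double scaling_exponent(const std::vector<double>& sizes,
                        const std::vector<double>& energies) {
    const std::size_t n = sizes.size();
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += std::log(sizes[i]);
        mean_y += std::log(energies[i]);
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = std::log(sizes[i]) - mean_x;
        const double dy = std::log(energies[i]) - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
    }
    return sxy / sxx;
}
```

For example, medians that double at each size step give b = 1, while medians that quadruple give b = 2.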


Summary: What Should You Actually Do?

| Decision | Recommendation | Confidence |
|---|---|---|
| float vs double | Use double | High: float does not reliably save energy, and it costs precision |
| vector vs array | Use vector for evaluation; array for init-heavy workloads | High |
| Which functions to benchmark? | Prefer katsuura, rastrigin, schaffers: their energy cost stands clearly above the measurement floor | High |
| Chromosome size | Measurements at 128 dimensions generalise to larger sizes | Medium |
| Power vs energy | Monitor both; they do not always rank functions the same way | Medium |

Open Questions and Future Work

The paper identified several open questions that remain fertile ground for further investigation:

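  1. Compiler-fused functions: bent_cigar, discus and rosenbrock (at small sizes) produce zero delta energy because the compiler merges them with the generation loop. Techniques such as writing each result to a volatile variable, or moving the functions into a separate compilation unit built without -flto (link-time optimisation can inline across units), could make these measurable.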

  2. Reduced precision beyond float: The transprecision computing literature suggests 8-bit and 16-bit floating-point formats can cut energy by up to 30% in some workloads. Could BBOB functions tolerate such reduced precision? The answer likely varies by function — katsuura with its iterated products might be sensitive, while sphere could be fine.

  3. Other BBOB functions: 14 of the 24 BBOB functions remain unmeasured. Do any of them approach katsuura’s extreme energy footprint?

  4. Other architectures: All results here are for an AMD Ryzen on Linux. Intel, ARM and RISC-V CPUs vectorise differently, and the float-vs-double story may be completely different on an M-series Apple chip.

  5. Algorithm-level impact: The paper measures function calls in isolation. A full evolutionary algorithm adds selection, crossover, mutation, and population management. How does function energy cost translate to total run energy?
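A quick experiment hints at why katsuura may be fragile under reduced precision (open question 2). Its inner terms |2^j x - round(2^j x)| / 2^j, for j = 1 to 32, depend on the low-order bits of x, and a float keeps only 24 significant bits. This sketch (a hypothetical helper, simplified from the katsuura definition) counts how many of those terms are exactly zero because 2^j x has become an integer.

```cpp
#include <cmath>

// Counts how many of the 32 Katsuura inner terms |2^j x - round(2^j x)| / 2^j
// vanish when x is stored as type T. Scaling by 2^j with ldexp is exact, so a
// term is zero exactly when 2^j * x has no fractional bits left.
template <typename T>
int zero_katsuura_terms(T x) {
    int zeros = 0;
    for (int j = 1; j <= 32; ++j) {
        const T t = std::ldexp(x, j);
        if (std::fabs(t - std::round(t)) == T(0)) ++zeros;
    }
    return zeros;
}
```

For x = 0.1, the float version loses the last six terms (j = 27 to 32) while the double version keeps all 32; 16-bit and 8-bit formats, with even fewer significant bits, would lose more.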


References

This post is based on:

Merelo-Guervós, J. J., Romero López, G., & García-Valdez, M. (2025). Measuring Energy Consumption of BBOB Fitness Functions. In Applications of Evolutionary Computation (EvoStar 2025), Lecture Notes in Computer Science, vol. 15613, pp. 240–254. Springer. https://doi.org/10.1007/978-3-031-90065-5_15

Source code and data: https://github.com/JJ/energy-bbob (GPL licence).