This research applies the Brave New Algorithm (BNA) to the Sphere function in order to establish a methodology for profiling energy consumption. We use the RAPL (Running Average Power Limit) interface to measure package-level (PKG) energy in Joules, and we try to separate the energy consumed by the workload itself from system overhead.
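The experiments read RAPL through the pinpoint tool. As an assumption, the sketch below shows how the same package counter could be read directly on Linux through the powercap sysfs interface. The zone path `intel-rapl:0` and the helper names `read_uj`, `energy_delta_joules` and `measure` are hypothetical. Reading the files may require root, and the zone index can differ between machines. The key detail is that the counter is cumulative in microjoules and wraps around, so the difference between two readings has to correct for that.

```python
from pathlib import Path

# ASSUMPTION: location of the package-0 RAPL zone in the Linux powercap
# framework; the zone name and index may differ on other machines.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str, zone: Path = RAPL_ZONE) -> int:
    """Read an integer counter (in microjoules) from a powercap zone file."""
    return int((zone / name).read_text().strip())


def energy_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy between two cumulative counter readings, in Joules.

    The counter wraps around to zero after max_range_uj, so an end value
    smaller than the start value means one wraparound happened (this assumes
    readings are close enough in time that it wraps at most once).
    """
    if end_uj >= start_uj:
        diff = end_uj - start_uj
    else:
        diff = (max_range_uj - start_uj) + end_uj
    return diff / 1e6


def measure(workload, zone: Path = RAPL_ZONE) -> float:
    """Run workload() and return the total PKG energy measured, in Joules."""
    max_range = read_uj("max_energy_range_uj", zone)
    start = read_uj("energy_uj", zone)
    workload()
    end = read_uj("energy_uj", zone)
    return energy_delta_joules(start, end, max_range)
```

Note that `measure` returns the total package energy, which still includes the background consumption discussed next.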
In this context, the baseline refers to the energy consumed by the system while idle or running only background processes. The delta PKG represents the estimated energy consumption of the algorithm itself, calculated by subtracting the baseline from the total measured energy. The parameter max_gens (maximum number of generations without improvement) serves as the stopping criterion for the evolutionary process.
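To make the role of max_gens concrete, here is a minimal sketch of a stagnation-based stopping rule on the Sphere function. It uses a generic elitist evolutionary loop, not the Brave New Algorithm. The names `evolve`, `sigma` and `bounds` and the mutation scheme are illustrative assumptions; only the stopping rule (stop after max_gens generations without improvement) follows the definition above.

```python
import random


def sphere(x: list[float]) -> float:
    """Sphere function: sum of squares, with its minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def evolve(dim: int, pop_size: int, max_gens: int, seed: int = 0,
           sigma: float = 0.1, bounds: float = 5.12) -> tuple[float, int]:
    """Minimise the Sphere function with a simple elitist EA (NOT BNA).

    The run stops once the best fitness has not improved for max_gens
    consecutive generations. Returns (best fitness, generations run).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-bounds, bounds) for _ in range(dim)]
           for _ in range(pop_size)]
    best = min(sphere(ind) for ind in pop)
    stagnant, gens = 0, 0
    while stagnant < max_gens:
        gens += 1
        # Gaussian mutation: each parent produces one child.
        children = [[xi + rng.gauss(0.0, sigma) for xi in ind] for ind in pop]
        # Elitist survival: keep the best pop_size of parents + children.
        pop = sorted(pop + children, key=sphere)[:pop_size]
        current = sphere(pop[0])
        if current < best:
            best, stagnant = current, 0  # improvement resets the counter
        else:
            stagnant += 1
    return best, gens
```

With the same seed, a larger max_gens follows the same trajectory but stops later, so it can only run longer. This is why the stagnation limit has such a direct effect on running time and energy.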
Energy efficiency is becoming a critical engineering requirement for metaheuristics, yet measuring it accurately on modern multi-tasking systems is challenging. Modern processors employ dynamic power management that introduces high variability into measurements. This explainer details a methodology to extract reliable energy profiles using experimental design and statistical processing.
Does background energy consumption remain constant enough for a single baseline measurement?
Figure 2.1: PKG energy measured over time for two separate baseline measurement sets.
The baseline energy levels shift significantly over time and between experimental sessions, sometimes by as much as 100 Joules. Even within a single session, background noise is not uniform. This can lead to “negative” energy estimates when a static baseline, measured during a period of higher background activity, is subtracted from a workload run during a low-activity period.
Takeaway: Static baselines are insufficient because system state is non-stationary; measurements must account for temporal drift.
How can we mitigate temporal drift in energy measurements?
Figure 3.1: PKG energy measured vs. time for workload measurements. Dashed lines indicate baseline averages; colors distinguish configurations.
Subtracting these baseline averages, however, blurs the energy measured for different parametrizations, which become almost impossible to distinguish.
Figure 3.2: Workload measurements (processed delta energy).
The sequential mixed strategy runs the baseline and the workload back to back in a single session, so that the system state during the two measurements is close enough to be considered effectively the same (most of the time, at least).
Figure 4.1: Sequential baseline and workload measurements showing tracking of system state.
The correlation between the energy measured in paired baseline and workload runs is now 0.588, a moderate positive correlation. In general, baseline and workload rise and fall together, which implies that both are tracking the same system state.
Takeaway: By interleaving baseline and workload runs in a sequential manner, we ensure that both measurements capture similar system states, effectively cancelling out long-term temporal drift.
Our initial assumption was that this would dampen noise. Let’s see whether it does.
Figure 4.2: Workload energy distribution with the sequential mixed strategy.
We see somewhat more differentiation, although a few runs still fall in the “negative” area, and there is still some overlap and little separation between different parameter values.
Takeaway: This is somewhat better, and it leaves more of the variance to be explained by changes in the algorithm or its parameters. The main issue remains, though: subtracting the baseline does not completely remove the influence of system state.
Can multiple runs across different system states yield an energy profile that allows us to compare between different algorithm configurations and representations?
Let us repeat the whole experiment five times, so that the energy of every parameter configuration is measured 30 × 5 = 150 times. We will also use a sandwich strategy: every workload run is sandwiched between two baseline runs.
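A minimal sketch of the sandwich design follows. It assumes that each workload run gets its own pair of surrounding baselines and that the delta subtracts their mean. The source does not say how the two baselines are combined, so the averaging, like the function names, is an assumption.

```python
def sandwich_schedule(configs, blocks, reps):
    """Hypothetical run order: `blocks` repetitions of the whole experiment,
    each running every configuration `reps` times as the triple
    (baseline, workload, baseline)."""
    order = []
    for block in range(blocks):
        for rep in range(reps):
            for cfg in configs:
                order += [("baseline", cfg, block, rep),
                          ("workload", cfg, block, rep),
                          ("baseline", cfg, block, rep)]
    return order


def sandwich_delta(before, workload, after):
    """Delta PKG using the mean of the two surrounding baseline runs as the
    estimate of background energy during the workload run."""
    return workload - (before + after) / 2
```

With 5 blocks and 30 repetitions, each configuration gets 150 workload runs and 300 baseline runs.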
Figure 5.1: Energy consumption still depends on running time
This plot shows how energy consumption depends on running time. Running time matters because it is one of the variables through which system state shows up: as the clock frequency shifts, the workload takes more or less time. Such shifts may happen because the system moves the workload to lower-frequency cores, or because of dynamic voltage and frequency scaling.
We have used a two-stage model here. First we model running time, to separate the part that depends on the algorithm variables from the unexplained variance, and then we model energy.
The resulting model supports this view. Energy has a positive coefficient on the seconds term and a negative one on the squared-seconds term, so it grows with running time but the growth flattens out at longer times, as the chart shows. There is still some overlap between problems of different dimensions and population sizes, partly because one of the most significant variables, max_gens, is not represented in the chart.
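The two-stage idea can be sketched with plain least squares. This is only an illustration under stated assumptions. Stage 1 regresses seconds on dimension, population size and max_gens, and its residuals are kept as the unexplained, state-driven part of running time. Stage 2 regresses energy on seconds and seconds squared. The record keys (`dim`, `pop_size`, `max_gens`, `seconds`, `energy`) and the helper names are hypothetical, and how the original analysis links the two stages is not specified here. The solver uses only the standard library, through the normal equations with Gauss-Jordan elimination.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting.
    X is a list of rows; include a column of 1s for the intercept."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def two_stage(runs):
    """Stage 1: seconds ~ 1 + dim + pop_size + max_gens (residuals = time not
    explained by the algorithm variables).
    Stage 2: energy ~ 1 + seconds + seconds^2."""
    X1 = [[1.0, r["dim"], r["pop_size"], r["max_gens"]] for r in runs]
    t = [r["seconds"] for r in runs]
    b1 = ols(X1, t)
    resid = [ti - sum(b * x for b, x in zip(b1, row)) for row, ti in zip(X1, t)]
    X2 = [[1.0, s, s * s] for s in t]
    b2 = ols(X2, [r["energy"] for r in runs])
    return b1, resid, b2
```

A positive second coefficient and a negative third coefficient in `b2` correspond to the concave energy-time relationship described above.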
Takeaway: Running time is the most important factor in energy consumption. Even though an important part of it is governed by operating conditions, there is still a part we can act on: making the algorithm run faster reduces its energy consumption.
But we already knew that. The main gain is that, by running five experiment blocks, we have reduced uncertainty.
With a single experiment block, the precision we could achieve was 6.53 Joules: no difference smaller than that could be detected. With five blocks, we have improved it to 3.93 Joules, so even a difference that small between the energy spent with two values of a parameter can now be detected.
We can also look at this from another point of view: the total variance of the model coefficients. Going from the single-block model used before to this one, it improves by 64.23%.
Takeaway: Because the operating context is unstable, a single experiment block does not provide enough precision to detect the impact of different variable values on energy; several experiment blocks are needed.
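As a worked sketch of the arithmetic behind such a comparison, the code below assumes that "total variance of coefficients" means the sum of the squared standard errors of the model coefficients (the trace of their covariance matrix), and that the improvement is the relative reduction from the one-block model to the five-block model. Both the definition and the function names are assumptions, and no values from the actual models are used.

```python
def total_variance(std_errors):
    """Total variance of a model's coefficients, taken here as the sum of
    squared standard errors (trace of the coefficient covariance matrix)."""
    return sum(se * se for se in std_errors)


def variance_improvement(old_std_errors, new_std_errors):
    """Percentage reduction in total coefficient variance, old -> new model."""
    old = total_variance(old_std_errors)
    new = total_variance(new_std_errors)
    return 100.0 * (old - new) / old
```

For example, halving every standard error divides each variance by four, which corresponds to a 75% improvement.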
How does the distribution of delta energy differ across configurations?
Figure 6.1: Distribution of delta energy for different configurations.
The main difference is how runs with different population sizes are distributed across the two “energy peaks”: where the energy distribution spreads over both peaks, we see higher energy consumption. The peaks, however, are slightly displaced with respect to each other, so the higher precision achieved gives us some leeway to choose among different configurations for the same problem size, or to distinguish among different algorithm implementations.
Takeaway: With a large amount of data, we can now distinguish between relatively small energy deltas, which gives us more leverage to configure our algorithms in a greener way.
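One way to quantify how each configuration falls across the two peaks is to split the pooled delta energies into two groups and count the share of each configuration's runs in the higher peak. The sketch below uses one-dimensional k-means with k = 2 for the split. This technique is an assumption of this example and is not described in the source. The input format, a list of (configuration, delta energy) pairs, and the function names are also hypothetical.

```python
from collections import defaultdict


def two_means_threshold(values, iters=100):
    """Split 1-D data into two clusters with k-means (k = 2) and return the
    midpoint between the two cluster centres."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        t = (lo + hi) / 2
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break  # all values in one cluster: nothing to split
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def high_peak_share(runs):
    """runs: list of (config, delta_energy). Returns, per configuration,
    the fraction of its runs that fall in the higher-energy peak."""
    t = two_means_threshold([e for _, e in runs])
    counts = defaultdict(lambda: [0, 0])  # config -> [in high peak, total]
    for cfg, e in runs:
        counts[cfg][0] += e > t
        counts[cfg][1] += 1
    return {cfg: high / n for cfg, (high, n) in counts.items()}
```

A configuration with a larger share in the high peak would, on average, consume more energy, which matches the pattern described above.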
This methodology allows researchers to treat energy consumption as a first-class optimization objective. By identifying that smaller populations and shorter stagnation limits (max_gens = 10) consume significantly less energy across all tested dimensions, we can make informed trade-offs between solution quality and carbon footprint.
All experiments were conducted on an AMD Ryzen 9 9950X system using the pinpoint tool for RAPL access. The analysis code and datasets are available in the project repository (Merelo, Merelo, and García-Valdez 2022).