library(ggplot2)
library(dplyr)
# Install the bundled energyR package from source if it is not available yet
if (!requireNamespace("energyR", quietly = TRUE)) {
  if (file.exists("energyR/DESCRIPTION")) {
    pkg_path <- normalizePath("energyR", mustWork = TRUE)
    utils::install.packages(pkg_path, repos = NULL, type = "source")
  } else {
    stop("Package 'energyR' is required. Install it or render from the repository root.")
  }
}
library(energyR)
Introduction
The LION paper analyzes the Brave New Algorithm (BNA), a
caste-stratified, population-based stochastic optimization algorithm,
measuring fitness and energy jointly (Merelo-Guervós et al. 2026; Merelo,
Merelo, and García-Valdez 2022). The optimization task in this study is
the BBOB Sphere function (Hansen et al. 2010).
Here, a baseline run stops right after generating the initial
population (same code path, no evolutionary loop). Baseline energy is
summarized per (dimension, population size) configuration and
subtracted from the corresponding workload runs (Merelo-Guervós et al.
2026). This page condenses the core findings of the paper into four
visual messages:
- Baseline energy is noisy and must be discounted carefully.
- Within the same Julia version, max_gens = 10 usually saves energy.
- Lower energy and better fitness do not always happen together.
- Upgrading Julia can change the entire energy profile, even with the
same algorithm.
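The baseline subtraction defined above can be sketched in a few lines of base R. This is a hypothetical illustration, not energyR's implementation (this page uses summarize_baseline() and prepare_workload() for that). The helper name subtract_baseline(), the column names dimension, population_size and PKG, and the use of the median as the per-configuration summary are all assumptions made for the example.

```r
# Hypothetical sketch of baseline correction (not energyR's implementation).
# Assumed columns: dimension, population_size and an energy column (PKG).
subtract_baseline <- function(baseline, workload, energy_col = "PKG") {
  keys <- c("dimension", "population_size")
  # One summary value per (dimension, population_size) configuration;
  # the median is an assumed choice of summary statistic
  base_summary <- aggregate(baseline[[energy_col]], by = baseline[keys], FUN = median)
  names(base_summary)[names(base_summary) == "x"] <- "baseline_energy"
  # merge() is an inner join: workload runs whose configuration has no
  # baseline are dropped rather than corrected with a wrong value
  corrected <- merge(workload, base_summary, by = keys)
  corrected[[paste0("delta_", energy_col)]] <- corrected[[energy_col]] - corrected$baseline_energy
  corrected
}

# Toy numbers for illustration only (not measured data)
baseline <- data.frame(dimension = c(3, 3, 5, 5), population_size = 200,
                       PKG = c(10, 12, 20, 22))
workload <- data.frame(dimension = c(3, 5), population_size = 200,
                       PKG = c(150, 170))
subtract_baseline(baseline, workload)$delta_PKG
# 139 149 (configuration medians 11 and 21 subtracted)
```

Joining on both keys corrects every workload run with the baseline of its own configuration rather than with a global average. That matters when the baseline spread itself depends on dimension and population size.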
Data preparation with energyR
# LION (Julia 1.11.8)
lion_baseline <- load_bna_csv("data/lion-1.11.8-baseline.csv")
lion_summary <- summarize_baseline(lion_baseline)
lion_workload <- load_bna_csv("data/lion-1.11.8-bna-fix-rand.csv", drop_baseline_cols = FALSE)
lion_workload <- prepare_workload(lion_summary, lion_workload, label_col = "dim_pop")
lion_workload$julia <- "1.11.8"
# EvoApps baseline/workload used in the paper as the previous platform (Julia 1.11.7)
prev_baseline <- load_bna_csv("data/evoapps-1.11.7-baseline-bna-baseline-16-Oct-11-08-20.csv")
prev_summary <- summarize_baseline(prev_baseline)
prev_workload <- load_bna_csv("data/evoapps-1.11.7-fix-rand-bna-fix-rand-25-Oct-11-06-07.csv", drop_baseline_cols = FALSE)
prev_workload <- prepare_workload(prev_summary, prev_workload, label_col = "dim_pop")
prev_workload$julia <- "1.11.7"
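Both prepared workloads carry a julia column. The fourth message (version effects) can therefore be explored by stacking them and summarizing per configuration. The sketch below is hypothetical:

- compare_versions() is not part of energyR.
- It assumes both inputs contain julia, dimension, population_size, max_gens and delta_PKG columns.
- It uses the median as the summary statistic.

```r
# Hypothetical helper (not part of energyR): median delta PKG per Julia
# version and configuration, from several prepared workload data frames.
compare_versions <- function(...) {
  cols <- c("julia", "dimension", "population_size", "max_gens", "delta_PKG")
  # Keep only the shared columns so rbind() does not fail on schema differences
  combined <- do.call(rbind, lapply(list(...), function(w) as.data.frame(w)[cols]))
  med <- aggregate(delta_PKG ~ julia + dimension + population_size + max_gens,
                   data = combined, FUN = median)
  # Put the versions of the same configuration on adjacent rows
  med[order(med$dimension, med$population_size, med$max_gens, med$julia), ]
}
```

With the data loaded above, compare_versions(lion_workload, prev_workload) would place the Julia 1.11.7 and 1.11.8 medians of each configuration next to each other.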
1) Baseline is variable, so subtraction matters
Before looking at algorithmic effects, we need to separate them from
runtime and platform overhead. The baseline plot shows non-negligible
dispersion, and that dispersion depends on dimension and population
size.
This is one of the main methodological conclusions of the paper:
comparing raw Joules between configurations is misleading, so
baseline-corrected deltas are the defensible unit for comparisons (Merelo-Guervós et al. 2026; Cotta and Martínez-Cruz
2024).
plot_data <- lion_baseline %>%
  mutate(
    dimension = as.factor(dimension),
    population_size = as.factor(population_size)
  )
ggplot(plot_data, aes(x = dimension, y = PKG, fill = population_size)) +
  geom_violin(alpha = 0.6, trim = FALSE) +
  theme_minimal(base_size = 13) +
  labs(
    title = "Baseline runs have visible spread",
    subtitle = "This is why LION computes delta energy against per-configuration baseline summaries",
    x = "Dimension",
    y = "PKG energy (J)",
    fill = "Population"
  )
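A per-configuration dispersion summary puts numbers on the spread shown in the violin plot. Below is a minimal base-R sketch. It assumes a data frame shaped like lion_baseline, with columns dimension, population_size and PKG. The helper name baseline_spread() and the choice of interquartile range and coefficient of variation as spread measures are illustrative assumptions, not the paper's protocol.

```r
# Hypothetical sketch: per-configuration spread of baseline energy.
# Assumed columns: dimension, population_size and an energy column (PKG).
baseline_spread <- function(baseline, energy_col = "PKG") {
  keys <- baseline[c("dimension", "population_size")]
  energy <- baseline[[energy_col]]
  # aggregate() returns groups in the same order for the same `by`,
  # so the three summaries can be combined column-wise
  out <- aggregate(energy, by = keys, FUN = median)
  names(out)[names(out) == "x"] <- "median"
  out$iqr <- aggregate(energy, by = keys, FUN = IQR)$x
  # Coefficient of variation; NA for configurations with a single run
  out$cv <- aggregate(energy, by = keys, FUN = function(v) sd(v) / mean(v))$x
  out
}

# Toy numbers for illustration only (not measured data)
toy <- data.frame(dimension = rep(c(3, 5), each = 3), population_size = 200,
                  PKG = c(10, 11, 12, 20, 24, 28))
baseline_spread(toy)
```

A relative measure such as the coefficient of variation makes configurations with different baseline levels easier to compare than raw Joule ranges.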
2) Main LION configuration effect: stopping earlier usually saves energy
With baseline correction in place, the effect of
max_gens becomes clearer: in most parameter combinations,
max_gens = 10 shifts the distribution downward relative to
larger values.
The practical conclusion is not “always use fewer generations.” The
paper shows that this setting is an energy-lean default, but it must be
interpreted jointly with solution quality and with problem-specific
accuracy requirements.
lion_for_plot <- lion_workload %>%
  mutate(
    work = paste0("max_gens=", max_gens),
    dim_pop = factor(dim_pop)
  )
plot_delta_energy(
  lion_for_plot,
  geom = "boxplot",
  facet_col = "dim_pop",
  title = "Delta energy by stopping criterion (LION / Julia 1.11.8)"
)
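The claim that max_gens = 10 shifts the distribution downward can be checked numerically for each configuration. Below is a minimal sketch. It assumes a workload frame with dimension, population_size, max_gens and delta_PKG columns. The helper lowest_energy_setting() is hypothetical. It compares medians, whereas the summary table further down uses energyR's trimmed means.

```r
# Hypothetical helper: for each (dimension, population_size) group, report the
# max_gens value with the lowest median delta PKG.
lowest_energy_setting <- function(workload) {
  keys <- workload[c("dimension", "population_size", "max_gens")]
  med <- aggregate(workload$delta_PKG, by = keys, FUN = median)
  names(med)[names(med) == "x"] <- "median_delta_PKG"
  # Split the medians by configuration and keep the row with the smallest value
  groups <- split(med, med[c("dimension", "population_size")], drop = TRUE)
  best <- do.call(rbind, lapply(groups, function(g) g[which.min(g$median_delta_PKG), ]))
  rownames(best) <- NULL
  best
}
```

Applied to lion_workload, this would return one row per (dimension, population) group, which can be read against the boxplots above.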
3) Energy vs fitness is a trade-off, not a single objective
This plot makes explicit what the paper discusses in detail: there is
no universal setting that simultaneously optimizes energy and fitness in
every case. The points do not reduce to a single configuration that is
best on both axes.
Operationally, this means the choice of configuration should be
policy-driven. If your context prioritizes lower energy, choose from the
lower-energy clusters; if it prioritizes larger fitness improvements,
accept the additional energy budget and select accordingly (Merelo-Guervós et al. 2026).
lion_tradeoff <- lion_for_plot %>%
  filter(is.finite(log_diff), diff_fitness > 0) %>%
  mutate(
    work = factor(paste0("max_gens=", max_gens)),
    dimension = factor(dimension),
    population_size = factor(population_size)
  )
ggplot(lion_tradeoff, aes(x = delta_PKG, y = log_diff, color = work, shape = population_size)) +
  geom_point(alpha = 0.55, size = 2) +
  facet_wrap(~dimension, nrow = 1, scales = "free_x") +
  theme_minimal(base_size = 13) +
  labs(
    title = "LION: energy versus fitness by dimension",
    subtitle = "Faceting by dimension clarifies the per-configuration max_gens effect",
    x = "Delta PKG energy (J)",
    y = "log10 fitness improvement",
    color = "max_gens",
    shape = "Population"
  )
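Pareto dominance makes the idea of “no single best setting” precise. A run is dominated if another run uses no more delta energy, achieves at least the same log10 fitness improvement, and is strictly better on one of the two. Below is a minimal base-R sketch. It assumes columns delta_PKG (to minimize) and log_diff (to maximize), as in lion_tradeoff. pareto_front() is a hypothetical helper, not part of energyR or the paper's code.

```r
# Hypothetical helper: keep the rows that no other row dominates when
# minimizing energy and maximizing fitness improvement. O(n^2), fine for
# a few thousand runs.
pareto_front <- function(df, energy_col = "delta_PKG", fitness_col = "log_diff") {
  e <- df[[energy_col]]
  f <- df[[fitness_col]]
  dominated <- vapply(seq_along(e), function(i) {
    # Some run is no worse on both objectives and strictly better on one
    any(e <= e[i] & f >= f[i] & (e < e[i] | f > f[i]))
  }, logical(1))
  df[!dominated, , drop = FALSE]
}
```

Because the plot facets by dimension, the front is more meaningful computed per dimension, for example lapply(split(lion_tradeoff, lion_tradeoff$dimension), pareto_front). Choosing among the returned rows is then the policy decision described above.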
Compact configuration summary
# Flag max_gens values whose delta-PKG distribution differs significantly
# (Wilcoxon rank-sum test, p < 0.05) from another max_gens in the same group
wilcoxon_flags <- lion_workload %>%
  filter(is.finite(delta_PKG)) %>%
  group_by(dimension, population_size) %>%
  group_modify(~{
    gens <- sort(unique(.x$max_gens))
    if (length(gens) < 2) {
      return(data.frame(max_gens = numeric(0)))
    }
    # Test every pair of max_gens values within this (dimension, population) group
    pair_tests <- utils::combn(gens, 2, simplify = FALSE)
    sig_rows <- lapply(pair_tests, function(pair) {
      x <- .x$delta_PKG[.x$max_gens == pair[1]]
      y <- .x$delta_PKG[.x$max_gens == pair[2]]
      p_value <- stats::wilcox.test(x, y, exact = FALSE)$p.value
      if (is.finite(p_value) && p_value < 0.05) {
        data.frame(max_gens = pair)
      } else {
        NULL
      }
    })
    sig_rows <- sig_rows[!vapply(sig_rows, is.null, logical(1))]
    if (length(sig_rows) == 0) {
      return(data.frame(max_gens = numeric(0)))
    }
    dplyr::bind_rows(sig_rows) %>% distinct(max_gens)
  }) %>%
  ungroup() %>%
  distinct(dimension, population_size, max_gens) %>%
  mutate(sig_mark = "*")
# Rank configurations by trimmed mean delta PKG and append the significance mark
lion_summary_table <- create_summary(lion_workload) %>%
  left_join(
    wilcoxon_flags,
    by = c("dimension", "population_size", "max_gens")
  ) %>%
  arrange(trimmed_mean_delta_PKG) %>%
  mutate(
    setting = paste0(
      "D=", dimension, ", P=", population_size, ", max_gens=", max_gens,
      ifelse(is.na(sig_mark), "", sig_mark)
    ),
    trimmed_mean_delta_PKG = round(trimmed_mean_delta_PKG, 2),
    trimmed_mean_energy_per_evaluation = signif(trimmed_mean_energy_per_evaluation, 3)
  ) %>%
  select(setting, trimmed_mean_delta_PKG, trimmed_mean_energy_per_evaluation)
knitr::kable(
  lion_summary_table,
  col.names = c("Setting", "Trimmed mean delta PKG (J)", "Trimmed mean J/evaluation"),
  caption = "Lower values are better for energy-only efficiency. * marks configurations with Wilcoxon-significant (p < 0.05) delta-PKG differences versus at least one other max_gens in the same (dimension, population) group."
)
Lower values are better for energy-only efficiency. * marks
configurations with Wilcoxon-significant (p < 0.05) delta-PKG
differences versus at least one other max_gens in the same (dimension,
population) group.

| Setting | Trimmed mean delta PKG (J) | Trimmed mean J/evaluation |
|---|---:|---:|
| D=5, P=400, max_gens=10* | 128.97 | 0.001650 |
| D=5, P=400, max_gens=25* | 133.64 | 0.000844 |
| D=5, P=200, max_gens=25 | 134.23 | 0.001720 |
| D=5, P=200, max_gens=10 | 137.03 | 0.003620 |
| D=3, P=400, max_gens=25 | 137.38 | 0.000800 |
| D=3, P=200, max_gens=25 | 137.55 | 0.001620 |
| D=3, P=400, max_gens=10 | 140.42 | 0.001780 |
| D=3, P=200, max_gens=10 | 141.36 | 0.003570 |
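The table is useful as an at-a-glance ranking, but it should be read
as a screening tool. By trimmed mean, max_gens = 10 ranks lowest only
for D=5, P=400, which is also the only group flagged as significant; in
the other three groups max_gens = 25 has the lower trimmed mean. Final
decisions should combine this ranking with fitness outcomes and with the
platform/version context highlighted above.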
Discussion highlights from the paper
- Measurement methodology matters as much as algorithm
design: baseline correction and careful sampling choices
directly influence which conclusions you can trust (Merelo-Guervós et al. 2026; Cotta and Martínez-Cruz
2024).
- Energy is a first-class optimization objective: if
it is not explicitly included in the analysis, tuning decisions can look
good on fitness while being unnecessarily expensive in energy (Merelo-Guervós et al. 2026).
- Results are contextual, not absolute: hardware
state, runtime version, and execution environment can dominate small
parameter effects (Merelo-Guervós et al. 2026;
Cotta and Martínez-Cruz 2024).
- Reproducibility requires protocol, not just code:
to compare studies over time, rerun the baseline and workload suites after
each relevant software or hardware change (Merelo-Guervós et al. 2026).
- Actionable takeaway: report both fitness and
energy, and present trade-off-aware recommendations instead of
one-number “best setting” claims (Merelo-Guervós
et al. 2026; Merelo, Merelo, and García-Valdez 2022).
References
Cotta, Carlos, and Jesús Martínez-Cruz. 2024.
“Energy Consumption
Analysis of Batch Runs of Evolutionary Algorithms.” In
Proceedings of the Genetic and Evolutionary Computation Conference
Companion, 87–88. GECCO ’24 Companion. New York, NY, USA:
Association for Computing Machinery.
https://doi.org/10.1145/3638530.3664093.
Hansen, Nikolaus, Anne Auger, Raymond Ros, Steffen Finck, and Petr
Pošík. 2010. “Comparing Results of 31 Algorithms from the
Black-Box Optimization Benchmarking BBOB-2009.” In
Proceedings of the 12th Annual Conference Companion on Genetic and
Evolutionary Computation, 1689–96.
Merelo, Cecilia, Juan J Merelo, and Mario García-Valdez. 2022. “A
Brave New Algorithm to Maintain the Exploration/Exploitation
Balance.” In New Perspectives on Hybrid Intelligent System
Design Based on Fuzzy Logic, Neural Networks and Metaheuristics,
305–16. Springer.
Merelo-Guervós, Juan J., Cecilia Merelo-Molina, Pablo García-Sánchez,
and Mario García-Valdez. 2026. “Is There a (Carbon-) Free Lunch?
Energy/Performance Tradeoffs in Population-Based Metaheuristics.”