Example 5: Multi-Objective Exploration¶
When multiple aspects of representativeness matter simultaneously, a single weighted score can hide important trade-offs. This notebook sets up a 4-component objective and compares two selection policies:
- ParetoMaxMinStrategy: finds the Pareto-optimal solution that maximizes the worst-performing objective (conservative, balanced)
- WeightedSumPolicy: collapses all objectives into a single scalar via weighted sum (simpler, but requires choosing weights a priori)
The same search is run twice — the only difference is the policy that picks the winner from the scored candidates.
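Before wiring up the library, the contrast between the two selection rules can be sketched with plain numpy on a hypothetical score matrix (the candidates, scores, and weights below are made up for illustration; this is not energy_repset internals):

```python
import numpy as np

# Toy data: 5 candidate selections scored on 2 objectives, both already
# oriented so that higher = better.
scores = np.array([
    [0.9, 0.1],   # excels on objective 0, poor on objective 1
    [0.1, 0.9],   # the mirror image
    [0.6, 0.5],   # balanced
    [0.5, 0.6],   # balanced
    [0.3, 0.3],   # dominated by both balanced candidates
])

# Max-min rule: pick the candidate whose WORST objective is largest.
maxmin_winner = int(np.argmax(scores.min(axis=1)))

# Weighted-sum rule: collapse to a scalar with fixed a-priori weights.
weights = np.array([0.8, 0.2])
weighted_winner = int(np.argmax(scores @ weights))

print(maxmin_winner)    # → 2 (a balanced candidate)
print(weighted_winner)  # → 0 (the specialist favoured by the weights)
```

The max-min rule is insensitive to the weight choice but conservative; the weighted sum can reward a candidate that is excellent on the heavily weighted objective even if it is poor elsewhere.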
import pandas as pd
import energy_repset as rep
import energy_repset.diagnostics as diag
import plotly.io as pio
pio.renderers.default = 'notebook_connected'
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)
feature_pipeline = rep.FeaturePipeline(engineers={
'stats': rep.StandardStatsFeatureEngineer(),
'pca': rep.PCAFeatureEngineer(),
})
Rich objective set: 4 components¶
Each component captures a different dimension of representativeness:
| Component | What it measures | Direction |
|---|---|---|
| Wasserstein | Marginal distribution similarity | minimize |
| Correlation | Cross-variable dependency preservation | minimize |
| Duration curve | NRMSE between full-year and selection duration curves (load-ordered fidelity) | minimize |
| Diversity | Spread of selection in feature space | maximize |
The first three are fidelity metrics (lower = better match to the full year). Diversity is a coverage metric (higher = more spread). This tension is intentional: pure fidelity optimization tends to pick "average" months, while diversity pushes toward distinct ones.
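As a rough sketch of what each component family measures, here are generic formulas evaluated on synthetic data (these are illustrative stand-ins, not energy_repset's exact implementations):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
full = rng.normal(size=(8760, 2))   # stand-in for a full year, 2 variables
sel = full[:2190]                   # stand-in for a ~3-month selection

# 1) Wasserstein: per-variable marginal distribution distance (minimize).
w = np.mean([wasserstein_distance(full[:, j], sel[:, j]) for j in range(2)])

# 2) Correlation: deviation between correlation matrices (minimize).
corr_err = np.abs(np.corrcoef(full.T) - np.corrcoef(sel.T)).mean()

# 3) Duration curve NRMSE: compare sorted (load-ordered) profiles at
#    common quantile positions, normalized by the full-year range (minimize).
q = np.linspace(0, 1, 100)
dc_full = np.quantile(full[:, 0], q)
dc_sel = np.quantile(sel[:, 0], q)
nrmse = np.sqrt(np.mean((dc_full - dc_sel) ** 2)) / (dc_full.max() - dc_full.min())

# 4) Diversity: mean pairwise distance between the selected slices'
#    feature vectors (maximize) -- 3 hypothetical per-month feature rows.
feats = rng.normal(size=(3, 5))
dists = [np.linalg.norm(feats[i] - feats[j])
         for i in range(3) for j in range(i + 1, 3)]
diversity = float(np.mean(dists))

print(round(w, 3), round(corr_err, 3), round(nrmse, 3), round(diversity, 3))
```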
objective_set = rep.ObjectiveSet({
'wasserstein': (1.0, rep.WassersteinFidelity()),
'correlation': (1.0, rep.CorrelationFidelity()),
'duration_curve': (1.0, rep.DurationCurveFidelity()),
'diversity': (0.5, rep.DiversityReward()),
})
k = 3
combi_gen = rep.ExhaustiveCombiGen(k=k)
representation_model = rep.UniformRepresentationModel()
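With 12 monthly slices and k = 3, exhaustive enumeration visits C(12, 3) = 220 candidate combinations, which a quick stdlib check confirms:

```python
from itertools import combinations

# 12 stand-ins for the monthly Period slices of one year.
months = range(12)
n_candidates = sum(1 for _ in combinations(months, 3))
print(n_candidates)  # → 220
```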
Run A: ParetoMaxMinStrategy¶
search_pareto = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
objective_set, rep.ParetoMaxMinStrategy(), combi_gen
)
workflow_pareto = rep.Workflow(feature_pipeline, search_pareto, representation_model)
experiment_pareto = rep.RepSetExperiment(context, workflow_pareto)
result_pareto = experiment_pareto.run()
print(f"Selection: {result_pareto.selection}")
print(f"Scores: {result_pareto.scores}")
Iterating over combinations: 100%|██████████| 220/220 [00:01<00:00, 200.24it/s]
Selection: (Period('2015-07', 'M'), Period('2015-10', 'M'), Period('2015-11', 'M'))
Scores: {'wasserstein': 0.17951351867468845, 'correlation': 0.04652147452533339, 'nrmse_duration_curve': 0.24354065585430432, 'diversity': 13.23658581042483}
Run B: WeightedSumPolicy¶
We reuse the feature context already computed in Run A, so feature engineering is not repeated.
search_weighted = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
objective_set, rep.WeightedSumPolicy(normalization='robust_minmax'), combi_gen
)
workflow_weighted = rep.Workflow(feature_pipeline, search_weighted, representation_model)
experiment_weighted = rep.RepSetExperiment(experiment_pareto.feature_context, workflow_weighted)
result_weighted = experiment_weighted.run()
print(f"Selection: {result_weighted.selection}")
print(f"Scores: {result_weighted.scores}")
Iterating over combinations: 100%|██████████| 220/220 [00:01<00:00, 188.30it/s]
Selection: (Period('2015-07', 'M'), Period('2015-10', 'M'), Period('2015-11', 'M'))
Scores: {'wasserstein': 0.17951351867468845, 'correlation': 0.04652147452533339, 'nrmse_duration_curve': 0.24354065585430432, 'diversity': 13.23658581042483}
same = result_pareto.selection == result_weighted.selection
print(f"Same selection? {same}")
Same selection? True
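The `normalization='robust_minmax'` argument matters because a weighted sum is only meaningful once all objectives share a common scale. A plausible percentile-based sketch of robust min-max scaling (the exact percentile bounds the library uses are an assumption here):

```python
import numpy as np

def robust_minmax(x, lo_pct=5, hi_pct=95):
    """Scale to [0, 1] using percentile bounds instead of the raw min/max,
    so a single outlier candidate cannot compress everyone else's scores."""
    lo, hi = np.percentile(x, [lo_pct, hi_pct])
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

raw = np.array([0.18, 0.21, 0.19, 0.20, 5.0])  # one extreme outlier
scaled = robust_minmax(raw)
print(scaled)
```

With a plain min-max, the outlier at 5.0 would squeeze the other four scores into a tiny interval near zero; the percentile bounds keep them distinguishable.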
Pareto front visualization¶
The 2D scatter shows all 220 candidates in two-objective space. Pareto-optimal solutions (highlighted) form the efficient frontier: no other candidate is at least as good on both objectives and strictly better on one.
fig = diag.ParetoScatter2D(
objective_x='wasserstein', objective_y='correlation'
).plot(search_algorithm=search_pareto, selected_combination=result_pareto.selection)
fig.update_layout(title='Pareto Front: Wasserstein vs Correlation')
fig.show()
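The highlighted front can be recovered with a simple dominance check; a minimal numpy sketch for two minimized objectives (toy points, not the notebook's actual candidates):

```python
import numpy as np

# A point is dominated if some other point is <= on both objectives
# and strictly < on at least one.
pts = np.array([[0.2, 0.9], [0.5, 0.5], [0.9, 0.2], [0.6, 0.6], [0.8, 0.8]])

def pareto_mask(pts):
    n = len(pts)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        dominated = np.all(pts <= pts[i], axis=1) & np.any(pts < pts[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask

mask = pareto_mask(pts)
print(mask)  # → [ True  True  True False False]
```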
The scatter matrix shows all pairwise objective trade-offs at once.
fig = diag.ParetoScatterMatrix().plot(
search_algorithm=search_pareto, selected_combination=result_pareto.selection
)
fig.update_layout(title='Pareto Scatter Matrix')
fig.show()
Score contributions: Pareto vs Weighted Sum¶
Comparing the normalized score profiles of the two winners reveals where they differ. In this run both policies picked the same months, so the profiles coincide; in general, the Pareto policy tends to produce more balanced profiles, while the weighted sum may sacrifice one objective for gains on others.
for label, res in [('Pareto', result_pareto), ('Weighted Sum', result_weighted)]:
fig = diag.ScoreContributionBars().plot(res.scores, normalize=True)
fig.update_layout(title=f'Score Contributions: {label}')
fig.show()
Weights comparison¶
for label, res in [('Pareto', result_pareto), ('Weighted Sum', result_weighted)]:
fig = diag.ResponsibilityBars().plot(res.weights, show_uniform_reference=True)
fig.update_layout(title=f'Weights: {label}')
fig.show()
Distribution and profile diagnostics (Pareto selection)¶
selected_indices = slicer.get_indices_for_slice_combi(df_raw.index, result_pareto.selection)
df_selection = df_raw.loc[selected_indices]
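For reference, the same monthly slicing can be done with plain pandas on a DatetimeIndex (the toy index and column name below are assumed for illustration): convert the timestamps to monthly Periods and keep the rows whose month is in the selected combination.

```python
import pandas as pd

idx = pd.date_range('2015-01-01', '2015-12-31 23:00', freq='h')
df = pd.DataFrame({'load': range(len(idx))}, index=idx)
selection = (pd.Period('2015-07', 'M'),
             pd.Period('2015-10', 'M'),
             pd.Period('2015-11', 'M'))

# Keep only hours whose month is one of the selected Periods.
subset = df[df.index.to_period('M').isin(selection)]
print(len(subset))  # → 2208 hours (July + October + November)
```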
for var in df_raw.columns:
fig = diag.DistributionOverlayHistogram().plot(df_raw[var], df_selection[var], nbins=40)
fig.update_layout(title=f'Distribution Overlay: {var}')
fig.show()
fig = diag.DiurnalProfileOverlay().plot(
df_raw, df_selection, variables=list(df_raw.columns)
)
fig.update_layout(title='Diurnal Profiles: Full Year vs Selection')
fig.show()
fig = diag.CorrelationDifferenceHeatmap().plot(
df_raw, df_selection, method='pearson', show_lower_only=True
)
fig.update_layout(title='Correlation Difference: Selection - Full Year')
fig.show()
feature_context = experiment_pareto.feature_context
fig = diag.FeatureDistributions().plot(feature_context.df_features, nbins=20, cols=4)
fig.update_layout(title='Feature Distributions')
fig.show()