Example 4: Comparing Representation Models¶
The Representation Model (pillar R) determines how selected periods stand in for the full year. This notebook runs a single search to find the best 3-month selection, then applies three different representation models to the same selection:
| Model | How it works | Weight distribution |
|---|---|---|
| Uniform | Each period = 1/k | Equal bars |
| KMedoids cluster-size | Weight = fraction of months closest to this representative | Unequal — popular representatives get higher weight |
| Blended (soft assignment) | Each original month is a weighted combination of all representatives | Full weight matrix, not just one weight per representative |
The choice of R does not change which months are selected — only how they are weighted in the downstream model.
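As a minimal sketch of the first two weighting schemes, assume a made-up hard assignment of 12 months to 3 representatives (the labels and counts are illustrative, not the library's API):

```python
from collections import Counter

# Hypothetical hard assignment: each of 12 months mapped to its
# nearest of three representatives (purely illustrative labels).
assignment = ["Jan", "Jan", "Jan", "Apr", "Apr", "Apr",
              "Apr", "Sep", "Sep", "Sep", "Sep", "Sep"]
reps = ["Jan", "Apr", "Sep"]

# Uniform: every representative stands for the same share of the year.
uniform = {r: 1 / len(reps) for r in reps}

# Cluster-size: weight = fraction of months nearest to this representative.
counts = Counter(assignment)
cluster_size = {r: counts[r] / len(assignment) for r in reps}

print(uniform)       # each 1/3
print(cluster_size)  # {'Jan': 0.25, 'Apr': 0.333..., 'Sep': 0.416...}
```

Both weightings sum to one; they differ only in how responsibility is split among the same selected periods.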
import pandas as pd
import plotly.express as px
import energy_repset as rep
import energy_repset.diagnostics as diag
import plotly.io as pio
pio.renderers.default = 'notebook_connected'
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)
Find the best 3-month selection¶
We use PCA features and a weighted-sum policy with robust min-max normalization (so different score components are on comparable scales).
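Robust min-max means rescaling by quantiles rather than the raw minimum and maximum, so one outlier score cannot compress the rest of the range. A sketch of one common definition follows; the exact quantiles used by the library are an assumption:

```python
import numpy as np

def robust_minmax(scores, q_low=0.05, q_high=0.95):
    """Rescale to [0, 1] using quantiles instead of min/max, so outliers
    do not dominate the range. Illustrative definition only; the
    library's exact quantile choices may differ."""
    lo, hi = np.quantile(scores, [q_low, q_high])
    return np.clip((np.asarray(scores) - lo) / (hi - lo), 0.0, 1.0)

raw = [0.02, 0.03, 0.05, 0.04, 5.0]   # one large outlier
scaled = robust_minmax(raw)
print(scaled)  # outlier clipped to 1.0; the rest keep useful spread
```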
feature_pipeline = rep.FeaturePipeline(engineers={
'stats': rep.StandardStatsFeatureEngineer(),
'pca': rep.PCAFeatureEngineer(),
})
k = 3
objective_set = rep.ObjectiveSet({
'wasserstein': (1.0, rep.WassersteinFidelity()),
'correlation': (1.0, rep.CorrelationFidelity()),
})
policy = rep.WeightedSumPolicy(normalization='robust_minmax')
search_algorithm = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
objective_set, policy, rep.ExhaustiveCombiGen(k=k)
)
# Run with uniform weights to get the selection
workflow = rep.Workflow(feature_pipeline, search_algorithm, rep.UniformRepresentationModel())
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()
selection = result.selection
print(f"Selected months: {selection}")
print(f"Scores: {result.scores}")
Iterating over combinations: 100%|██████████| 220/220 [00:00<00:00, 227.39it/s]
Selected months: (Period('2015-01', 'M'), Period('2015-04', 'M'), Period('2015-09', 'M'))
Scores: {'wasserstein': 0.20025433806827211, 'correlation': 0.0231217738568774}
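The 220 candidates reported by the progress bar are simply every 3-month subset of the 12 months:

```python
import math

# 12 months choose 3 representatives -> number of candidate selections
# the exhaustive combination generator must evaluate.
n_candidates = math.comb(12, 3)
print(n_candidates)  # 220
```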
Apply three representation models to the same selection¶
feature_context = experiment.feature_context
# Model A: Uniform — 1/k each
uniform_model = rep.UniformRepresentationModel()
uniform_model.fit(feature_context)
weights_uniform = uniform_model.weigh(selection)
# Model B: KMedoids cluster-size — proportional to cluster membership
kmedoids_model = rep.KMedoidsClustersizeRepresentation()
kmedoids_model.fit(feature_context)
weights_kmedoids = kmedoids_model.weigh(selection)
# Model C: Blended (soft assignment) — weight matrix
blended_model = rep.BlendedRepresentationModel(blend_type='convex')
blended_model.fit(feature_context)
weights_blended_df = blended_model.weigh(selection)
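A minimal sketch of how a convex (soft-assignment) weight matrix can be constructed, assuming a softmax over negative squared feature-space distances; the actual construction behind `blend_type='convex'` may differ, and the feature vectors here are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
months = rng.normal(size=(12, 2))   # stand-in feature vectors, one per month
rep_idx = [0, 3, 8]                 # stand-in representative positions

# Soft assignment: each month gets a convex combination over all
# representatives, here via a softmax of negative squared distances.
d2 = ((months[:, None, :] - months[rep_idx][None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
W /= W.sum(axis=1, keepdims=True)   # each row sums to 1: convex weights

print(W.shape)        # (12, 3): one row per month, one column per representative
print(W.sum(axis=1))  # all ones
```

Rows being convex combinations is what distinguishes this model: no month is forced onto a single representative.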
Weight comparison table¶
print(f"{'Month':<12} {'Uniform':>10} {'KMedoids':>10}")
print("-" * 34)
for s in selection:
print(f"{str(s):<12} {weights_uniform[s]:>10.3f} {weights_kmedoids[s]:>10.3f}")
# Aggregate blended weights to one value per representative
blended_col_sums = weights_blended_df.sum(axis=0)
weights_blended_agg = (blended_col_sums / blended_col_sums.sum()).to_dict()
print(f"\nBlended (aggregated): {weights_blended_agg}")
Month Uniform KMedoids
----------------------------------
2015-01 0.333 0.250
2015-04 0.333 0.333
2015-09 0.333 0.417
Blended (aggregated): {Period('2015-01', 'M'): 0.28509640040626255, Period('2015-04', 'M'): 0.3254398531847014, Period('2015-09', 'M'): 0.389463746409036}
Responsibility bars: side by side¶
The uniform model produces equal bars. KMedoids assigns more weight to representatives that are "closest" to more months. The blended model distributes responsibility more smoothly.
models = {
'Uniform': weights_uniform,
'KMedoids': weights_kmedoids,
'Blended (aggregated)': weights_blended_agg,
}
for label, weights in models.items():
fig = diag.ResponsibilityBars().plot(weights, show_uniform_reference=True)
fig.update_layout(title=f'Responsibility Weights: {label}')
fig.show()
Blended weight matrix¶
The heatmap shows the full weight matrix: how much each original month (columns) relies on each representative (rows). In the blended model, every month is a weighted mix of all three representatives — not assigned to just one.
heatmap_df = weights_blended_df.copy()
heatmap_df.index = heatmap_df.index.astype(str)
heatmap_df.columns = heatmap_df.columns.astype(str)
fig = px.imshow(
heatmap_df.T,
labels=dict(x='Original Month', y='Representative', color='Weight'),
color_continuous_scale='Blues',
aspect='auto',
title='Blended Weight Matrix',
)
fig.show()
Feature space and distribution fidelity¶
fig = diag.FeatureSpaceScatter2D().plot(
feature_context.df_features, x='pc_0', y='pc_1', selection=selection
)
fig.update_layout(title='Feature Space with Selection')
fig.show()
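For reference, `pc_0`/`pc_1` coordinates in a scatter like this come from projecting the per-month feature matrix onto its first two principal components. A minimal numpy sketch with stand-in data (not the library's `PCAFeatureEngineer`):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 6))          # stand-in: 12 months x 6 features

Xc = X - X.mean(axis=0)               # center before projecting
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                # scores on the first two PCs

print(coords.shape)  # (12, 2): one (pc_0, pc_1) point per month
```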
selected_indices = slicer.get_indices_for_slice_combi(df_raw.index, selection)
df_selection = df_raw.loc[selected_indices]
for var in df_raw.columns:
fig = diag.DistributionOverlayECDF().plot(df_raw[var], df_selection[var])
fig.update_layout(title=f'ECDF Overlay: {var}')
fig.show()
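The ECDF overlay compares the empirical CDF of each full-year variable with that of the selected months; close curves indicate good distributional fidelity. A minimal numpy sketch of the underlying computation, on synthetic data (not the diagnostics API):

```python
import numpy as np

def ecdf(values):
    """Empirical CDF: sorted values vs. cumulative fraction of samples."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(1)
full = rng.normal(size=1000)   # stand-in for a full-year series
subset = full[:250]            # stand-in for the selected months

x_full, y_full = ecdf(full)
x_sub, y_sub = ecdf(subset)
# Plotting (x_full, y_full) and (x_sub, y_sub) together gives the overlay;
# the vertical gap between the curves is the distributional mismatch.
```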