Example 4: Comparing Representation Models¶
The Representation Model (pillar R) determines how selected periods stand in for the full year. This notebook runs a single search to find the best 3-month selection, then applies three different representation models to the same selection:
| Model | How it works | Weight distribution |
|---|---|---|
| Uniform | Each period = 1/k | Equal bars |
| KMedoids cluster-size | Weight = fraction of months closest to this representative | Unequal — popular representatives get higher weight |
| Blended (soft assignment) | Each original month is a weighted combination of all representatives | Full weight matrix, not just one weight per representative |
The choice of R does not change which months are selected — only how they are weighted in the downstream model.
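As a minimal sketch of the first two weighting schemes, assume a made-up hard assignment of 12 months to 3 representatives (the labels and counts are illustrative, not the library's API):

```python
from collections import Counter

# Hypothetical hard assignment: each of 12 months mapped to its
# nearest of three representatives (purely illustrative labels).
assignment = ["Jan", "Jan", "Jan", "Apr", "Apr", "Apr",
              "Apr", "Sep", "Sep", "Sep", "Sep", "Sep"]
reps = ["Jan", "Apr", "Sep"]

# Uniform: every representative stands for the same share of the year.
uniform = {r: 1 / len(reps) for r in reps}

# Cluster-size: weight = fraction of months nearest to this representative.
counts = Counter(assignment)
cluster_size = {r: counts[r] / len(assignment) for r in reps}

print(uniform)       # each 1/3
print(cluster_size)  # {'Jan': 0.25, 'Apr': 0.333..., 'Sep': 0.416...}
```

Both weightings sum to one; they differ only in how responsibility is split among the same selected periods.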
import pandas as pd
import plotly.express as px
import energy_repset as rep
import energy_repset.diagnostics as diag
import plotly.io as pio
pio.renderers.default = 'notebook_connected'
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)
Find the best 3-month selection¶
We use PCA features and a weighted-sum policy with robust min-max normalization (so different score components are on comparable scales).
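Robust min-max means rescaling by quantiles rather than the raw minimum and maximum, so one outlier score cannot compress the rest of the range. A sketch of one common definition follows; the exact quantiles used by the library are an assumption:

```python
import numpy as np

def robust_minmax(scores, q_low=0.05, q_high=0.95):
    """Rescale to [0, 1] using quantiles instead of min/max, so outliers
    do not dominate the range. Illustrative definition only; the
    library's exact quantile choices may differ."""
    lo, hi = np.quantile(scores, [q_low, q_high])
    return np.clip((np.asarray(scores) - lo) / (hi - lo), 0.0, 1.0)

raw = [0.02, 0.03, 0.05, 0.04, 5.0]   # one large outlier
scaled = robust_minmax(raw)
print(scaled)  # outlier clipped to 1.0; the rest keep useful spread
```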
feature_pipeline = rep.FeaturePipeline(engineers={
'stats': rep.StandardStatsFeatureEngineer(),
'pca': rep.PCAFeatureEngineer(),
})
k = 3
objective_set = rep.ObjectiveSet({
'wasserstein': (1.0, rep.WassersteinFidelity()),
'correlation': (1.0, rep.CorrelationFidelity()),
})
policy = rep.WeightedSumPolicy(normalization='robust_minmax')
search_algorithm = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
objective_set, policy, rep.ExhaustiveCombiGen(k=k)
)
# Run with uniform weights to get the selection
workflow = rep.Workflow(feature_pipeline, search_algorithm, rep.UniformRepresentationModel())
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()
selection = result.selection
print(f"Selected months: {selection}")
print(f"Scores: {result.scores}")
Iterating over combinations: 100%|██████████| 220/220 [00:00<00:00, 227.39it/s]
Selected months: (Period('2015-01', 'M'), Period('2015-04', 'M'), Period('2015-09', 'M'))
Scores: {'wasserstein': 0.20025433806827211, 'correlation': 0.0231217738568774}
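The 220 candidates reported by the progress bar are simply every 3-month subset of the 12 months:

```python
import math

# 12 months choose 3 representatives -> number of candidate selections
# the exhaustive combination generator must evaluate.
n_candidates = math.comb(12, 3)
print(n_candidates)  # 220
```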
Apply three representation models to the same selection¶
feature_context = experiment.feature_context
# Model A: Uniform — 1/k each
uniform_model = rep.UniformRepresentationModel()
uniform_model.fit(feature_context)
weights_uniform = uniform_model.weigh(selection)
# Model B: KMedoids cluster-size — proportional to cluster membership
kmedoids_model = rep.KMedoidsClustersizeRepresentation()
kmedoids_model.fit(feature_context)
weights_kmedoids = kmedoids_model.weigh(selection)
# Model C: Blended (soft assignment) — weight matrix
blended_model = rep.BlendedRepresentationModel(blend_type='convex')
blended_model.fit(feature_context)
weights_blended_df = blended_model.weigh(selection)
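A minimal sketch of how a convex (soft-assignment) weight matrix can be constructed, assuming a softmax over negative squared feature-space distances; the actual construction behind `blend_type='convex'` may differ, and the feature vectors here are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
months = rng.normal(size=(12, 2))   # stand-in feature vectors, one per month
rep_idx = [0, 3, 8]                 # stand-in representative positions

# Soft assignment: each month gets a convex combination over all
# representatives, here via a softmax of negative squared distances.
d2 = ((months[:, None, :] - months[rep_idx][None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
W /= W.sum(axis=1, keepdims=True)   # each row sums to 1: convex weights

print(W.shape)        # (12, 3): one row per month, one column per representative
print(W.sum(axis=1))  # all ones
```

Rows being convex combinations is what distinguishes this model: no month is forced onto a single representative.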
Weight comparison table¶
print(f"{'Month':<12} {'Uniform':>10} {'KMedoids':>10}")
print("-" * 34)
for s in selection:
print(f"{str(s):<12} {weights_uniform[s]:>10.3f} {weights_kmedoids[s]:>10.3f}")
# Aggregate blended weights to one value per representative
blended_col_sums = weights_blended_df.sum(axis=0)
weights_blended_agg = (blended_col_sums / blended_col_sums.sum()).to_dict()
print(f"\nBlended (aggregated): {weights_blended_agg}")
Month Uniform KMedoids
----------------------------------
2015-01 0.333 0.250
2015-04 0.333 0.333
2015-09 0.333 0.417
Blended (aggregated): {Period('2015-01', 'M'): 0.28509640040626255, Period('2015-04', 'M'): 0.3254398531847014, Period('2015-09', 'M'): 0.389463746409036}
Responsibility bars: side by side¶
The uniform model produces equal bars. KMedoids assigns more weight to representatives that are "closest" to more months. The blended model distributes responsibility more smoothly.
models = {
'Uniform': weights_uniform,
'KMedoids': weights_kmedoids,
'Blended (aggregated)': weights_blended_agg,
}
for label, weights in models.items():
fig = diag.ResponsibilityBars().plot(weights, show_uniform_reference=True)
fig.update_layout(title=f'Responsibility Weights: {label}')
fig.show()
Blended weight matrix¶
The heatmap shows the full weight matrix: how much each original month (columns) relies on each representative (rows). In the blended model, every month is a weighted mix of all three representatives — not assigned to just one.
heatmap_df = weights_blended_df.copy()
heatmap_df.index = heatmap_df.index.astype(str)
heatmap_df.columns = heatmap_df.columns.astype(str)
fig = px.imshow(
heatmap_df.T,
labels=dict(x='Original Month', y='Representative', color='Weight'),
color_continuous_scale='Blues',
aspect='auto',
title='Blended Weight Matrix',
)
fig.show()
Feature space and distribution fidelity¶
fig = diag.FeatureSpaceScatter2D().plot(
feature_context.df_features, x='pc_0', y='pc_1', selection=selection
)
fig.update_layout(title='Feature Space with Selection')
fig.show()
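For reference, `pc_0`/`pc_1` coordinates in a scatter like this come from projecting the per-month feature matrix onto its first two principal components. A minimal numpy sketch with stand-in data (not the library's `PCAFeatureEngineer`):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 6))          # stand-in: 12 months x 6 features

Xc = X - X.mean(axis=0)               # center before projecting
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                # scores on the first two PCs

print(coords.shape)  # (12, 2): one (pc_0, pc_1) point per month
```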
selected_indices = slicer.get_indices_for_slice_combi(df_raw.index, selection)
df_selection = df_raw.loc[selected_indices]
for var in df_raw.columns:
fig = diag.DistributionOverlayECDF().plot(df_raw[var], df_selection[var])
fig.update_layout(title=f'ECDF Overlay: {var}')
fig.show()
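The ECDF overlay compares the empirical CDF of each full-year variable with that of the selected months; close curves indicate good distributional fidelity. A minimal numpy sketch of the underlying computation, on synthetic data (not the diagnostics API):

```python
import numpy as np

def ecdf(values):
    """Empirical CDF: sorted values vs. cumulative fraction of samples."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(1)
full = rng.normal(size=1000)   # stand-in for a full-year series
subset = full[:250]            # stand-in for the selected months

x_full, y_full = ecdf(full)
x_sub, y_sub = ecdf(subset)
# Plotting (x_full, y_full) and (x_sub, y_sub) together gives the overlay;
# the vertical gap between the curves is the distributional mismatch.
```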