Example 7: Constructive Algorithms¶
Examples 1--5 all use the Generate-and-Test workflow: enumerate candidates, score each one, pick the best. Constructive algorithms work differently — they build solutions iteratively using their own internal objectives.
This notebook demonstrates three constructive algorithms:
| Algorithm | Idea | Internal objective | Output |
|---|---|---|---|
| Hull Clustering | Select periods that span the convex hull of the feature space | Minimize projection error | Subset + blended weights |
| CTPC | Hierarchical clustering that only merges temporally adjacent periods | Minimize within-cluster sum of squares | Contiguous segments + segment-size weights |
| Snippet | Greedy selection of multi-day subsequences via p-median | Minimize total distance to nearest snippet | Subsequence starts + assignment-fraction weights |
Because these algorithms have their own built-in objectives, they do not use the ObjectiveSet during search — only for optional post-hoc evaluation.
import pandas as pd
import energy_repset as rep
import energy_repset.diagnostics as diag
import plotly.io as pio; pio.renderers.default = 'notebook_connected'
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
1. Hull Clustering¶
Hull Clustering selects periods that form the vertices of a convex hull enclosing the data in feature space. The intuition: if your representatives span the "boundary" of the data cloud, every other period can be expressed as a convex combination of them — which is exactly what the blended representation model does.
The algorithm greedily adds the period that most reduces the total projection error (i.e., the error from approximating all periods as convex combinations of the selected ones).
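This greedy loop can be sketched independently of the library. The following is a toy illustration (my own implementation, not `HullClusteringSearch` internals): the convex projection is approximated by projected gradient descent onto the probability simplex, and the outer loop adds whichever row most reduces the total projection error.

```python
import numpy as np

def simplex_project(v):
    """Euclidean projection of v onto the probability simplex (w >= 0, sum w = 1)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def project_convex(V, x, iters=2000):
    """Approximate argmin_w ||V.T @ w - x||^2 over the simplex by projected gradient."""
    m = V.shape[0]
    w = np.full(m, 1.0 / m)
    L = np.linalg.norm(V @ V.T, 2) + 1e-12  # Lipschitz constant of the gradient
    for _ in range(iters):
        grad = V @ (V.T @ w - x)
        w = simplex_project(w - grad / L)
    return V.T @ w

def projection_error(X, selected):
    """Total squared error of approximating every row of X as a convex
    combination of the selected rows."""
    V = X[list(selected)]
    return sum(np.linalg.norm(x - project_convex(V, x)) ** 2 for x in X)

def greedy_hull_select(X, k):
    """Greedily add the row whose inclusion most reduces projection error."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(X)) if i not in selected]
        selected.append(min(candidates,
                            key=lambda i: projection_error(X, selected + [i])))
    return selected
```

For points inside a square, the four corners give (near-)zero projection error, since every interior point is a convex combination of them.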
context_monthly = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="month"))
workflow_hull = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.HullClusteringSearch(k=3, hull_type='convex'),
    representation_model=rep.BlendedRepresentationModel(blend_type='convex'),
)
experiment_hull = rep.RepSetExperiment(context_monthly, workflow_hull)
result_hull = experiment_hull.run()
print(f"Selection: {result_hull.selection}")
print(f"Projection error: {result_hull.scores['projection_error']:.4f}")
Selection: (Period('2015-12', 'M'), Period('2015-08', 'M'), Period('2015-11', 'M'))
Projection error: 170.9117
# Aggregate blended weights for bar chart
if isinstance(result_hull.weights, pd.DataFrame):
    agg = result_hull.weights.sum(axis=0)
    weights_hull_agg = (agg / agg.sum()).to_dict()
else:
    weights_hull_agg = result_hull.weights
fig = diag.ResponsibilityBars().plot(weights_hull_agg, show_uniform_reference=True)
fig.update_layout(title='Hull Clustering: Responsibility Weights (Blended, Aggregated)')
fig.show()
feature_ctx_hull = experiment_hull.feature_context
cols = list(feature_ctx_hull.df_features.columns[:2])
fig = diag.FeatureSpaceScatter2D().plot(
    feature_ctx_hull.df_features, x=cols[0], y=cols[1], selection=result_hull.selection
)
fig.update_layout(title='Hull Clustering: Feature Space')
fig.show()
slicer_monthly = rep.TimeSlicer(unit="month")
selected_idx = slicer_monthly.get_indices_for_slice_combi(df_raw.index, result_hull.selection)
df_sel = df_raw.loc[selected_idx]
fig = diag.DistributionOverlayECDF().plot(df_raw['load'], df_sel['load'])
fig.update_layout(title='Hull Clustering: ECDF Overlay (Load)')
fig.show()
2. CTPC (Chronological Time-Period Clustering)¶
CTPC is hierarchical agglomerative clustering with a contiguity constraint: only temporally adjacent periods can be merged. This guarantees the output is a set of contiguous segments — e.g., "Jan-Mar", "Apr-Jun", etc. — which is useful for models with long-duration storage or seasonal dynamics.
Weights reflect the size of each segment relative to the total.
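The merge loop can be sketched in a few lines of plain Python (an illustration assuming a Ward-style merge cost, not `CTPCSearch`'s actual code): start with one segment per period, and repeatedly merge the *adjacent* pair whose merge least increases the within-cluster sum of squares.

```python
import numpy as np

def ctpc(X, k):
    """Chronological clustering: only temporally adjacent segments may merge.
    X: (n_periods, n_features) array in chronological order."""
    segs = [(i, i) for i in range(len(X))]  # inclusive (start, end) index pairs

    def mean(s):
        return X[s[0]:s[1] + 1].mean(axis=0)

    def size(s):
        return s[1] - s[0] + 1

    while len(segs) > k:
        # Ward cost of merging adjacent pair (a, b):
        # n_a * n_b / (n_a + n_b) * ||mu_a - mu_b||^2
        costs = [size(a) * size(b) / (size(a) + size(b))
                 * np.sum((mean(a) - mean(b)) ** 2)
                 for a, b in zip(segs, segs[1:])]
        j = int(np.argmin(costs))
        segs[j:j + 2] = [(segs[j][0], segs[j + 1][1])]  # merge pair j, j+1

    weights = {s: size(s) / len(X) for s in segs}  # segment-size fractions
    return segs, weights
```

Because only neighbours can merge, the output is always a chronological partition, and the weights sum to 1 by construction.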
workflow_ctpc = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.CTPCSearch(k=4, linkage='ward'),
)
experiment_ctpc = rep.RepSetExperiment(context_monthly, workflow_ctpc)
result_ctpc = experiment_ctpc.run()
print(f"Selection: {result_ctpc.selection}")
print(f"WCSS: {result_ctpc.scores['wcss']:.4f}")
print(f"Weights: { {str(k): round(v, 3) for k, v in result_ctpc.weights.items()} }")
if 'segments' in result_ctpc.diagnostics:
    print("Contiguous segments:")
    for seg in result_ctpc.diagnostics['segments']:
        print(f"  {seg['start']} -- {seg['end']} "
              f"(size={seg['size']}, representative={seg['representative']})")
Contiguous segments:
  2015-01 -- 2015-03 (size=3, representative=2015-03)
  2015-04 -- 2015-08 (size=5, representative=2015-06)
  2015-09 -- 2015-10 (size=2, representative=2015-09)
  2015-11 -- 2015-12 (size=2, representative=2015-11)
fig = diag.ResponsibilityBars().plot(result_ctpc.weights, show_uniform_reference=True)
fig.update_layout(title='CTPC: Responsibility Weights (Segment Fractions)')
fig.show()
feature_ctx_ctpc = experiment_ctpc.feature_context
cols = list(feature_ctx_ctpc.df_features.columns[:2])
fig = diag.FeatureSpaceScatter2D().plot(
    feature_ctx_ctpc.df_features, x=cols[0], y=cols[1], selection=result_ctpc.selection
)
fig.update_layout(title='CTPC: Feature Space')
fig.show()
selected_idx_ctpc = slicer_monthly.get_indices_for_slice_combi(df_raw.index, result_ctpc.selection)
df_sel_ctpc = df_raw.loc[selected_idx_ctpc]
fig = diag.DistributionOverlayECDF().plot(df_raw['load'], df_sel_ctpc['load'])
fig.update_layout(title='CTPC: ECDF Overlay (Load)')
fig.show()
3. Snippet Algorithm¶
The Snippet algorithm is designed for multi-day representative periods (e.g., weeks). Instead of comparing entire weeks as monolithic objects, it compares day-level "snippets" within candidate periods. This avoids the problem where a single anomalous day makes an entire week look unique.
It uses a greedy p-median approach: iteratively select the snippet whose addition most reduces the total distance from all days to their nearest selected snippet.
context_daily = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="day"))
workflow_snippet = rep.Workflow(
    feature_engineer=rep.DirectProfileFeatureEngineer(),
    search_algorithm=rep.SnippetSearch(k=4, period_length_days=7, step_days=7),
)
experiment_snippet = rep.RepSetExperiment(context_daily, workflow_snippet)
result_snippet = experiment_snippet.run()
print(f"Selection (start days): {result_snippet.selection}")
print(f"Total distance: {result_snippet.scores['total_distance']:.4f}")
print(f"Weights: { {str(k): round(v, 3) for k, v in result_snippet.weights.items()} }")
fig = diag.ResponsibilityBars().plot(result_snippet.weights, show_uniform_reference=True)
fig.update_layout(title='Snippet: Responsibility Weights (Assignment Fractions)')
fig.show()
slicer_daily = rep.TimeSlicer(unit="day")
selected_idx_snippet = slicer_daily.get_indices_for_slice_combi(
    df_raw.index, result_snippet.selection
)
df_sel_snippet = df_raw.loc[selected_idx_snippet]
fig = diag.DistributionOverlayECDF().plot(df_raw['load'], df_sel_snippet['load'])
fig.update_layout(title='Snippet: ECDF Overlay (Load)')
fig.show()
Summary¶
All three constructive algorithms find solutions without evaluating an external objective set. Their internal objectives are tightly coupled to the algorithm mechanics, which makes them fast but less modular than Generate-and-Test.
print(f"{'Algorithm':<25} {'k':>3} {'Internal Score':>20} {'Value':>10}")
print("-" * 60)
print(f"{'Hull Clustering':<25} {'3':>3} {'projection_error':>20} "
f"{result_hull.scores['projection_error']:>10.4f}")
print(f"{'CTPC':<25} {'4':>3} {'wcss':>20} "
f"{result_ctpc.scores['wcss']:>10.4f}")
print(f"{'Snippet':<25} {'4':>3} {'total_distance':>20} "
f"{result_snippet.scores['total_distance']:>10.4f}")
Algorithm                   k       Internal Score      Value
------------------------------------------------------------
Hull Clustering             3     projection_error   170.9117
CTPC                        4                 wcss   121.7921
Snippet                     4       total_distance 21833.6298