Example 7: Constructive Algorithms¶
Examples 1--5 all use the Generate-and-Test workflow: enumerate candidates, score each one, pick the best. Constructive algorithms work differently — they build solutions iteratively using their own internal objectives.
This notebook demonstrates three constructive algorithms:
| Algorithm | Idea | Internal objective | Output |
|---|---|---|---|
| Hull Clustering | Select periods that span the convex hull of the feature space | Minimize projection error | Subset + blended weights |
| CTPC | Hierarchical clustering that only merges temporally adjacent periods | Minimize within-cluster sum of squares | Contiguous segments + segment-size weights |
| Snippet | Greedy selection of multi-day subsequences via p-median | Minimize total distance to nearest snippet | Subsequence starts + assignment-fraction weights |
Because these algorithms have their own built-in objectives, they do not use the ObjectiveSet during search — only for optional post-hoc evaluation.
import pandas as pd
import energy_repset as rep
import energy_repset.diagnostics as diag
import plotly.io as pio; pio.renderers.default = 'notebook_connected'
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
1. Hull Clustering¶
Hull Clustering selects periods that form the vertices of a convex hull enclosing the data in feature space. The intuition: if your representatives span the "boundary" of the data cloud, every other period can be expressed as a convex combination of them — which is exactly what the blended representation model does.
The algorithm greedily adds the period that most reduces the total projection error (i.e., the error from approximating all periods as convex combinations of the selected ones).
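This greedy loop can be sketched independently of the library. The following is a toy illustration (my own implementation, not `HullClusteringSearch` internals): the convex projection is approximated by projected gradient descent onto the probability simplex, and the outer loop adds whichever row most reduces the total projection error.

```python
import numpy as np

def simplex_project(v):
    """Euclidean projection of v onto the probability simplex (w >= 0, sum w = 1)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def project_convex(V, x, iters=2000):
    """Approximate argmin_w ||V.T @ w - x||^2 over the simplex by projected gradient."""
    m = V.shape[0]
    w = np.full(m, 1.0 / m)
    L = np.linalg.norm(V @ V.T, 2) + 1e-12  # Lipschitz constant of the gradient
    for _ in range(iters):
        grad = V @ (V.T @ w - x)
        w = simplex_project(w - grad / L)
    return V.T @ w

def projection_error(X, selected):
    """Total squared error of approximating every row of X as a convex
    combination of the selected rows."""
    V = X[list(selected)]
    return sum(np.linalg.norm(x - project_convex(V, x)) ** 2 for x in X)

def greedy_hull_select(X, k):
    """Greedily add the row whose inclusion most reduces projection error."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(X)) if i not in selected]
        selected.append(min(candidates,
                            key=lambda i: projection_error(X, selected + [i])))
    return selected
```

For points inside a square, the four corners give (near-)zero projection error, since every interior point is a convex combination of them.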
context_monthly = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="month"))
workflow_hull = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.HullClusteringSearch(k=3, hull_type='convex'),
    representation_model=rep.BlendedRepresentationModel(blend_type='convex'),
)
experiment_hull = rep.RepSetExperiment(context_monthly, workflow_hull)
result_hull = experiment_hull.run()
print(f"Selection: {result_hull.selection}")
print(f"Projection error: {result_hull.scores['projection_error']:.4f}")
Selection: (Period('2015-12', 'M'), Period('2015-08', 'M'), Period('2015-11', 'M'))
Projection error: 170.9117
# Aggregate blended weights for bar chart
if isinstance(result_hull.weights, pd.DataFrame):
    agg = result_hull.weights.sum(axis=0)
    weights_hull_agg = (agg / agg.sum()).to_dict()
else:
    weights_hull_agg = result_hull.weights
fig = diag.ResponsibilityBars().plot(weights_hull_agg, show_uniform_reference=True)
fig.update_layout(title='Hull Clustering: Responsibility Weights (Blended, Aggregated)')
fig.show()
feature_ctx_hull = experiment_hull.feature_context
cols = list(feature_ctx_hull.df_features.columns[:2])
fig = diag.FeatureSpaceScatter2D().plot(
    feature_ctx_hull.df_features, x=cols[0], y=cols[1], selection=result_hull.selection
)
fig.update_layout(title='Hull Clustering: Feature Space')
fig.show()
slicer_monthly = rep.TimeSlicer(unit="month")
selected_idx = slicer_monthly.get_indices_for_slice_combi(df_raw.index, result_hull.selection)
df_sel = df_raw.loc[selected_idx]
fig = diag.DistributionOverlayECDF().plot(df_raw['load'], df_sel['load'])
fig.update_layout(title='Hull Clustering: ECDF Overlay (Load)')
fig.show()
2. CTPC (Chronological Time-Period Clustering)¶
CTPC is hierarchical agglomerative clustering with a contiguity constraint: only temporally adjacent periods can be merged. This guarantees the output is a set of contiguous segments — e.g., "Jan-Mar", "Apr-Jun", etc. — which is useful for models with long-duration storage or seasonal dynamics.
Weights reflect the size of each segment relative to the total.
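The merge loop can be sketched in a few lines of plain Python (an illustration assuming a Ward-style merge cost, not `CTPCSearch`'s actual code): start with one segment per period, and repeatedly merge the *adjacent* pair whose merge least increases the within-cluster sum of squares.

```python
import numpy as np

def ctpc(X, k):
    """Chronological clustering: only temporally adjacent segments may merge.
    X: (n_periods, n_features) array in chronological order."""
    segs = [(i, i) for i in range(len(X))]  # inclusive (start, end) index pairs

    def mean(s):
        return X[s[0]:s[1] + 1].mean(axis=0)

    def size(s):
        return s[1] - s[0] + 1

    while len(segs) > k:
        # Ward cost of merging adjacent pair (a, b):
        # n_a * n_b / (n_a + n_b) * ||mu_a - mu_b||^2
        costs = [size(a) * size(b) / (size(a) + size(b))
                 * np.sum((mean(a) - mean(b)) ** 2)
                 for a, b in zip(segs, segs[1:])]
        j = int(np.argmin(costs))
        segs[j:j + 2] = [(segs[j][0], segs[j + 1][1])]  # merge pair j, j+1

    weights = {s: size(s) / len(X) for s in segs}  # segment-size fractions
    return segs, weights
```

Because only neighbours can merge, the output is always a chronological partition, and the weights sum to 1 by construction.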
workflow_ctpc = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.CTPCSearch(k=4, linkage='ward'),
)
experiment_ctpc = rep.RepSetExperiment(context_monthly, workflow_ctpc)
result_ctpc = experiment_ctpc.run()
print(f"Selection: {result_ctpc.selection}")
print(f"WCSS: {result_ctpc.scores['wcss']:.4f}")
print(f"Weights: { {str(k): round(v, 3) for k, v in result_ctpc.weights.items()} }")
if 'segments' in result_ctpc.diagnostics:
    print("Contiguous segments:")
    for seg in result_ctpc.diagnostics['segments']:
        print(f"  {seg['start']} -- {seg['end']} "
              f"(size={seg['size']}, representative={seg['representative']})")
Contiguous segments:
  2015-01 -- 2015-03 (size=3, representative=2015-03)
  2015-04 -- 2015-08 (size=5, representative=2015-06)
  2015-09 -- 2015-10 (size=2, representative=2015-09)
  2015-11 -- 2015-12 (size=2, representative=2015-11)
fig = diag.ResponsibilityBars().plot(result_ctpc.weights, show_uniform_reference=True)
fig.update_layout(title='CTPC: Responsibility Weights (Segment Fractions)')
fig.show()
feature_ctx_ctpc = experiment_ctpc.feature_context
cols = list(feature_ctx_ctpc.df_features.columns[:2])
fig = diag.FeatureSpaceScatter2D().plot(
    feature_ctx_ctpc.df_features, x=cols[0], y=cols[1], selection=result_ctpc.selection
)
fig.update_layout(title='CTPC: Feature Space')
fig.show()
selected_idx_ctpc = slicer_monthly.get_indices_for_slice_combi(df_raw.index, result_ctpc.selection)
df_sel_ctpc = df_raw.loc[selected_idx_ctpc]
fig = diag.DistributionOverlayECDF().plot(df_raw['load'], df_sel_ctpc['load'])
fig.update_layout(title='CTPC: ECDF Overlay (Load)')
fig.show()
3. Snippet Algorithm¶
The Snippet algorithm is designed for multi-day representative periods (e.g., weeks). Instead of comparing entire weeks as monolithic objects, it compares day-level "snippets" within candidate periods. This avoids the problem where a single anomalous day makes an entire week look unique.
It uses a greedy p-median approach: iteratively select the snippet whose addition most reduces the total distance from all days to their nearest selected snippet.
context_daily = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="day"))
workflow_snippet = rep.Workflow(
    feature_engineer=rep.DirectProfileFeatureEngineer(),
    search_algorithm=rep.SnippetSearch(k=4, period_length_days=7, step_days=7),
)
experiment_snippet = rep.RepSetExperiment(context_daily, workflow_snippet)
result_snippet = experiment_snippet.run()
print(f"Selection (start days): {result_snippet.selection}")
print(f"Total distance: {result_snippet.scores['total_distance']:.4f}")
print(f"Weights: { {str(k): round(v, 3) for k, v in result_snippet.weights.items()} }")
fig = diag.ResponsibilityBars().plot(result_snippet.weights, show_uniform_reference=True)
fig.update_layout(title='Snippet: Responsibility Weights (Assignment Fractions)')
fig.show()
slicer_daily = rep.TimeSlicer(unit="day")
selected_idx_snippet = slicer_daily.get_indices_for_slice_combi(
    df_raw.index, result_snippet.selection
)
df_sel_snippet = df_raw.loc[selected_idx_snippet]
fig = diag.DistributionOverlayECDF().plot(df_raw['load'], df_sel_snippet['load'])
fig.update_layout(title='Snippet: ECDF Overlay (Load)')
fig.show()
Summary¶
All three constructive algorithms find solutions without evaluating an external objective set. Their internal objectives are tightly coupled to the algorithm mechanics, which makes them fast but less modular than Generate-and-Test.
print(f"{'Algorithm':<25} {'k':>3} {'Internal Score':>20} {'Value':>10}")
print("-" * 60)
print(f"{'Hull Clustering':<25} {'3':>3} {'projection_error':>20} "
f"{result_hull.scores['projection_error']:>10.4f}")
print(f"{'CTPC':<25} {'4':>3} {'wcss':>20} "
f"{result_ctpc.scores['wcss']:>10.4f}")
print(f"{'Snippet':<25} {'4':>3} {'total_distance':>20} "
f"{result_snippet.scores['total_distance']:>10.4f}")
Algorithm                   k       Internal Score      Value
------------------------------------------------------------
Hull Clustering             3     projection_error   170.9117
CTPC                        4                 wcss   121.7921
Snippet                     4       total_distance 21833.6298