Example 1: Getting Started¶
A minimal end-to-end workflow that selects 4 representative months from a year of hourly time-series data.
This example walks through the five pillars of the energy-repset framework:
| Pillar | Component | Choice in this example |
|---|---|---|
| F — Feature Space | How periods are compared | Statistical summaries (mean, std, min, max, quantiles, ramps) |
| O — Objective | What "representative" means | Wasserstein distance (marginal distribution fidelity) |
| S — Selection Space | What we pick from | All 4-of-12 monthly combinations (495 candidates) |
| R — Representation | How selected periods stand in for the year | Uniform weights (each month = 1/4 of the year) |
| A — Search Algorithm | How we find the best selection | Exhaustive generate-and-test with weighted-sum policy |
import pandas as pd
import energy_repset as rep
import energy_repset.diagnostics as diag
import plotly.io as pio

pio.renderers.default = 'notebook_connected'
Load data¶
One year of hourly time series with four variables: electricity demand (load), onshore wind (onwind), offshore wind (offwind), and solar capacity factors (solar).
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
df_raw
| variable | load | onwind | offwind | solar |
|---|---|---|---|---|
| 2015-01-01 00:00:00 | 41.151 | 0.1566 | 0.7030 | 0.0 |
| 2015-01-01 01:00:00 | 40.135 | 0.1659 | 0.6875 | 0.0 |
| 2015-01-01 02:00:00 | 39.106 | 0.1746 | 0.6535 | 0.0 |
| 2015-01-01 03:00:00 | 38.765 | 0.1745 | 0.6803 | 0.0 |
| 2015-01-01 04:00:00 | 38.941 | 0.1826 | 0.7272 | 0.0 |
| ... | ... | ... | ... | ... |
| 2015-12-31 19:00:00 | 47.719 | 0.1388 | 0.4434 | 0.0 |
| 2015-12-31 20:00:00 | 45.911 | 0.1211 | 0.4023 | 0.0 |
| 2015-12-31 21:00:00 | 45.611 | 0.1082 | 0.4171 | 0.0 |
| 2015-12-31 22:00:00 | 43.762 | 0.1026 | 0.4716 | 0.0 |
| 2015-12-31 23:00:00 | 41.905 | 0.0975 | 0.5239 | 0.0 |
8760 rows × 4 columns
Define the problem context¶
The TimeSlicer divides the year into candidate periods — here, 12 calendar months. The ProblemContext bundles the raw data and slicing logic into a single object that flows through the entire pipeline.
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)
print(f"Candidate slices: {context.get_unique_slices()}")
Candidate slices: [Period('2015-01', 'M'), Period('2015-02', 'M'), Period('2015-03', 'M'), Period('2015-04', 'M'), Period('2015-05', 'M'), Period('2015-06', 'M'), Period('2015-07', 'M'), Period('2015-08', 'M'), Period('2015-09', 'M'), Period('2015-10', 'M'), Period('2015-11', 'M'), Period('2015-12', 'M')]
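Conceptually, month-based slicing amounts to assigning each hourly timestamp to its calendar period. A minimal plain-pandas sketch of the same idea (with a synthetic stand-in for the real data; `TimeSlicer`'s internals may differ):

```python
import pandas as pd

# Synthetic stand-in for the hourly 2015 data: one year of hourly timestamps.
idx = pd.date_range("2015-01-01", "2015-12-31 23:00", freq="h")
df = pd.DataFrame({"load": range(len(idx))}, index=idx)

# Each hour maps to its calendar month; the unique periods are the
# candidate slices that a month-based slicer would enumerate.
slices = df.index.to_period("M").unique()
print(len(slices))  # 12
```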
Pillar F: Feature engineering¶
Before we can compare months, we need a numerical representation. StandardStatsFeatureEngineer computes a set of statistical summaries (mean, std, min, max, quantiles, ramp rates) per variable per month. This transforms each month into a fixed-length feature vector.
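The exact statistics `StandardStatsFeatureEngineer` computes are internal to the library, but the idea can be sketched with plain pandas: group the hours by month and reduce each variable to a fixed set of summaries. Feature names and the choice of statistics below are illustrative assumptions, and synthetic data stands in for the real time series:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=8760, freq="h")
df = pd.DataFrame({"load": rng.normal(45, 5, 8760),
                   "solar": rng.uniform(0, 1, 8760)}, index=idx)

def month_features(sub: pd.DataFrame) -> pd.Series:
    """Summarize one month of hourly data as a flat feature vector."""
    feats = {}
    for col in sub.columns:
        s = sub[col]
        feats[f"{col}_mean"] = s.mean()
        feats[f"{col}_std"] = s.std()
        feats[f"{col}_min"] = s.min()
        feats[f"{col}_max"] = s.max()
        feats[f"{col}_q25"] = s.quantile(0.25)
        feats[f"{col}_q75"] = s.quantile(0.75)
        feats[f"{col}_ramp_mean"] = s.diff().abs().mean()  # mean hourly ramp
    return pd.Series(feats)

# One row per month, one column per (variable, statistic) pair.
features = df.groupby(df.index.to_period("M")).apply(month_features)
print(features.shape)  # (12, 14): 12 months x (2 variables x 7 stats)
```

Once every month is a fixed-length vector like this, comparing months (or a subset of months against the whole year) becomes a purely numerical problem.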
feature_engineer = rep.StandardStatsFeatureEngineer()
Pillar O: Objective¶
We use a single score component: Wasserstein fidelity. It measures how well the marginal distributions of the selected months match those of the full year; a lower distance means a better match.
With only one objective, the selection policy is straightforward — just pick the combination with the best score.
objective_set = rep.ObjectiveSet({
'wasserstein': (1.0, rep.WassersteinFidelity()),
})
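The metric underlying `WassersteinFidelity` can be sketched with SciPy's one-dimensional Wasserstein (earth mover's) distance, here comparing a candidate block of hours against the full year for a single synthetic variable. How the library aggregates this across the four variables is an assumption left out of the sketch:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
year = rng.normal(45, 5, 8760)   # full-year hourly values for one variable
subset = year[:2920]             # a candidate ~4-month block of hours

# Distance between the subset's marginal distribution and the full year's;
# 0 would mean the two empirical distributions coincide exactly.
score = wasserstein_distance(subset, year)
print(score)
```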
Pillars S + A: Selection space and search¶
ExhaustiveCombiGen enumerates all $\binom{12}{4} = 495$ ways to pick 4 months from 12. Each candidate is scored by the objective, and WeightedSumPolicy (trivial here with one component) picks the winner.
k = 4
combi_gen = rep.ExhaustiveCombiGen(k=k)
policy = rep.WeightedSumPolicy()
search_algorithm = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
objective_set, policy, combi_gen
)
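The size of the selection space is easy to verify with the standard library; enumerating 4-of-12 subsets is exactly what `itertools.combinations` does (month labels below are illustrative strings, not the library's `Period` objects):

```python
from itertools import combinations
from math import comb

months = [f"2015-{m:02d}" for m in range(1, 13)]

# Exhaustive generate-and-test: every 4-of-12 subset is a candidate.
candidates = list(combinations(months, 4))
print(len(candidates), comb(12, 4))  # 495 495
```

At this scale exhaustive search is cheap; for finer slicing units (weeks, days) the combinatorics explode and a non-exhaustive search algorithm becomes necessary.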
Pillar R: Representation model¶
With uniform weights, each selected month represents exactly 1/4 of the year. This is the simplest representation model: there is no cluster assignment and no weight optimization, so the full burden falls on the selected months themselves being intrinsically representative.
representation_model = rep.UniformRepresentationModel()
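Uniform representation reduces to a constant weight of 1/k per selected period. A minimal sketch (month labels are illustrative; the library presumably builds an analogous mapping from `Period` objects):

```python
selected = ["2015-01", "2015-02", "2015-05", "2015-06"]

# Each selected period stands in for an equal share of the year.
weights = {m: 1.0 / len(selected) for m in selected}
print(weights)  # each month gets 0.25
```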
Run the workflow¶
workflow = rep.Workflow(feature_engineer, search_algorithm, representation_model)
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()
Iterating over combinations: 100%|██████████| 495/495 [00:02<00:00, 215.34it/s]
Inspect results¶
print(f"Selected months: {result.selection}")
print(f"Weights: {result.weights}")
print(f"Wasserstein score: {result.scores['wasserstein']:.4f}")
Selected months: (Period('2015-01', 'M'), Period('2015-02', 'M'), Period('2015-05', 'M'), Period('2015-06', 'M'))
Weights: {Period('2015-01', 'M'): 0.25, Period('2015-02', 'M'): 0.25, Period('2015-05', 'M'): 0.25, Period('2015-06', 'M'): 0.25}
Wasserstein score: 0.0684
Diagnostic: responsibility weights¶
The bar chart below shows the weight assigned to each selected month. The dashed line marks the uniform reference of 1/k = 0.25; with uniform representation, every bar coincides with it.
fig = diag.ResponsibilityBars().plot(result.weights, show_uniform_reference=True)
fig.update_layout(title='Responsibility Weights (Uniform)')
fig.show()