Getting Started¶
This guide walks through a minimal end-to-end workflow, explaining each step and how it maps to the five-pillar framework (F, O, S, R, A).
By the end, you will have selected 4 representative months from a year of hourly time-series data, scored them on distribution fidelity, and generated a simple diagnostic chart.
Installation¶
Option 1 -- Install directly from GitHub:
pip install git+https://github.com/mesqual/energy-repset.git
Option 2 -- Clone and install in editable mode:
git clone https://github.com/mesqual/energy-repset.git
cd energy-repset
pip install -e .
Option 3 -- Add as a Git submodule (useful for monorepos):
git submodule add https://github.com/mesqual/energy-repset.git
pip install -e energy-repset
Alternatively, skip the install and mark the energy-repset directory as a
source root in your IDE so that import energy_repset resolves directly.
Imports¶
All framework classes are available from the top-level namespace. Diagnostics live one level down:
import pandas as pd
import energy_repset as rep
import energy_repset.diagnostics as diag
Load Data¶
energy-repset works with any pandas.DataFrame where the index is a
DatetimeIndex and each column is a variable (e.g., load, wind, solar):
url = "https://tubcloud.tu-berlin.de/s/pKttFadrbTKSJKF/download/time-series-lecture-2.csv"
df_raw = pd.read_csv(url, index_col=0, parse_dates=True).rename_axis('variable', axis=1)
df_raw = df_raw.drop('prices', axis=1)
Define the Problem Context¶
The ProblemContext combines the raw data with a TimeSlicer that defines
how the time axis is divided into candidate periods. Here, each calendar
month becomes one candidate:
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw=df_raw, slicer=slicer)
print(f"Candidate slices: {context.get_unique_slices()}")
# -> 12 monthly periods
Pillar F: Feature Engineering¶
Feature engineering transforms the raw time-series into a compact
representation that can be compared across candidate periods.
StandardStatsFeatureEngineer computes statistical summaries (mean, std,
quantiles, ramp rates) per slice and variable:
feature_engineer = rep.StandardStatsFeatureEngineer()
For richer feature spaces, you can chain engineers with a FeaturePipeline:
feature_pipeline = rep.FeaturePipeline(engineers={
'stats': rep.StandardStatsFeatureEngineer(),
'pca': rep.PCAFeatureEngineer(),
})
In this guide we keep it simple and use only the statistical features.
Pillar O: Objective¶
The ObjectiveSet defines how candidate selections are scored. Each entry
maps a name to a (weight, ScoreComponent) tuple. Here we use a single
objective: Wasserstein distance between the marginal distributions of the
selection and the full year.
objective_set = rep.ObjectiveSet({
'wasserstein': (1.0, rep.WassersteinFidelity()),
})
Multiple objectives are easy to add:
objective_set = rep.ObjectiveSet({
'wasserstein': (1.0, rep.WassersteinFidelity()),
'correlation': (1.0, rep.CorrelationFidelity()),
})
Pillar S: Selection Space¶
A CombinationGenerator defines which subsets are considered.
ExhaustiveCombiGen evaluates every possible k-of-n combination:
k = 4
combi_gen = rep.ExhaustiveCombiGen(k=k)
# For 12 months, k=4 -> C(12,4) = 495 candidates
Pillar A: Search Algorithm¶
The search algorithm orchestrates the evaluation loop. In the generate-and-test
workflow, it generates candidates via the CombinationGenerator, scores each
with the ObjectiveSet, and picks a winner using the SelectionPolicy:
policy = rep.WeightedSumPolicy()
search_algorithm = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
objective_set, policy, combi_gen
)
Pillar R: Representation Model¶
The representation model determines how the selected periods represent the
full year. UniformRepresentationModel assigns equal weight to each
selected period:
representation_model = rep.UniformRepresentationModel()
Run the Workflow¶
Assemble all components into a Workflow, wrap it in a RepSetExperiment,
and run:
workflow = rep.Workflow(feature_engineer, search_algorithm, representation_model)
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()
Inspect Results¶
The RepSetResult contains the selected periods, their weights, and the
objective scores:
print(f"Selected months: {result.selection}")
print(f"Weights: {result.weights}")
print(f"Wasserstein score: {result.scores['wasserstein']:.4f}")
Diagnostic: Responsibility Bars¶
energy-repset includes interactive Plotly diagnostics. A responsibility bar chart shows how the total representation weight is distributed across selected periods:
fig = diag.ResponsibilityBars().plot(result.weights, show_uniform_reference=True)
fig.show()
Full Script¶
The complete code is available at
examples/ex1_getting_started.py.
Next Steps¶
- Swap components: Try
rep.ParetoMaxMinStrategyinstead ofrep.WeightedSumPolicy, orrep.KMedoidsClustersizeRepresentationinstead of uniform weights. See the Modules & Components page for all available implementations. - Add objectives: Add
rep.CorrelationFidelity,rep.DurationCurveFidelity, orrep.DiversityRewardto theObjectiveSet. - Browse examples: The Examples show more advanced configurations with interactive visualizations.