energy-repset
A unified, modular framework for representative subset selection in multivariate time-series spaces.
Why this package?
Energy system models, capacity expansion studies, and other time-series-heavy applications often need to reduce a full year (or longer) of hourly data to a small set of representative periods (days, weeks, or months) without losing the characteristics that drive model results. The literature offers many methods (k-means, k-medoids, MILP-based selection, genetic algorithms, etc.), but the landscape is dense and tangled. Each method bundles several decisions (how to represent data, what to optimize, how to search) into a single procedure. That makes it hard to see which choices matter, to compare approaches on equal footing, or to adapt a method to a specific problem.
energy-repset clears a path through the jungle in two ways:
- A unified framework that decomposes any representative period selection method into five interchangeable components. Every established methodology is a specific instantiation of this structure. The framework provides a common language for describing, comparing, and assembling methods. The full theoretical treatment is available in the Unified Framework document.
- A modular Python package that implements this framework as a library of composable, protocol-based modules. You pick one implementation per component, wire them together, and run. Adding a new algorithm or score metric means implementing a single protocol -- everything else stays the same.
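The "implement a single protocol" idea can be illustrated with Python's `typing.Protocol`. The sketch below is not the package's actual interface: `ScoreComponent`, `MeanGap`, `PeakGap`, and `evaluate` are hypothetical names chosen for illustration. It only shows how a new metric can be plugged in structurally, without touching the code that consumes it.

```python
from typing import Dict, Mapping, Protocol, Sequence, runtime_checkable


@runtime_checkable
class ScoreComponent(Protocol):
    """Hypothetical protocol: any object with a matching score() method conforms."""

    def score(self, full: Sequence[float], subset: Sequence[float]) -> float:
        ...


class MeanGap:
    """Absolute difference between the subset mean and the full-series mean."""

    def score(self, full: Sequence[float], subset: Sequence[float]) -> float:
        return abs(sum(subset) / len(subset) - sum(full) / len(full))


class PeakGap:
    """A 'new' metric added later: absolute difference between the two maxima."""

    def score(self, full: Sequence[float], subset: Sequence[float]) -> float:
        return abs(max(subset) - max(full))


def evaluate(components: Mapping[str, ScoreComponent],
             full: Sequence[float],
             subset: Sequence[float]) -> Dict[str, float]:
    """Score one candidate subset with every registered component."""
    return {name: comp.score(full, subset) for name, comp in components.items()}


full_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
candidate = [2.0, 3.0, 6.0]
scores = evaluate({"mean": MeanGap(), "peak": PeakGap()}, full_series, candidate)
print(scores)
```

Neither metric class inherits from `ScoreComponent`; conformance is purely structural, so `evaluate` stays unchanged when a new metric is added.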
The Five Components
| Component | Symbol | Role |
|---|---|---|
| Feature Space | F | How raw time-series are transformed into comparable representations |
| Objective | O | How candidate selections are scored for quality |
| Selection Space | S | What is being selected (historical subsets, synthetic archetypes, etc.) |
| Representation Model | R | How selected periods represent the full dataset |
| Search Algorithm | A | The engine that finds optimal selections |
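To make the decomposition concrete, here is a toy, standard-library-only sketch in which each of the five components is one small function. It does not use the energy-repset API; the data and all function names (`featurize`, `candidates`, `represent`, `objective`, `search`) are assumptions made up for this illustration.

```python
from itertools import combinations
from statistics import mean

# Toy data: six candidate periods, each a short list of values (hypothetical).
periods = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [2.0, 2.0, 2.0],
    "p3": [8.0, 9.0, 10.0],
    "p4": [4.0, 5.0, 6.0],
    "p5": [5.0, 5.0, 5.0],
    "p6": [0.0, 1.0, 2.0],
}


def featurize(values):
    """F: map a raw period to a feature vector (mean, range)."""
    return (mean(values), max(values) - min(values))


def candidates(labels, k):
    """S: the selection space, here all k-subsets of the historical periods."""
    return combinations(sorted(labels), k)


def represent(selection):
    """R: uniform weights, so each selected period counts equally."""
    return {label: 1.0 / len(selection) for label in selection}


def objective(features, weights):
    """O: L1 distance between the weighted selection and the full-set average."""
    dims = range(2)
    target = [mean(f[d] for f in features.values()) for d in dims]
    approx = [sum(w * features[p][d] for p, w in weights.items()) for d in dims]
    return sum(abs(a - t) for a, t in zip(approx, target))


def search(periods, k):
    """A: exhaustive generate-and-test over the selection space."""
    features = {label: featurize(v) for label, v in periods.items()}
    return min(candidates(features, k),
               key=lambda sel: objective(features, represent(sel)))


best = search(periods, k=2)
print(best)
```

Swapping any single function (for example, a different `objective`) changes the method while the other four stay as they are, which is the point of the decomposition.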
Navigating the project
Website: energy-repset.mesqual.io
Documentation site (energy-repset-docs.mesqual.io):
| Section | What you'll find |
|---|---|
| Unified Framework | The theoretical paper: problem decomposition, component taxonomy, method comparison |
| Workflow Types | The three workflow patterns: generate-and-test, constructive, direct optimization |
| Modules & Components | Inventory of all implemented modules and how they map to the five components |
| Configuration Advisor | Decision guide for choosing components based on your problem |
| Getting Started | End-to-end walkthrough from data to result |
| Examples | Worked examples showcasing different configurations |
| API Reference | Auto-generated class and method documentation |
Package structure (energy_repset/):
| Module | Framework component |
|---|---|
| context, time_slicer | Problem definition and data container |
| feature_engineering/ | F -- Feature engineers (statistical summaries, PCA, pipelines) |
| objectives, score_components/ | O -- Objective sets and scoring metrics |
| combi_gens/ | S -- Combination generators (exhaustive, group-quota, hierarchical) |
| representation/ | R -- Representation models (uniform, cluster-based, blended) |
| search_algorithms/, selection_policies/ | A -- Search algorithms and selection policies |
| workflow, problem, results | Orchestration: wire components, run, collect results |
| diagnostics/ | Visualization and analysis of features, scores, and results |
Installation
Option 1 -- Install directly from GitHub:
pip install git+https://github.com/mesqual/energy-repset.git
Option 2 -- Clone and install in editable mode:
git clone https://github.com/mesqual/energy-repset.git
cd energy-repset
pip install -e .
Option 3 -- Add as a Git submodule (useful for monorepos):
git submodule add https://github.com/mesqual/energy-repset.git
pip install -e energy-repset
Alternatively, skip the install and mark the energy-repset directory as a
source root in your IDE so that import energy_repset resolves directly.
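Whichever option you choose, a quick way to check that the package resolves in the active environment is to ask the import system for it. This is a generic Python check using the standard library's `importlib.util.find_spec`, not something specific to energy-repset:

```python
import importlib.util

# find_spec returns None when a top-level package cannot be located on sys.path.
spec = importlib.util.find_spec("energy_repset")
installed = spec is not None

if installed:
    print(f"energy_repset found at: {spec.origin}")
else:
    print("energy_repset not found; check the active environment or source root.")
```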
Quick Start
import pandas as pd
import energy_repset as rep
# Load hourly time-series data (columns = variables, index = datetime)
df_raw = pd.read_csv("your_data.csv", index_col=0, parse_dates=True)
# Define problem: slice the year into monthly candidate periods
slicer = rep.TimeSlicer(unit="month")
context = rep.ProblemContext(df_raw, slicer)
# Feature engineering: statistical summaries per month
feature_engineer = rep.StandardStatsFeatureEngineer()
# Objective: score each candidate selection on distribution fidelity
objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),
    'correlation': (1.0, rep.CorrelationFidelity()),
})
# Search: evaluate all 4-of-12 monthly combinations
policy = rep.WeightedSumPolicy()
combi_gen = rep.ExhaustiveCombiGen(k=4)
search = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(objective_set, policy, combi_gen)
# Representation: equal 1/k weights per selected month
representation = rep.UniformRepresentationModel()
# Assemble and run
workflow = rep.Workflow(feature_engineer, search, representation)
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()
print(result.selection) # e.g., (Period('2019-01', 'M'), Period('2019-04', 'M'), ...)
print(result.weights) # e.g., {Period('2019-01', 'M'): 3.0, ...}
print(result.scores) # e.g., {'wasserstein': 0.023, 'correlation': 0.015}
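Two numbers in the example above are worth a back-of-the-envelope check: how many candidate selections an exhaustive 4-of-12 search has to score, and what "uniform" weights look like. The sketch below uses only the standard library. The weekly comparison and the two weight conventions are illustrative assumptions; the example output of `3.0` is consistent with weights that sum to the number of periods (12 / 4), but which convention a given representation model uses should be verified against the API Reference.

```python
from itertools import combinations
from math import comb

n_periods, k = 12, 4

# Number of candidate selections an exhaustive search has to score.
n_candidates = comb(n_periods, k)  # 495 for 4-of-12
assert n_candidates == sum(1 for _ in combinations(range(n_periods), k))

# The same k with weekly slices grows the search space considerably.
n_weekly = comb(52, k)  # 270725 for 4-of-52

# Two common conventions for uniform weights (assumption: check which one applies).
normalized = 1.0 / k            # weights sum to 1
period_count = n_periods / k    # weights sum to the number of periods

print(n_candidates, n_weekly, normalized, period_count)
```

The jump from 495 to 270725 candidates shows why exhaustive generation is practical for monthly slices but quickly becomes expensive for finer slicing.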
License
Apache-2.0