Skip to content

Modules & Components

energy-repset decomposes any representative period selection method into five interchangeable pillars. Each pillar has a protocol (interface) and one or more concrete implementations. Swapping a single component changes the behavior without affecting the rest of the pipeline.

The Five Pillars

Raw DataFrame
  -> TimeSlicer (defines candidate periods)
  -> ProblemContext (holds data + metadata)
  -> [F] FeatureEngineer (creates feature vectors per slice)
  -> [A] SearchAlgorithm (finds optimal selection using [O] ObjectiveSet)
  -> [R] RepresentationModel (calculates weights)
  -> RepSetResult (selection, weights, scores)

F: Feature Space

Transforms raw time-series slices into comparable feature vectors.

Implementation Description
StandardStatsFeatureEngineer Statistical summaries per slice (mean, std, IQR, quantiles, ramp rates). Z-score normalized.
PCAFeatureEngineer PCA dimensionality reduction on existing features. Supports variance-threshold or fixed component count.
DirectProfileFeatureEngineer Flattened raw hourly profiles per slice. Preserves full temporal shape. Used by Snippet and DTW-based methods.
FeaturePipeline Chains multiple engineers sequentially and concatenates their outputs.
import energy_repset as rep

# Single engineer
feature_engineer = rep.StandardStatsFeatureEngineer()

# Chained pipeline: compute stats, then reduce with PCA
feature_pipeline = rep.FeaturePipeline(engineers={
    'stats': rep.StandardStatsFeatureEngineer(),
    'pca': rep.PCAFeatureEngineer(),
})

# Direct profile vectors (for Snippet, DTW-based methods)
direct = rep.DirectProfileFeatureEngineer()

O: Objective

An ObjectiveSet holds one or more weighted ScoreComponent instances. Each component evaluates how well a candidate selection represents the full dataset along a specific dimension.

Component Name Direction What it Measures
WassersteinFidelity wasserstein min Marginal distribution similarity (Wasserstein distance, IQR-normalized)
CorrelationFidelity correlation min Cross-variable correlation preservation (Frobenius norm)
DurationCurveFidelity nrmse_duration_curve min Duration curve match (quantile-based NRMSE)
NRMSEFidelity nrmse min Duration curve match (full interpolation NRMSE)
DiurnalFidelity diurnal min Hour-of-day profile preservation (normalized MSE)
DiurnalDTWFidelity diurnal_dtw min Hour-of-day profile preservation (DTW distance)
DTWFidelity dtw min Full series shape similarity (Dynamic Time Warping)
DiversityReward diversity max Spread of representatives in feature space (avg pairwise distance)
CentroidBalance centroid_balance min Feature centroid deviation from global mean
CoverageBalance coverage_balance min Balanced coverage via RBF kernel soft assignment
objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),
    'correlation': (1.0, rep.CorrelationFidelity()),
    'diversity':   (0.5, rep.DiversityReward()),
})

The weight (first element of each tuple) expresses relative importance. Components with direction="min" are better when smaller; direction="max" are better when larger.


S: Selection Space

A CombinationGenerator defines which subsets the search algorithm considers.

Implementation Description
ExhaustiveCombiGen All k-of-n combinations. Feasible for small n (e.g., 12 months, k=4 gives 495 candidates).
GroupQuotaCombiGen Enforces exact quotas per group (e.g., 1 month per season).
ExhaustiveHierarchicalCombiGen Selects parent groups (e.g., months) but evaluates on child slices (e.g., days).
GroupQuotaHierarchicalCombiGen Combines hierarchical selection with group quotas.
# Simple: all 4-of-12 monthly combinations
combi_gen = rep.ExhaustiveCombiGen(k=4)

# Hierarchical with seasonal constraints
combi_gen = rep.GroupQuotaHierarchicalCombiGen.from_slicers_with_seasons(
    parent_k=4,
    dt_index=df_raw.index,
    child_slicer=rep.TimeSlicer(unit="day"),
    group_quota={'winter': 1, 'spring': 1, 'summer': 1, 'fall': 1},
)

R: Representation Model

Determines how selected periods represent the full dataset through responsibility weights.

Implementation Description
UniformRepresentationModel Equal 1/k weights. Simplest option.
KMedoidsClustersizeRepresentation Weights proportional to cluster sizes from k-medoids hard assignment.
BlendedRepresentationModel Soft assignment: each original slice is a convex combination of representatives. Returns a weight matrix instead of a weight dict.
# Equal weights
uniform = rep.UniformRepresentationModel()

# Cluster-proportional weights
kmedoids = rep.KMedoidsClustersizeRepresentation()

# Soft blending (returns a DataFrame, not a dict)
blended = rep.BlendedRepresentationModel(blend_type='convex')

A: Search Algorithm

The engine that finds the optimal selection.

Implementation Workflow Type Description
ObjectiveDrivenCombinatorialSearchAlgorithm Generate-and-Test Evaluates all candidate combinations and selects the winner via a SelectionPolicy.
HullClusteringSearch Constructive Greedy forward selection minimizing total projection error. Leaves weights=None for external representation model.
CTPCSearch Constructive Contiguity-constrained hierarchical clustering. Pre-computes weights as segment size fractions.
SnippetSearch Constructive Greedy p-median selection of multi-day subsequences. Requires daily slicing. Pre-computes weights.

For details on the constructive algorithms, see Constructive Algorithms.

Selection Policies

The policy decides how to pick a winner from the scored candidates:

Policy Description
WeightedSumPolicy Scalar aggregation of scores. Supports normalization='robust_minmax' for multi-objective balance.
ParetoMaxMinStrategy Selects the Pareto-optimal solution that maximizes its worst-performing objective.
ParetoUtopiaPolicy Selects the Pareto-optimal solution closest to the utopia point.
# Weighted sum (default)
policy = rep.WeightedSumPolicy(normalization='robust_minmax')
search = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(objective_set, policy, combi_gen)

# Pareto max-min
policy = rep.ParetoMaxMinStrategy()
search = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(objective_set, policy, combi_gen)

# Constructive algorithms (no ObjectiveSet or policy needed)
hull = rep.HullClusteringSearch(k=4, hull_type='convex')
ctpc = rep.CTPCSearch(k=4, linkage='ward')
snippet = rep.SnippetSearch(k=4, period_length_days=7, step_days=7)

Diagnostics

Interactive Plotly visualizations for inspecting results, feature spaces, and score component behavior. See the Examples for rendered examples.

Feature Space

Class Purpose
FeatureSpaceScatter2D 2D scatter plot of feature space
FeatureSpaceScatter3D 3D scatter plot of feature space
FeatureSpaceScatterMatrix Pairwise scatter matrix
PCAVarianceExplained Cumulative variance explained by PCA components
FeatureCorrelationHeatmap Correlation heatmap between features
FeatureDistributions Distribution histograms per feature

Results

Class Purpose
ResponsibilityBars Weight distribution across selected representatives
ParetoScatter2D 2D objective-space scatter with Pareto front
ParetoScatterMatrix Pairwise objective-space scatter matrix
ParetoParallelCoordinates Parallel coordinates of Pareto front
ScoreContributionBars Per-component score breakdown

Score Components

Class Purpose
DistributionOverlayECDF ECDF comparison of full vs selected data
DistributionOverlayHistogram Histogram comparison of full vs selected data
CorrelationDifferenceHeatmap Correlation matrix difference heatmap
DiurnalProfileOverlay Diurnal profile comparison

Putting It Together

workflow = rep.Workflow(feature_engineer, search_algorithm, representation_model)
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()

# result.selection -> tuple of selected slice identifiers
# result.weights   -> dict mapping each selected slice to its weight
# result.scores    -> dict mapping each objective name to its score

For a complete walkthrough, see the Getting Started guide. For the theoretical foundations, see the Unified Framework.