Modules & Components

energy-repset decomposes any representative-period selection method into five interchangeable pillars: feature space (F), objective (O), selection space (S), representation model (R), and search algorithm (A). Each pillar has a protocol (interface) and one or more concrete implementations. Swapping a single component changes that pillar's behavior without affecting the rest of the pipeline.

The Five Pillars

Raw DataFrame
  -> TimeSlicer (defines candidate periods)
  -> ProblemContext (holds data + metadata)
  -> [F] FeatureEngineer (creates feature vectors per slice)
  -> [A] SearchAlgorithm (finds optimal selection using [O] ObjectiveSet)
  -> [R] RepresentationModel (calculates weights)
  -> RepSetResult (selection, weights, scores)
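To make the first pipeline step concrete, the sketch below groups an hourly timestamp index into monthly candidate periods. It uses only the Python standard library and is not `TimeSlicer`'s implementation; the helper name `slice_by_month` is hypothetical and chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slice_by_month(timestamps):
    """Group timestamps into candidate periods keyed by (year, month).

    Hypothetical helper for illustration; not part of energy-repset.
    """
    slices = defaultdict(list)
    for ts in timestamps:
        slices[(ts.year, ts.month)].append(ts)
    return dict(slices)


# One non-leap year of hourly timestamps (8760 hours).
start = datetime(2023, 1, 1)
hours = [start + timedelta(hours=h) for h in range(8760)]

monthly = slice_by_month(hours)
print(len(monthly))             # 12 candidate periods
print(len(monthly[(2023, 2)]))  # 672 hours in February 2023
```

Each key is one candidate period; the later pillars operate on these slices rather than on individual timestamps.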

F: Feature Space

Transforms raw time-series slices into comparable feature vectors.

Implementation	Description
`StandardStatsFeatureEngineer`	Statistical summaries per slice (mean, std, IQR, quantiles, ramp rates). Z-score normalized.
`PCAFeatureEngineer`	PCA dimensionality reduction on existing features. Supports variance-threshold or fixed component count.
`DirectProfileFeatureEngineer`	Flattened raw hourly profiles per slice. Preserves full temporal shape. Used by Snippet and DTW-based methods.
`FeaturePipeline`	Chains multiple engineers sequentially and concatenates their outputs.

import energy_repset as rep

# Single engineer
feature_engineer = rep.StandardStatsFeatureEngineer()

# Chained pipeline: compute stats, then reduce with PCA
feature_pipeline = rep.FeaturePipeline(engineers={
    'stats': rep.StandardStatsFeatureEngineer(),
    'pca': rep.PCAFeatureEngineer(),
})

# Direct profile vectors (for Snippet, DTW-based methods)
direct = rep.DirectProfileFeatureEngineer()
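The idea behind statistical features can be sketched without the library: summarize each slice with a few statistics, then z-score every feature column across slices so that no single statistic dominates distances. This is a minimal standard-library illustration under assumed choices (mean, population standard deviation, IQR); it is not the code of `StandardStatsFeatureEngineer`, and the helper names are hypothetical.

```python
import statistics


def slice_stats(values):
    """Summarize one slice as [mean, std, IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return [statistics.fmean(values), statistics.pstdev(values), q3 - q1]


def zscore_columns(rows):
    """Z-score each feature column across slices (constant columns map to 0)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy slices: a few load values per month (made-up numbers).
slices = {
    "jan": [5.0, 6.0, 7.0, 8.0, 9.0],
    "apr": [3.0, 4.0, 5.0, 4.0, 3.0],
    "jul": [1.0, 2.0, 1.0, 2.0, 1.0],
}
raw = [slice_stats(v) for v in slices.values()]
features = dict(zip(slices, zscore_columns(raw)))

for name, vector in features.items():
    print(name, [round(x, 2) for x in vector])
```

After normalization every feature column has zero mean across slices, which makes the slices directly comparable in feature space.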

O: Objective

An ObjectiveSet holds one or more weighted ScoreComponent instances. Each component evaluates how well a candidate selection represents the full dataset along a specific dimension.

Component	Name	Direction	What it Measures
`WassersteinFidelity`	`wasserstein`	min	Marginal distribution similarity (Wasserstein distance, IQR-normalized)
`CorrelationFidelity`	`correlation`	min	Cross-variable correlation preservation (Frobenius norm)
`DurationCurveFidelity`	`nrmse_duration_curve`	min	Duration curve match (quantile-based NRMSE)
`NRMSEFidelity`	`nrmse`	min	Duration curve match (full interpolation NRMSE)
`DiurnalFidelity`	`diurnal`	min	Hour-of-day profile preservation (normalized MSE)
`DiurnalDTWFidelity`	`diurnal_dtw`	min	Hour-of-day profile preservation (DTW distance)
`DTWFidelity`	`dtw`	min	Full series shape similarity (Dynamic Time Warping)
`DiversityReward`	`diversity`	max	Spread of representatives in feature space (avg pairwise distance)
`CentroidBalance`	`centroid_balance`	min	Feature centroid deviation from global mean
`CoverageBalance`	`coverage_balance`	min	Balanced coverage via RBF kernel soft assignment

objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),
    'correlation': (1.0, rep.CorrelationFidelity()),
    'diversity':   (0.5, rep.DiversityReward()),
})

The weight (the first element of each tuple) expresses relative importance. Components with direction="min" score better when smaller; components with direction="max" score better when larger.
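One way to picture how weights and directions interact is to fold all component scores into a single cost that is minimized. The sketch below assumes that "max" components are simply negated before weighting; that convention and the function name `weighted_cost` are assumptions for illustration, not a description of how energy-repset aggregates scores internally.

```python
def weighted_cost(scores, weights, directions):
    """Fold per-component scores into one cost where lower is better.

    Assumption: 'max' components are negated so every term can be minimized.
    """
    total = 0.0
    for name, score in scores.items():
        sign = 1.0 if directions[name] == "min" else -1.0
        total += weights[name] * sign * score
    return total


# Weights and directions mirror the ObjectiveSet example and the table above.
weights = {"wasserstein": 1.0, "correlation": 1.0, "diversity": 0.5}
directions = {"wasserstein": "min", "correlation": "min", "diversity": "max"}

# Two made-up candidates that differ only in diversity.
candidate_a = {"wasserstein": 0.20, "correlation": 0.10, "diversity": 0.40}
candidate_b = {"wasserstein": 0.20, "correlation": 0.10, "diversity": 0.80}

cost_a = weighted_cost(candidate_a, weights, directions)
cost_b = weighted_cost(candidate_b, weights, directions)
print(cost_b < cost_a)  # True: higher diversity lowers the cost
```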

S: Selection Space

A CombinationGenerator defines which subsets the search algorithm considers.

Implementation	Description
`ExhaustiveCombiGen`	All k-of-n combinations. Feasible for small n (e.g., 12 months, k=4 gives 495 candidates).
`GroupQuotaCombiGen`	Enforces exact quotas per group (e.g., 1 month per season).
`ExhaustiveHierarchicalCombiGen`	Selects parent groups (e.g., months) but evaluates on child slices (e.g., days).
`GroupQuotaHierarchicalCombiGen`	Combines hierarchical selection with group quotas.

# Simple: all 4-of-12 monthly combinations
combi_gen = rep.ExhaustiveCombiGen(k=4)

# Hierarchical with seasonal constraints
combi_gen = rep.GroupQuotaHierarchicalCombiGen.from_slicers_with_seasons(
    parent_k=4,
    dt_index=df_raw.index,
    child_slicer=rep.TimeSlicer(unit="day"),
    group_quota={'winter': 1, 'spring': 1, 'summer': 1, 'fall': 1},
)
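The size of the selection space can be checked with the standard library. The sketch below enumerates all 4-of-12 monthly subsets and, separately, the subsets with exactly one month per season. The month-to-season mapping is an assumed meteorological grouping used for illustration; it is not taken from the library.

```python
from itertools import combinations, product
from math import comb

months = list(range(1, 13))

# Exhaustive: every 4-of-12 subset.
exhaustive = list(combinations(months, 4))

# Quota: exactly one month from each season.
# Assumption: meteorological seasons; the library may group months differently.
seasons = {
    "winter": [12, 1, 2],
    "spring": [3, 4, 5],
    "summer": [6, 7, 8],
    "fall": [9, 10, 11],
}
quota = [tuple(sorted(pick)) for pick in product(*seasons.values())]

print(len(exhaustive), comb(12, 4))  # 495 495
print(len(quota))                    # 81 (3 * 3 * 3 * 3)
```

The quota constraint shrinks the search from 495 to 81 candidates, and every quota candidate is also a member of the exhaustive set.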

R: Representation Model

Determines how selected periods represent the full dataset through responsibility weights.

Implementation	Description
`UniformRepresentationModel`	Equal 1/k weights. Simplest option.
`KMedoidsClustersizeRepresentation`	Weights proportional to cluster sizes from k-medoids hard assignment.
`BlendedRepresentationModel`	Soft assignment: each original slice is a convex combination of representatives. Returns a weight matrix instead of a weight dict.

# Equal weights
uniform = rep.UniformRepresentationModel()

# Cluster-proportional weights
kmedoids = rep.KMedoidsClustersizeRepresentation()

# Soft blending (returns a DataFrame, not a dict)
blended = rep.BlendedRepresentationModel(blend_type='convex')
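The difference between uniform and cluster-proportional weights can be sketched in plain Python. The second function below performs only the hard-assignment step (each slice goes to its nearest representative in feature space) and reports the share of slices per representative. Both function names and the toy feature vectors are hypothetical; this is not the library's implementation.

```python
from collections import Counter


def uniform_weights(selection):
    """Equal 1/k weight for each selected slice."""
    return {s: 1.0 / len(selection) for s in selection}


def cluster_size_weights(features, selection):
    """Assign every slice to its nearest representative; weight = share assigned."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    counts = Counter(
        min(selection, key=lambda rep_id: sq_dist(vec, features[rep_id]))
        for vec in features.values()
    )
    n = len(features)
    return {s: counts[s] / n for s in selection}


# Toy 2-D feature vectors (made-up values).
features = {
    "jan": (0.0, 0.0), "feb": (0.1, 0.0), "mar": (0.2, 0.1),
    "jul": (1.0, 1.0), "aug": (0.9, 1.0),
}
selection = ("jan", "jul")

print(uniform_weights(selection))                 # {'jan': 0.5, 'jul': 0.5}
print(cluster_size_weights(features, selection))  # {'jan': 0.6, 'jul': 0.4}
```

Here "jan" represents three of the five slices, so it receives a larger weight than under the uniform model.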

A: Search Algorithm

The search algorithm is the engine that finds the optimal selection.

Implementation	Workflow Type	Description
`ObjectiveDrivenCombinatorialSearchAlgorithm`	Generate-and-Test	Evaluates all candidate combinations and selects the winner via a `SelectionPolicy`.
`HullClusteringSearch`	Constructive	Greedy forward selection minimizing total projection error. Leaves `weights=None` so that an external representation model computes them.
`CTPCSearch`	Constructive	Contiguity-constrained hierarchical clustering. Pre-computes weights as segment size fractions.
`SnippetSearch`	Constructive	Greedy p-median selection of multi-day subsequences. Requires daily slicing. Pre-computes weights.

For details on the constructive algorithms, see Constructive Algorithms.
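The Generate-and-Test workflow from the table above can be sketched as a brute-force loop: enumerate every candidate subset, score it, and keep the best. The toy objective here (absolute error between the selected mean and the full mean) and the names `generate_and_test` and `mean_error` are assumptions for illustration; the library's search class combines an `ObjectiveSet` with a `SelectionPolicy` instead.

```python
from itertools import combinations
from statistics import fmean


def generate_and_test(slices, k, score):
    """Score every k-subset of slices and return the lowest-scoring one."""
    best = min(combinations(sorted(slices), k), key=lambda sel: score(slices, sel))
    return best, score(slices, best)


def mean_error(slices, selection):
    """Toy objective: how far the selected mean is from the full-data mean."""
    full_mean = fmean(v for values in slices.values() for v in values)
    selected_mean = fmean(v for name in selection for v in slices[name])
    return abs(full_mean - selected_mean)


# Five toy slices with three values each (made-up numbers).
slices = {
    "a": [8.0, 9.0, 10.0],
    "b": [4.0, 5.0, 6.0],
    "c": [1.0, 2.0, 3.0],
    "d": [6.0, 7.0, 8.0],
    "e": [2.0, 3.0, 4.0],
}
best, error = generate_and_test(slices, 2, mean_error)
print(best, round(error, 3))  # ('d', 'e') 0.2
```

Constructive algorithms avoid this full enumeration by building the selection step by step, which matters once the number of combinations grows beyond what can be scored exhaustively.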

Selection Policies

The policy decides how to pick a winner from the scored candidates:

Policy	Description
`WeightedSumPolicy`	Scalar aggregation of scores. Supports `normalization='robust_minmax'` for multi-objective balance.
`ParetoMaxMinStrategy`	Selects the Pareto-optimal solution that maximizes its worst-performing objective.
`ParetoUtopiaPolicy`	Selects the Pareto-optimal solution closest to the utopia point.

# Weighted sum (default)
policy = rep.WeightedSumPolicy(normalization='robust_minmax')
search = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(objective_set, policy, combi_gen)

# Pareto max-min
policy = rep.ParetoMaxMinStrategy()
search = rep.ObjectiveDrivenCombinatorialSearchAlgorithm(objective_set, policy, combi_gen)

# Constructive algorithms (no ObjectiveSet or policy needed)
hull = rep.HullClusteringSearch(k=4, hull_type='convex')
ctpc = rep.CTPCSearch(k=4, linkage='ward')
snippet = rep.SnippetSearch(k=4, period_length_days=7, step_days=7)
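For the Pareto-based selection policies described above, the following sketch shows the underlying ideas on a toy set of scored candidates. It assumes all objectives are minimized and already on comparable scales, so "maximizing the worst-performing objective" becomes "minimizing the largest cost". The function names are hypothetical and the code does not reproduce the library's policy classes.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates (all objectives minimized)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


def max_min_choice(front):
    """Pick the front member whose worst (largest) objective value is smallest."""
    return min(front, key=lambda name: max(front[name]))


def utopia_choice(front):
    """Pick the front member closest to the utopia point (per-objective minima)."""
    utopia = [min(col) for col in zip(*front.values())]
    return min(
        front,
        key=lambda name: sum((s - u) ** 2 for s, u in zip(front[name], utopia)),
    )


# Toy candidates with two objective scores each (made-up values).
candidates = {
    "A": (0.1, 0.9),
    "B": (0.4, 0.5),
    "C": (0.9, 0.1),
    "D": (0.6, 0.6),   # dominated by B
    "E": (0.2, 0.55),
}
front = pareto_front(candidates)
print(sorted(front))          # ['A', 'B', 'C', 'E']
print(max_min_choice(front))  # B
print(utopia_choice(front))   # E
```

The two policies can disagree, as they do here: B has the best worst-case score, while E lies closest to the utopia point.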

Diagnostics

Interactive Plotly visualizations for inspecting results, feature spaces, and score component behavior. See the Examples page for rendered output.

Feature Space

Class	Purpose
`FeatureSpaceScatter2D`	2D scatter plot of feature space
`FeatureSpaceScatter3D`	3D scatter plot of feature space
`FeatureSpaceScatterMatrix`	Pairwise scatter matrix
`PCAVarianceExplained`	Cumulative variance explained by PCA components
`FeatureCorrelationHeatmap`	Correlation heatmap between features
`FeatureDistributions`	Distribution histograms per feature

Results

Class	Purpose
`ResponsibilityBars`	Weight distribution across selected representatives
`ParetoScatter2D`	2D objective-space scatter with Pareto front
`ParetoScatterMatrix`	Pairwise objective-space scatter matrix
`ParetoParallelCoordinates`	Parallel coordinates of Pareto front
`ScoreContributionBars`	Per-component score breakdown

Score Components

Class	Purpose
`DistributionOverlayECDF`	ECDF comparison of full vs selected data
`DistributionOverlayHistogram`	Histogram comparison of full vs selected data
`CorrelationDifferenceHeatmap`	Correlation matrix difference heatmap
`DiurnalProfileOverlay`	Diurnal profile comparison
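The quantity behind an ECDF comparison can be computed by hand. The sketch below builds empirical CDFs for the full data and for a selection, then reports the largest vertical gap between them. It is a standard-library illustration with hypothetical helper names and toy data, and it says nothing about how `DistributionOverlayECDF` is implemented or what it plots beyond the two curves.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F(x) = share of sample values less than or equal to x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_ecdf_gap(full, selected):
    """Largest vertical gap between two ECDFs, checked at every observed value."""
    f_full, f_sel = ecdf(full), ecdf(selected)
    return max(abs(f_full(x) - f_sel(x)) for x in set(full) | set(selected))


# Toy data: the selection keeps every second value of the full series.
full = [1, 2, 3, 4, 5, 6, 7, 8]
selected = [2, 4, 6, 8]

gap = max_ecdf_gap(full, selected)
print(gap)  # 0.125
```

A small gap means the selection reproduces the marginal distribution of the full data closely; in an overlay plot the two step curves would nearly coincide.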

Putting It Together

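# feature_engineer, search_algorithm, and representation_model are instances
# of the pillar classes shown above; context is the ProblemContext holding
# the data and metadata.
workflow = rep.Workflow(feature_engineer, search_algorithm, representation_model)
experiment = rep.RepSetExperiment(context, workflow)
result = experiment.run()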

# result.selection -> tuple of selected slice identifiers
# result.weights   -> dict mapping each selected slice to its weight
# result.scores    -> dict mapping each objective name to its score

For a complete walkthrough, see the Getting Started guide. For the theoretical foundations, see the Unified Framework.