Configuration Advisor

This document serves a dual purpose:

  1. Human guide -- a structured decision tree for choosing energy-repset components.
  2. AI system prompt -- a self-contained reference an LLM can use to interactively guide users through configuration.

For theory, see Unified Framework. For API details, see Modules & Components.


Component Catalog

F: Feature Engineering

| Class | Import | Description |
| --- | --- | --- |
| StandardStatsFeatureEngineer | energy_repset.feature_engineering | Statistical summaries per slice (mean, std, IQR, quantiles, ramp rates). Z-score normalized. |
| PCAFeatureEngineer | energy_repset.feature_engineering | PCA dimensionality reduction. Supports variance-threshold or fixed component count. |
| DirectProfileFeatureEngineer | energy_repset.feature_engineering | Flattened raw hourly profiles per slice. Used by Snippet and DTW-based methods. |
| FeaturePipeline | energy_repset.feature_engineering | Chains multiple engineers sequentially. |

Typical pipeline: StandardStatsFeatureEngineer -> PCAFeatureEngineer, chained via FeaturePipeline. Use DirectProfileFeatureEngineer for algorithms that compare raw time-series shapes (Snippet and the DTW-based methods).

O: Score Components

All components implement the ScoreComponent protocol with prepare(context) and score(combination).
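A structural sketch of a conforming component (hypothetical; only prepare and score are documented here, and the direction attribute is an assumption inferred from the Direction column below):

class ConstantScore:
    direction = "min"  # assumed attribute mirroring the Direction column below

    def prepare(self, context):
        # One-time setup against the full problem context, e.g., caching statistics.
        pass

    def score(self, combination):
        # Return a scalar for one candidate combination of slices.
        return 0.0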

| Class | Direction | What it Measures | Import |
| --- | --- | --- | --- |
| WassersteinFidelity | min | Marginal distribution similarity (Wasserstein distance, IQR-normalized) | energy_repset.score_components |
| CorrelationFidelity | min | Cross-variable correlation preservation (Frobenius norm) | energy_repset.score_components |
| DurationCurveFidelity | min | Duration curve match (quantile-based NRMSE) | energy_repset.score_components |
| NRMSEFidelity | min | Duration curve match (full interpolation NRMSE) | energy_repset.score_components |
| DiurnalFidelity | min | Hour-of-day profile preservation (normalized MSE) | energy_repset.score_components |
| DiurnalDTWFidelity | min | Hour-of-day profile preservation (DTW distance) | energy_repset.score_components |
| DTWFidelity | min | Full series shape similarity (Dynamic Time Warping) | energy_repset.score_components |
| DiversityReward | max | Spread in feature space (avg pairwise distance) | energy_repset.score_components |
| CentroidBalance | min | Feature centroid deviation from global mean | energy_repset.score_components |
| CoverageBalance | min | Balanced coverage via RBF kernel soft assignment | energy_repset.score_components |

Components are bundled into an ObjectiveSet (energy_repset.objectives) with per-component weights.
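For example (keys and weights are illustrative; the constructor shape matches the examples under Common Configurations below):

import energy_repset as rep

objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),  # direction min
    'diversity':   (0.5, rep.DiversityReward()),      # direction max, handled automatically
})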

S: Combination Generators

| Class | Import | Description |
| --- | --- | --- |
| ExhaustiveCombiGen | energy_repset.combi_gens | All k-of-n combinations. |
| GroupQuotaCombiGen | energy_repset.combi_gens | Exact quotas per group (e.g., 1 per season). |
| ExhaustiveHierarchicalCombiGen | energy_repset.combi_gens | Selects parent groups, evaluates on child slices. |
| GroupQuotaHierarchicalCombiGen | energy_repset.combi_gens | Hierarchical + group quotas. Has from_slicers_with_seasons() factory. |

R: Representation Models

| Class | Import | Description |
| --- | --- | --- |
| UniformRepresentationModel | energy_repset.representation | Equal 1/k weights. Returns dict. |
| KMedoidsClustersizeRepresentation | energy_repset.representation | Cluster-proportional weights via k-medoids. Returns dict. |
| BlendedRepresentationModel | energy_repset.representation | Soft assignment (convex combination). Returns weight DataFrame. |

A: Search Algorithms

| Class | Workflow | Import |
| --- | --- | --- |
| ObjectiveDrivenCombinatorialSearchAlgorithm | Generate-and-Test | energy_repset.search_algorithms |
| HullClusteringSearch | Constructive | energy_repset.search_algorithms |
| CTPCSearch | Constructive | energy_repset.search_algorithms |
| SnippetSearch | Constructive | energy_repset.search_algorithms |

Not yet implemented: Direct Optimization (MILP).

See Constructive Algorithms for algorithm details and paper references.

Π: Selection Policies

| Class | Import | Description |
| --- | --- | --- |
| WeightedSumPolicy | energy_repset.selection_policies | Scalar aggregation. Supports normalization='robust_minmax'. |
| ParetoMaxMinStrategy | energy_repset.selection_policies | Pareto-optimal solution maximizing the worst objective. |
| ParetoUtopiaPolicy | energy_repset.selection_policies | Pareto-optimal solution closest to the utopia point. |

Decision Tree

Step 1: Understand your data

Ask yourself:

  • Resolution: Hourly? 15-minute? Daily?
  • Variables: How many time-series (load, wind, solar, prices, ...)?
  • Regions: How many regions or zones? Each region-variable pair adds dimensions.
  • Horizon: One year? Multiple years?
  • Candidate count: How many slices does your TimeSlicer produce?
      • 12 months -> C(12,3) = 220 candidates for k=3
      • 52 weeks -> C(52,8) ≈ 752 million candidates for k=8
  • Feature dimensionality: How many features will your feature space produce? With V variables across R regions, StandardStatsFeatureEngineer produces multiple statistics per variable (mean, std, quantiles, ramp rates, etc.), so the total feature count grows as V x R x (number of statistics). If this count is high relative to the number of slices, consider PCA dimensionality reduction via FeaturePipeline.

This determines whether exhaustive search is feasible or you need constrained/hierarchical generation; a quick count check is sketched below.
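A standard-library sanity check reproduces the counts above:

from math import comb  # exact binomial coefficients

print(comb(12, 3))  # 220 -- exhaustive evaluation is trivial
print(comb(52, 8))  # 752,538,150 -- exhaustive evaluation is infeasible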

Step 1b: Normalization and weighting

  • Normalization: StandardStatsFeatureEngineer z-score normalizes features by default. If using DirectProfileFeatureEngineer, your variables may have very different scales (e.g., MW demand vs. capacity factors between 0 and 1), and distance-based methods will be dominated by the high-magnitude variables. Consider whether explicit scaling is needed.
  • Feature/variable weighting: If certain variables matter more for the downstream model (e.g., load is more critical than temperature), consider using variable_weights in score components such as WassersteinFidelity(variable_weights=...) or passing importance weights through the feature engineer. This lets the selection process prioritize fidelity on the variables that matter most.
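Both points, sketched. The pre-scaling step (transforming df_raw before building the ProblemContext) and the dict form of variable_weights are illustrative assumptions; only the variable_weights keyword itself appears above.

import energy_repset as rep

# Hedged sketch: equalize variable magnitudes up front, then prioritize load.
# Assumptions: scaling df_raw before slicing is acceptable, and variable_weights
# accepts a {variable_name: weight} mapping.
df_scaled = (df_raw - df_raw.mean()) / df_raw.std()  # z-score each variable column
context = rep.ProblemContext(df_raw=df_scaled, slicer=rep.TimeSlicer(unit="day"))

fidelity = rep.WassersteinFidelity(variable_weights={'load': 2.0, 'temperature': 0.5})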

Step 2: Downstream model constraints

Your energy system model may impose constraints on the representation:

| Constraint | Implication |
| --- | --- |
| Model requires equal-length periods with scalar weights | Use UniformRepresentationModel or KMedoidsClustersizeRepresentation |
| Model can accept blended inputs (e.g., weighted hourly profiles) | Use BlendedRepresentationModel |
| Must cover all seasons | Use GroupQuotaCombiGen or GroupQuotaHierarchicalCombiGen |
| Must preserve temporal coupling within periods (e.g., multi-day storage) | Prefer weekly/multi-day slicing over monthly |

Step 3: Computational budget

| Candidate space size | Recommended generator |
| --- | --- |
| < 10,000 | ExhaustiveCombiGen -- evaluate all |
| 10,000 -- 1,000,000 | GroupQuotaCombiGen to constrain, or hierarchical generators |
| > 1,000,000 | Hierarchical generators, or future genetic/constructive algorithms |

Hierarchical trick: Select at the month level (small combinatorial space) but evaluate on day-level features (high resolution). Use ExhaustiveHierarchicalCombiGen or GroupQuotaHierarchicalCombiGen.
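Whatever generator you choose, it is worth a guard before running the search. combi_gen.count(slices) is referenced under Common Pitfalls below; the slices variable and the threshold here are illustrative:

# Illustrative guard against combinatorial explosion. Assumption: `slices` is
# the candidate slice collection produced by your TimeSlicer.
n = combi_gen.count(slices)
if n > 1_000_000:
    raise ValueError(f"{n:,} candidates -- add group quotas or go hierarchical")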

Step 4: Quality goals

Choose score components based on what matters for your downstream model:

| Goal | Recommended Components |
| --- | --- |
| Preserve marginal distributions (load duration curves) | WassersteinFidelity, DurationCurveFidelity, NRMSEFidelity |
| Preserve variable correlations (wind-solar complementarity) | CorrelationFidelity |
| Preserve diurnal patterns (solar noon peak, evening ramp) | DiurnalFidelity, DiurnalDTWFidelity |
| Preserve overall time-series shape | DTWFidelity |
| Ensure diverse representatives (avoid redundancy) | DiversityReward |
| Balanced coverage of the feature space | CentroidBalance, CoverageBalance |

Start simple: WassersteinFidelity + CorrelationFidelity covers most needs. Add more components only if you observe specific deficiencies in the results.

Step 5: Selection policy

| Situation | Recommended Policy |
| --- | --- |
| Single objective or clear priority ranking | WeightedSumPolicy (default) |
| Multiple objectives, want balanced trade-off | WeightedSumPolicy(normalization='robust_minmax') |
| Multiple objectives, want to avoid worst-case failure | ParetoMaxMinStrategy |
| Multiple objectives, want closest to ideal | ParetoUtopiaPolicy |
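In code (a minimal sketch; only the normalization keyword is documented above, and the other constructors are shown parameter-free):

import energy_repset as rep

policy = rep.WeightedSumPolicy(normalization='robust_minmax')  # balanced trade-off
# Alternatives: rep.ParetoMaxMinStrategy(), rep.ParetoUtopiaPolicy()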

Common Configurations

All examples below assume import energy_repset as rep.

Minimal: single-objective monthly selection

import energy_repset as rep

context = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="month"))
workflow = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
        rep.ObjectiveSet({'wass': (1.0, rep.WassersteinFidelity())}),
        rep.WeightedSumPolicy(),
        rep.ExhaustiveCombiGen(k=4),
    ),
    representation_model=rep.UniformRepresentationModel(),
)
result = rep.RepSetExperiment(context, workflow).run()

Multi-objective with PCA features

feature_pipeline = rep.FeaturePipeline(engineers={
    'stats': rep.StandardStatsFeatureEngineer(),
    'pca': rep.PCAFeatureEngineer(),
})

objective_set = rep.ObjectiveSet({
    'wasserstein': (1.0, rep.WassersteinFidelity()),
    'correlation': (1.0, rep.CorrelationFidelity()),
    'diversity':   (0.5, rep.DiversityReward()),
})

workflow = rep.Workflow(
    feature_engineer=feature_pipeline,
    search_algorithm=rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
        objective_set, rep.ParetoMaxMinStrategy(), rep.ExhaustiveCombiGen(k=3),
    ),
    representation_model=rep.KMedoidsClustersizeRepresentation(),
)
result = rep.RepSetExperiment(context, workflow).run()

Hierarchical: seasonal quotas with day-level evaluation

child_slicer = rep.TimeSlicer(unit="day")
context = rep.ProblemContext(df_raw=df_raw, slicer=child_slicer)

combi_gen = rep.GroupQuotaHierarchicalCombiGen.from_slicers_with_seasons(
    parent_k=4,
    dt_index=df_raw.index,
    child_slicer=child_slicer,
    group_quota={'winter': 1, 'spring': 1, 'summer': 1, 'fall': 1},
)

workflow = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.ObjectiveDrivenCombinatorialSearchAlgorithm(
        objective_set, rep.WeightedSumPolicy(), combi_gen,
    ),
    representation_model=rep.KMedoidsClustersizeRepresentation(),
)
result = rep.RepSetExperiment(context, workflow).run()

Blended (soft) representation

workflow = rep.Workflow(
    feature_engineer=feature_pipeline,
    search_algorithm=search_algorithm,  # reuse any generate-and-test search from above
    representation_model=rep.BlendedRepresentationModel(blend_type='convex'),
)
result = rep.RepSetExperiment(context, workflow).run()
# result.weights is a DataFrame (not a dict) for blended models
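To visualize blended weights without the aggregation pitfall noted under Common Pitfalls (raw column sums scale with the number of slices), a hedged sketch, assuming slices are the columns of the weight DataFrame:

# Normalize summed blended weights before plotting (axis orientation is an
# assumption; transpose if slices are rows).
totals = result.weights.sum(axis=0)
shares = totals / totals.sum()  # shares now sum to 1.0
shares.plot.bar()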

Constructive: Hull Clustering with blended weights

import energy_repset as rep

context = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="month"))
workflow = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.HullClusteringSearch(k=3, hull_type='convex'),
    representation_model=rep.BlendedRepresentationModel(blend_type='convex'),
)
result = rep.RepSetExperiment(context, workflow).run()

Constructive: CTPC with contiguous segments

import energy_repset as rep

context = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="month"))
workflow = rep.Workflow(
    feature_engineer=rep.StandardStatsFeatureEngineer(),
    search_algorithm=rep.CTPCSearch(k=4, linkage='ward'),
    representation_model=None,
)
result = rep.RepSetExperiment(context, workflow).run()
# result.weights are pre-computed segment fractions

Constructive: Snippet with multi-day periods

import energy_repset as rep

context = rep.ProblemContext(df_raw=df_raw, slicer=rep.TimeSlicer(unit="day"))
workflow = rep.Workflow(
    feature_engineer=rep.DirectProfileFeatureEngineer(),
    search_algorithm=rep.SnippetSearch(k=4, period_length_days=7, step_days=7),
    representation_model=None,
)
result = rep.RepSetExperiment(context, workflow).run()
# result.weights are pre-computed assignment fractions

Common Pitfalls

  1. Blended weight aggregation: BlendedRepresentationModel.weigh() returns a weight matrix. If you sum its columns for visualization, normalize the result so the weights sum to 1.0; otherwise the bars show raw sums that scale with the number of slices. See the normalization sketch under Blended (soft) representation above.

  2. Combinatorial explosion: C(52, 8) = 752 million. Always check combi_gen.count(slices) before running. Use hierarchical generators or group quotas to reduce the search space.

  3. PCA without stats: PCAFeatureEngineer operates on existing features. It must come after StandardStatsFeatureEngineer in a FeaturePipeline, not as a standalone.

  4. DTW components are slow: DTWFidelity and DiurnalDTWFidelity use dynamic time warping which is O(n^2) per pair. Suitable for small candidate sets; consider cheaper alternatives for large searches.

  5. Direction confusion: Most fidelity components use direction="min" (lower is better). DiversityReward uses direction="max". The ObjectiveSet and selection policies handle direction automatically -- you do not need to negate scores.

  6. Single vs multi-objective: With a single score component, WeightedSumPolicy and ParetoMaxMinStrategy produce identical results. Pareto-based policies only add value with 2+ objectives.

  7. High-dimensional feature spaces: With many variables and regions, StandardStatsFeatureEngineer can produce hundreds of features while you may have only 12--52 candidate slices. In high dimensions, distances concentrate and clustering/selection degrades. Check the feature-to-sample ratio and use PCA (via FeaturePipeline) to reduce dimensionality when it is large.

  8. Unweighted variables: By default, all variables contribute equally to the objective. If your downstream model is more sensitive to some variables (e.g., load matters more than temperature), the selection may over-optimize for less important variables. Use variable_weights in score components to reflect what actually matters.