Workflow & Experiment
Workflow
dataclass
A serializable object that defines a complete selection problem.
This dataclass encapsulates all components needed to execute a representative subset selection workflow: feature engineering, search algorithm, and optionally a representation model.
Attributes:
| Name | Type | Description |
|---|---|---|
| feature_engineer | FeatureEngineer | Component that transforms raw time-series into features. |
| search_algorithm | SearchAlgorithm | Algorithm that finds the optimal subset of k periods. |
| representation_model | RepresentationModel \| None | Model that calculates responsibility weights for selected periods. |
| k | int | Number of representative periods to select. |
Examples:
Define a complete workflow:
>>> from energy_repset.workflow import Workflow
>>> from energy_repset.feature_engineering import StandardStatsFeatureEngineer
>>> from energy_repset.search_algorithms import ObjectiveDrivenCombinatorialSearchAlgorithm
>>> from energy_repset.representation import UniformRepresentationModel
>>> from energy_repset.objectives import ObjectiveSet
>>> from energy_repset.score_components import WassersteinFidelity
>>> from energy_repset.selection_policies import ParetoMaxMinStrategy
>>> from energy_repset.combi_gens import ExhaustiveCombiGen
>>>
>>> # Create components
>>> feature_eng = StandardStatsFeatureEngineer()
>>> objective_set = ObjectiveSet({'wass': (1.0, WassersteinFidelity())})
>>> policy = ParetoMaxMinStrategy()
>>> combi_gen = ExhaustiveCombiGen(k=3)
>>> search_algo = ObjectiveDrivenCombinatorialSearchAlgorithm(
...     objective_set, policy, combi_gen
... )
>>> repr_model = UniformRepresentationModel()
>>>
>>> # Create workflow
>>> workflow = Workflow(
...     feature_engineer=feature_eng,
...     search_algorithm=search_algo,
...     representation_model=repr_model,
...     k=3,
... )
Source code in energy_repset/workflow.py
save
save(filepath: str | Path)
Save workflow configuration to file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str \| Path | Path where workflow configuration will be saved. | required |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Workflow serialization is not yet implemented. |
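Because `save` currently raises `NotImplementedError`, calling code may want to guard the call rather than let the exception propagate. The following is a minimal sketch of that guard, not part of the library: `StubWorkflow` is a hypothetical stand-in for `Workflow` (so the block runs without the package installed), and `try_save` is an illustrative helper name.

```python
from pathlib import Path
from typing import Union


class StubWorkflow:
    """Hypothetical stand-in that mimics Workflow.save() as documented."""

    def save(self, filepath: Union[str, Path]) -> None:
        raise NotImplementedError("Workflow serialization is not yet implemented.")


def try_save(workflow, filepath: Union[str, Path]) -> bool:
    """Return True if the workflow was saved, False if saving is unsupported."""
    try:
        workflow.save(Path(filepath))
    except NotImplementedError:
        # Serialization is not available yet; let the caller decide what to do.
        return False
    return True


saved = try_save(StubWorkflow(), "workflow.json")
print(saved)  # prints False, because the stub's save() is not implemented
```

Once serialization is implemented, the same helper would return `True` without any change at the call site.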
Source code in energy_repset/workflow.py
load
classmethod
load(filepath: str | Path) -> 'Workflow'
Load workflow configuration from file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str \| Path | Path to workflow configuration file. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Workflow | 'Workflow' | Reconstructed Workflow instance. |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Workflow deserialization is not yet implemented. |
Source code in energy_repset/workflow.py
RepSetExperiment
Orchestrate a complete and self-contained representative subset experiment.
This class manages the execution of a full workflow from raw data to final selection results. It handles feature engineering, search execution, and weight calculation while maintaining references to intermediate states.
Attributes:
| Name | Type | Description |
|---|---|---|
| raw_context | ProblemContext | Initial ProblemContext containing raw time-series data. |
| workflow | Workflow | Workflow definition containing all algorithm components. |
| result | RepSetResult | Final RepSetResult after run() completes (None before execution). |
Examples:
Run a complete experiment:
>>> import numpy as np
>>> import pandas as pd
>>> from energy_repset.problem import RepSetExperiment
>>> from energy_repset.context import ProblemContext
>>> from energy_repset.workflow import Workflow
>>> from energy_repset.time_slicer import TimeSlicer
>>> # ... (imports for feature engineer, search algo, etc.)
>>>
>>> # Create data and context
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> df = pd.DataFrame({'demand': np.random.rand(8760)}, index=dates)
>>> slicer = TimeSlicer(unit='month')
>>> context = ProblemContext(df_raw=df, slicer=slicer)
>>>
>>> # Create workflow (see Workflow docs for details)
>>> workflow = Workflow(
...     feature_engineer=feature_eng,
...     search_algorithm=search_algo,
...     representation_model=repr_model,
...     k=3,
... )
>>>
>>> # Run experiment
>>> experiment = RepSetExperiment(context, workflow)
>>> result = experiment.run()
>>> print(result.selection) # Selected periods
>>> print(result.weights) # Responsibility weights
Source code in energy_repset/problem.py
feature_context
property
feature_context: ProblemContext
Get the context with computed features.
Returns:
| Type | Description |
|---|---|
| ProblemContext | ProblemContext with df_features populated. |
Raises:
| Type | Description |
|---|---|
| ValueError | If run() or run_feature_engineer() has not been called yet. |
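The guard described above (a property that raises `ValueError` until features have been computed) can be sketched as follows. This is an illustration of the pattern only, under stated assumptions: `StubExperiment`, the private `_feature_context` attribute, and the dictionary used as a context are hypothetical and do not reflect the library's internals.

```python
class StubExperiment:
    """Hypothetical stand-in for RepSetExperiment showing the guard pattern."""

    def __init__(self, raw_context):
        self.raw_context = raw_context
        self._feature_context = None  # populated by run_feature_engineer()

    @property
    def feature_context(self):
        # Refuse access until the feature engineering step has run.
        if self._feature_context is None:
            raise ValueError("Call run() or run_feature_engineer() first.")
        return self._feature_context

    def run_feature_engineer(self):
        # Placeholder for real feature engineering: one toy feature (the length).
        self._feature_context = {
            "df_raw": self.raw_context,
            "df_features": [len(self.raw_context)],
        }
        return self._feature_context


experiment = StubExperiment(raw_context=[1.0, 2.0, 3.0])

try:
    experiment.feature_context
    accessed_early = True
except ValueError:
    accessed_early = False  # expected: features are not computed yet

features = experiment.run_feature_engineer()
```

After `run_feature_engineer()` returns, `experiment.feature_context` yields the same object that the method returned.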
__init__
__init__(context: ProblemContext, workflow: Workflow)
Initialize experiment with raw data context and workflow.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | ProblemContext | ProblemContext containing raw time-series data and metadata. | required |
| workflow | Workflow | Workflow defining feature engineering, search, and representation. | required |
Source code in energy_repset/problem.py
run_feature_engineer
run_feature_engineer() -> ProblemContext
Run only the feature engineering step.
This method allows you to inspect features before running the full workflow.
Returns:
| Type | Description |
|---|---|
| ProblemContext | ProblemContext with df_features populated. |
Source code in energy_repset/problem.py
run
run() -> RepSetResult
Execute the entire workflow from feature engineering to final result.
This method orchestrates the complete selection process:
1. Runs the feature engineer to create a new, feature-rich context
2. Stores this feature_context for user inspection
3. Runs the search algorithm on the feature_context
4. Fits the representation model
5. Calculates the final weights
6. Stores and returns the final result
Returns:
| Name | Type | Description |
|---|---|---|
| RepSetResult | RepSetResult | The selected periods, weights, scores, and diagnostics. |
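The order of the steps that `run()` performs can be sketched with toy components. Everything in this block is an assumption made for illustration: the classes, the method names `run`, `find_selection`, `fit`, and `weigh`, and the dictionary used as a context are hypothetical stand-ins and are not confirmed to match the library's interfaces.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class StubResult:
    """Hypothetical stand-in for RepSetResult."""
    selection: Tuple[str, ...]
    weights: Optional[Dict[str, float]] = None


class MeanFeatureEngineer:
    """Toy feature engineer: one feature (the mean) per period."""

    def run(self, context: dict) -> dict:
        features = {p: sum(v) / len(v) for p, v in context["raw"].items()}
        # Return a new context so the raw context stays untouched.
        return {**context, "features": features}


class TopKSearch:
    """Toy search: pick the k periods with the largest feature value."""

    def __init__(self, k: int):
        self.k = k

    def find_selection(self, context: dict) -> StubResult:
        features = context["features"]
        ranked = sorted(features, key=features.get, reverse=True)
        return StubResult(selection=tuple(ranked[: self.k]))


class UniformWeights:
    """Toy representation model: equal weight for every selected period."""

    def fit(self, context: dict) -> None:
        pass  # nothing to learn for uniform weights

    def weigh(self, selection: Tuple[str, ...]) -> Dict[str, float]:
        return {p: 1.0 / len(selection) for p in selection}


def run_pipeline(raw_context, feature_engineer, search_algorithm,
                 representation_model=None):
    feature_context = feature_engineer.run(raw_context)        # steps 1-2
    result = search_algorithm.find_selection(feature_context)  # step 3
    if representation_model is not None:
        representation_model.fit(feature_context)              # step 4
        result.weights = representation_model.weigh(result.selection)  # step 5
    return feature_context, result                             # step 6


raw = {"raw": {"jan": [1.0, 2.0, 3.0], "feb": [4.0, 5.0, 6.0], "mar": [0.0, 0.0, 0.0]}}
feature_context, result = run_pipeline(
    raw, MeanFeatureEngineer(), TopKSearch(k=2), UniformWeights()
)
```

Keeping the feature context separate from the raw context, as in this sketch, is what allows features to be inspected after the run without mutating the original data.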
Source code in energy_repset/problem.py
RepSetResult
dataclass
The standardized output object returned by RepSetExperiment.run(), holding the selected periods, weights, scores, and diagnostics.
Source code in energy_repset/results.py