
Workflow & Experiment

Workflow dataclass

A dataclass that defines a complete selection problem. It is designed to be serializable, although save() and load() currently raise NotImplementedError.

This dataclass encapsulates all components needed to execute a representative subset selection workflow: feature engineering, search algorithm, and optionally a representation model.

Attributes:

Name                   Type                         Description
feature_engineer       FeatureEngineer              Component that transforms raw time-series into features.
search_algorithm       SearchAlgorithm              Algorithm that finds the optimal subset of k periods.
representation_model   RepresentationModel | None   Model that calculates responsibility weights for selected periods. None (the default) when the search algorithm pre-computes its own weights (e.g. constructive algorithms like KMedoidsSearch).

Workflow has no k field: the number of representative periods k is configured on the search side instead, e.g. via ExhaustiveCombiGen(k=3) in the example below.
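The field layout can be checked with a plain-Python stand-in. This is a hypothetical sketch: `StubWorkflow` and its `object`-typed fields are placeholders, not energy_repset classes. It only mirrors the three fields declared in the `Workflow` source (`feature_engineer`, `search_algorithm`, `representation_model`), showing that `representation_model` defaults to `None` and that an unknown keyword such as `k` is rejected by the generated `__init__`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubWorkflow:
    """Hypothetical stand-in mirroring Workflow's three declared fields."""
    feature_engineer: object
    search_algorithm: object
    representation_model: Optional[object] = None


# A constructive search that supplies its own weights needs no model,
# so representation_model can be omitted and falls back to None.
wf = StubWorkflow(feature_engineer="fe", search_algorithm="kmedoids")
print(wf.representation_model)  # None

# k is not a field, so the dataclass-generated __init__ refuses it.
rejected_k = False
try:
    StubWorkflow(feature_engineer="fe", search_algorithm="search", k=3)
except TypeError as exc:
    rejected_k = True
    print("TypeError:", exc)
```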

Examples:

Define a complete workflow:

>>> from energy_repset.workflow import Workflow
>>> from energy_repset.feature_engineering import StandardStatsFeatureEngineer
>>> from energy_repset.search_algorithms import ObjectiveDrivenCombinatorialSearchAlgorithm
>>> from energy_repset.representation import UniformRepresentationModel
>>> from energy_repset.objectives import ObjectiveSet
>>> from energy_repset.score_components import WassersteinFidelity
>>> from energy_repset.selection_policies import ParetoMaxMinStrategy
>>> from energy_repset.combi_gens import ExhaustiveCombiGen
>>>
>>> # Create components
>>> feature_eng = StandardStatsFeatureEngineer()
>>> objective_set = ObjectiveSet({'wass': (1.0, WassersteinFidelity())})
>>> policy = ParetoMaxMinStrategy()
>>> combi_gen = ExhaustiveCombiGen(k=3)
>>> search_algo = ObjectiveDrivenCombinatorialSearchAlgorithm(
...     objective_set, policy, combi_gen
... )
>>> repr_model = UniformRepresentationModel()
>>>
>>> # Create workflow
>>> workflow = Workflow(
...     feature_engineer=feature_eng,
...     search_algorithm=search_algo,
...     representation_model=repr_model,
... )
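For a sense of scale: assuming `ExhaustiveCombiGen(k=3)` enumerates every 3-element subset of the candidate periods (an assumption based on its name), twelve monthly periods yield C(12, 3) = 220 candidate selections to score. The sketch below uses only the standard library; the month labels are illustrative.

```python
from itertools import combinations
from math import comb

# Illustrative candidate periods: the 12 months of a year.
periods = [f"2024-{m:02d}" for m in range(1, 13)]
k = 3

# Every k-subset of the periods, in lexicographic order of position.
candidates = list(combinations(periods, k))

print(len(candidates))        # 220
print(comb(len(periods), k))  # 220, the closed-form count
print(candidates[0])          # ('2024-01', '2024-02', '2024-03')
```

The count grows quickly: with 52 weekly periods and k=4 it is already C(52, 4) = 270,725 combinations, which is worth keeping in mind before choosing exhaustive enumeration.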
Source code in energy_repset/workflow.py
@dataclass
class Workflow:
    """A serializable object that defines a complete selection problem.

    This dataclass encapsulates all components needed to execute a representative
    subset selection workflow: feature engineering, search algorithm, and
    optionally a representation model.

    Attributes:
        feature_engineer: Component that transforms raw time-series into features.
        search_algorithm: Algorithm that finds the optimal subset of k periods.
        representation_model: Model that calculates responsibility weights for
            selected periods. ``None`` when the search algorithm pre-computes
            its own weights (e.g. constructive algorithms like
            ``KMedoidsSearch``).

    Examples:
        Define a complete workflow:

        >>> from energy_repset.workflow import Workflow
        >>> from energy_repset.feature_engineering import StandardStatsFeatureEngineer
        >>> from energy_repset.search_algorithms import ObjectiveDrivenCombinatorialSearchAlgorithm
        >>> from energy_repset.representation import UniformRepresentationModel
        >>> from energy_repset.objectives import ObjectiveSet
        >>> from energy_repset.score_components import WassersteinFidelity
        >>> from energy_repset.selection_policies import ParetoMaxMinStrategy
        >>> from energy_repset.combi_gens import ExhaustiveCombiGen
        >>>
        >>> # Create components
        >>> feature_eng = StandardStatsFeatureEngineer()
        >>> objective_set = ObjectiveSet({'wass': (1.0, WassersteinFidelity())})
        >>> policy = ParetoMaxMinStrategy()
        >>> combi_gen = ExhaustiveCombiGen(k=3)
        >>> search_algo = ObjectiveDrivenCombinatorialSearchAlgorithm(
        ...     objective_set, policy, combi_gen
        ... )
        >>> repr_model = UniformRepresentationModel()
        >>>
        >>> # Create workflow
        >>> workflow = Workflow(
        ...     feature_engineer=feature_eng,
        ...     search_algorithm=search_algo,
        ...     representation_model=repr_model,
        ... )
    """
    feature_engineer: FeatureEngineer
    search_algorithm: SearchAlgorithm
    representation_model: Optional[RepresentationModel] = None

    def save(self, filepath: str | Path):
        """Save workflow configuration to file.

        Args:
            filepath: Path where workflow configuration will be saved.

        Raises:
            NotImplementedError: Workflow serialization is not yet implemented.
        """
        raise NotImplementedError("Workflow serialization not yet implemented.")

    @classmethod
    def load(cls, filepath: str | Path) -> "Workflow":
        """Load workflow configuration from file.

        Args:
            filepath: Path to workflow configuration file.

        Returns:
            Workflow: Reconstructed Workflow instance.

        Raises:
            NotImplementedError: Workflow deserialization is not yet implemented.
        """
        raise NotImplementedError("Workflow deserialization not yet implemented.")

save

save(filepath: str | Path)

Save workflow configuration to file.

Parameters:

Name       Type         Description                                        Default
filepath   str | Path   Path where workflow configuration will be saved.   required

Raises:

Type                  Description
NotImplementedError   Workflow serialization is not yet implemented.


load classmethod

load(filepath: str | Path) -> 'Workflow'

Load workflow configuration from file.

Parameters:

Name       Type         Description                            Default
filepath   str | Path   Path to workflow configuration file.   required

Returns:

Name       Type       Description
Workflow   Workflow   Reconstructed Workflow instance.

Raises:

Type                  Description
NotImplementedError   Workflow deserialization is not yet implemented.


RepSetExperiment

Orchestrate a complete and self-contained representative subset experiment.

This class manages the execution of a full workflow from raw data to final selection results. It handles feature engineering, search execution, and weight calculation while maintaining references to intermediate states.

Attributes:

Name          Type                  Description
raw_context   ProblemContext        Initial ProblemContext containing raw time-series data.
workflow      Workflow              Workflow definition containing all algorithm components.
result        RepSetResult | None   Final RepSetResult after run() completes (None before execution).

Examples:

Run a complete experiment:

>>> import numpy as np
>>> import pandas as pd
>>> from energy_repset.problem import RepSetExperiment
>>> from energy_repset.context import ProblemContext
>>> from energy_repset.workflow import Workflow
>>> from energy_repset.time_slicer import TimeSlicer
>>> # ... (imports for feature engineer, search algo, etc.)
>>>
>>> # Create data and context
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> df = pd.DataFrame({'demand': np.random.rand(8760)}, index=dates)
>>> slicer = TimeSlicer(unit='month')
>>> context = ProblemContext(df_raw=df, slicer=slicer)
>>>
>>> # Create workflow: feature_eng, search_algo and repr_model are built
>>> # as in the Workflow example (k is set on the combination generator)
>>> workflow = Workflow(
...     feature_engineer=feature_eng,
...     search_algorithm=search_algo,
...     representation_model=repr_model,
... )
>>>
>>> # Run experiment
>>> experiment = RepSetExperiment(context, workflow)
>>> result = experiment.run()
>>> print(result.selection)  # Selected periods
>>> print(result.weights)    # Responsibility weights
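To see what the example's slicing produces, here is a pandas-only sketch that groups the same hourly index into calendar months. It does not call `TimeSlicer`; treating `unit='month'` as calendar months is an assumption. It also uses a seeded `default_rng` instead of `np.random.rand` so the data are reproducible. Note that 8760 hourly steps starting on 2024-01-01 end on 2024-12-30 23:00 because 2024 is a leap year, so the last monthly slice is one day short.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=8760, freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame({"demand": rng.random(8760)}, index=dates)

# Label every hour with its calendar month and count the hours per slice.
month_labels = df.index.to_period("M")
hours_per_month = df.groupby(month_labels).size()

print(len(hours_per_month))     # 12
print(hours_per_month.iloc[0])  # 744 (January: 31 days)
print(hours_per_month.iloc[-1]) # 720 (December, cut short after the 30th)
print(df.index[-1])             # 2024-12-30 23:00:00
```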
Source code in energy_repset/problem.py
class RepSetExperiment:
    """Orchestrate a complete and self-contained representative subset experiment.

    This class manages the execution of a full workflow from raw data to final
    selection results. It handles feature engineering, search execution, and
    weight calculation while maintaining references to intermediate states.

    Attributes:
        raw_context: Initial ProblemContext containing raw time-series data.
        workflow: Workflow definition containing all algorithm components.
        result: Final RepSetResult after run() completes (None before execution).

    Examples:
        Run a complete experiment:

        >>> import numpy as np
        >>> import pandas as pd
        >>> from energy_repset.problem import RepSetExperiment
        >>> from energy_repset.context import ProblemContext
        >>> from energy_repset.workflow import Workflow
        >>> from energy_repset.time_slicer import TimeSlicer
        >>> # ... (imports for feature engineer, search algo, etc.)
        >>>
        >>> # Create data and context
        >>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
        >>> df = pd.DataFrame({'demand': np.random.rand(8760)}, index=dates)
        >>> slicer = TimeSlicer(unit='month')
        >>> context = ProblemContext(df_raw=df, slicer=slicer)
        >>>
        >>> # Create workflow: feature_eng, search_algo and repr_model are built
        >>> # as in the Workflow example (k is set on the combination generator)
        >>> workflow = Workflow(
        ...     feature_engineer=feature_eng,
        ...     search_algorithm=search_algo,
        ...     representation_model=repr_model,
        ... )
        >>>
        >>> # Run experiment
        >>> experiment = RepSetExperiment(context, workflow)
        >>> result = experiment.run()
        >>> print(result.selection)  # Selected periods
        >>> print(result.weights)    # Responsibility weights
    """

    def __init__(self, context: ProblemContext, workflow: Workflow):
        """Initialize experiment with raw data context and workflow.

        Args:
            context: ProblemContext containing raw time-series data and metadata.
            workflow: Workflow defining feature engineering, search, and representation.
        """
        self.raw_context = context
        self.workflow = workflow

        # These will be populated after the run
        self._feature_context: ProblemContext = None
        self.result: RepSetResult = None

    @property
    def feature_context(self) -> ProblemContext:
        """Get the context with computed features.

        Returns:
            ProblemContext with df_features populated.

        Raises:
            ValueError: If neither run() nor run_feature_engineer() has been
                called and the raw context has no precomputed features.
        """
        if self._feature_context is None:
            if self.raw_context._df_features is not None:
                self._feature_context = self.raw_context.copy()
            else:
                raise ValueError('Please call run() or run_feature_engineer() first.')
        return self._feature_context

    def run_feature_engineer(self) -> ProblemContext:
        """Run only the feature engineering step.

        This method allows you to inspect features before running the full workflow.

        Returns:
            ProblemContext with df_features populated.
        """
        self._feature_context = self.workflow.feature_engineer.run(self.raw_context)
        return self._feature_context

    def run(self) -> RepSetResult:
        """Execute the entire workflow from feature engineering to final result.

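        This method orchestrates the complete selection process:
        1. Runs the feature engineer to create a new, feature-rich context,
           unless features were already computed or supplied in the raw context
        2. Stores this feature_context for user inspection
        3. Runs the search algorithm on the feature_context
        4. If the search algorithm returned no weights, fits the
           representation model and uses it to calculate the final weights
        5. Stores and returns the final result

        Returns:
            RepSetResult: The selected periods, weights, scores, and diagnostics.

        Raises:
            ValueError: If neither the search algorithm nor a representation
                model provides weights, or if both do.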
        if (self._feature_context is None) and (self.raw_context._df_features is None):
            self.run_feature_engineer()

        feature_context = self.feature_context
        search_algorithm = self.workflow.search_algorithm
        representation_model = self.workflow.representation_model

        result = search_algorithm.find_selection(feature_context)
        if result.weights is None:
            if representation_model is None:
                raise ValueError(
                    "Search algorithm returned weights=None but no "
                    "RepresentationModel was provided in the Workflow."
                )
            representation_model.fit(feature_context)
            result.weights = representation_model.weigh(result.selection)
        elif representation_model is not None:
            raise ValueError(
                "Search algorithm already set weights, but you still have a RepresentationModel defined. \n"
                "Make sure that either your SearchAlgorithm sets the weights OR you have a "
                "RepresentationModel for post-hoc weighting. \n"
                "You cannot have both."
            )

        self.result = result
        return self.result

feature_context property

feature_context: ProblemContext

Get the context with computed features.

Returns:

Type             Description
ProblemContext   ProblemContext with df_features populated.

Raises:

Type         Description
ValueError   If neither run() nor run_feature_engineer() has been called and the raw context has no precomputed features.
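The property's lookup order can be sketched with stand-in classes. `StubContext` and `StubExperiment` are hypothetical simplifications: only the `_df_features` attribute and the `copy()` method of `ProblemContext` are modelled, and features are a plain dict. The sketch reproduces the three cases of the property: a cached feature context, a raw context that already carries features (which is copied and cached), and neither (ValueError).

```python
import copy
from typing import Optional


class StubContext:
    """Hypothetical stand-in for ProblemContext."""

    def __init__(self, features: Optional[dict] = None):
        self._df_features = features

    def copy(self) -> "StubContext":
        return copy.deepcopy(self)


class StubExperiment:
    """Reproduces only the feature_context lookup of RepSetExperiment."""

    def __init__(self, context: StubContext):
        self.raw_context = context
        self._feature_context: Optional[StubContext] = None

    @property
    def feature_context(self) -> StubContext:
        if self._feature_context is None:
            if self.raw_context._df_features is not None:
                # Features were precomputed: reuse a copy of the raw context.
                self._feature_context = self.raw_context.copy()
            else:
                raise ValueError("Please call run() or run_feature_engineer() first.")
        return self._feature_context


# Raw context without features: accessing the property fails.
try:
    StubExperiment(StubContext()).feature_context
    missing_raised = False
except ValueError:
    missing_raised = True

# Raw context with precomputed features: a copy is returned and cached.
exp = StubExperiment(StubContext({"mean": 1.0}))
first = exp.feature_context
print(first is exp.feature_context)  # True, cached after the first access
print(first is exp.raw_context)      # False, it is a copy
```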

__init__

__init__(context: ProblemContext, workflow: Workflow)

Initialize experiment with raw data context and workflow.

Parameters:

Name       Type             Description                                                          Default
context    ProblemContext   ProblemContext containing raw time-series data and metadata.         required
workflow   Workflow         Workflow defining feature engineering, search, and representation.   required

run_feature_engineer

run_feature_engineer() -> ProblemContext

Run only the feature engineering step.

This method allows you to inspect features before running the full workflow.

Returns:

Type             Description
ProblemContext   ProblemContext with df_features populated.
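A typical inspect-then-run pattern, sketched with hypothetical stand-ins: `CountingFeatureEngineer` records how often it runs, and `MiniExperiment` keeps only the caching in `run_feature_engineer()` and the guard at the top of `run()`. The raw-context `_df_features` check of the real guard, the search, and the weighting are omitted. Because `run()` only calls the feature engineer when no feature context exists yet, the features inspected beforehand are the ones the rest of the run uses, and the engineer runs once.

```python
from typing import Optional


class CountingFeatureEngineer:
    """Hypothetical feature engineer that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def run(self, raw: dict) -> dict:
        self.calls += 1
        # Toy feature: the mean of each raw series.
        return {name: sum(vals) / len(vals) for name, vals in raw.items()}


class MiniExperiment:
    """Keeps only the feature-caching behaviour of RepSetExperiment."""

    def __init__(self, raw: dict, engineer: CountingFeatureEngineer):
        self.raw = raw
        self.engineer = engineer
        self._features: Optional[dict] = None

    def run_feature_engineer(self) -> dict:
        self._features = self.engineer.run(self.raw)
        return self._features

    def run(self) -> dict:
        if self._features is None:
            self.run_feature_engineer()
        # The search algorithm and weighting would run here.
        return self._features


engineer = CountingFeatureEngineer()
exp = MiniExperiment({"demand": [1.0, 2.0, 3.0]}, engineer)

features = exp.run_feature_engineer()  # inspect before the full run
print(features)                         # {'demand': 2.0}
result_features = exp.run()
print(engineer.calls)                   # 1, run() reused the cached features
```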


run

run() -> RepSetResult

Execute the entire workflow from feature engineering to final result.

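This method orchestrates the complete selection process:

1. Runs the feature engineer to create a new, feature-rich context, unless features were already computed or supplied in the raw context.
2. Stores this feature_context for user inspection.
3. Runs the search algorithm on the feature_context.
4. If the search algorithm returned no weights, fits the representation model and uses it to calculate the final weights.
5. Stores and returns the final result.

A ValueError is raised if neither the search algorithm nor a representation model provides weights, or if both do.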

Returns:

Type           Description
RepSetResult   The selected periods, weights, scores, and diagnostics.
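The weighting rule in run() (exactly one of the search algorithm or the representation model must supply weights) can be exercised with stand-ins. `Result`, `FixedSearch` and `UniformModel` are hypothetical; `UniformModel` assigns equal weights purely for illustration and is not claimed to match `UniformRepresentationModel`. The `resolve_weights` function mirrors the branch structure of run() with simplified error messages.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Result:
    selection: Tuple[str, ...]
    weights: Optional[Dict[str, float]] = None


class FixedSearch:
    """Hypothetical search that returns a fixed selection (and maybe weights)."""

    def __init__(self, selection, weights=None):
        self.selection, self.weights = tuple(selection), weights

    def find_selection(self, context) -> Result:
        return Result(self.selection, self.weights)


class UniformModel:
    """Hypothetical post-hoc model: equal weight per selected period."""

    def fit(self, context) -> None:
        pass

    def weigh(self, selection) -> Dict[str, float]:
        return {p: 1.0 / len(selection) for p in selection}


def resolve_weights(search, model, context=None) -> Result:
    result = search.find_selection(context)
    if result.weights is None:
        if model is None:
            raise ValueError("No weights from search and no RepresentationModel.")
        model.fit(context)
        result.weights = model.weigh(result.selection)
    elif model is not None:
        raise ValueError("Weights set by both search and RepresentationModel.")
    return result


# Case 1: search leaves weights unset, the model fills them in.
r = resolve_weights(FixedSearch(["Jan", "Apr", "Aug", "Nov"]), UniformModel())
print(r.weights)  # {'Jan': 0.25, 'Apr': 0.25, 'Aug': 0.25, 'Nov': 0.25}

# Case 2: search already supplies weights AND a model is given, so it fails.
try:
    resolve_weights(FixedSearch(["Jan"], weights={"Jan": 1.0}), UniformModel())
    conflict_raised = False
except ValueError:
    conflict_raised = True
```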


RepSetResult dataclass

The standardized output object returned by a search algorithm's find_selection() and by RepSetExperiment.run().

Source code in energy_repset/results.py
@dataclass
class RepSetResult:
    """The standardized output object."""
    context: ProblemContext
    selection_space: Literal['subset', 'synthetic', 'chronological']
    selection: SliceCombination
    scores: Dict[str, float]
    representatives: Dict[Hashable, pd.DataFrame]  # The actual data of the representatives
    weights: Union[Dict[Hashable, float], pd.DataFrame] = None  # Set by the search algorithm or, post hoc, by the RepresentationModel
    diagnostics: Dict[str, Any] = field(default_factory=dict)
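Why `diagnostics` uses `field(default_factory=dict)` rather than `= {}`: dataclasses reject an unhashable mutable default such as a plain dict when the class is defined, and `default_factory` gives every instance its own fresh dict. The sketch uses a trimmed, hypothetical `MiniResult` with only three of the fields; the `runtime_s` key is an illustrative diagnostic entry, not one the library is known to record.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Optional


@dataclass
class MiniResult:
    """Hypothetical trimmed-down RepSetResult."""
    selection: tuple
    weights: Optional[Dict[Hashable, float]] = None
    diagnostics: Dict[str, Any] = field(default_factory=dict)


a = MiniResult(selection=("Jan", "Jul"))
b = MiniResult(selection=("Mar",))
a.diagnostics["runtime_s"] = 1.2  # illustrative diagnostic entry

print(b.diagnostics)  # {} (each instance owns its own dict)

# A plain mutable default is rejected when the class is defined.
try:
    @dataclass
    class BadResult:
        diagnostics: Dict[str, Any] = {}
    mutable_default_rejected = False
except ValueError:
    mutable_default_rejected = True
```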