Workflow & Experiment¶

Workflow `dataclass` ¶

A serializable object that defines a complete selection problem.

This dataclass encapsulates all components needed to execute a representative subset selection workflow: feature engineering, search algorithm, and optionally a representation model.

Attributes:

Name	Type	Description
`feature_engineer`	`FeatureEngineer`	Component that transforms raw time-series into features.
`search_algorithm`	`SearchAlgorithm`	Algorithm that finds the optimal subset of k periods.
`representation_model`	`RepresentationModel \| None`	Model that calculates responsibility weights for selected periods. `None` when the search algorithm pre-computes its own weights (e.g. constructive algorithms like `KMedoidsSearch`).
`k`	`int`	Number of representative periods to select.

Examples:

Define a complete workflow:

>>> from energy_repset.workflow import Workflow
>>> from energy_repset.feature_engineering import StandardStatsFeatureEngineer
>>> from energy_repset.search_algorithms import ObjectiveDrivenCombinatorialSearchAlgorithm
>>> from energy_repset.representation import UniformRepresentationModel
>>> from energy_repset.objectives import ObjectiveSet
>>> from energy_repset.score_components import WassersteinFidelity
>>> from energy_repset.selection_policies import ParetoMaxMinStrategy
>>> from energy_repset.combi_gens import ExhaustiveCombiGen
>>>
>>> # Create components
>>> feature_eng = StandardStatsFeatureEngineer()
>>> objective_set = ObjectiveSet({'wass': (1.0, WassersteinFidelity())})
>>> policy = ParetoMaxMinStrategy()
>>> combi_gen = ExhaustiveCombiGen(k=3)
>>> search_algo = ObjectiveDrivenCombinatorialSearchAlgorithm(
...     objective_set, policy, combi_gen
... )
>>> repr_model = UniformRepresentationModel()
>>>
>>> # Create workflow
>>> workflow = Workflow(
...     feature_engineer=feature_eng,
...     search_algorithm=search_algo,
...     representation_model=repr_model,
... )

Source code in energy_repset/workflow.py

@dataclass
class Workflow:
    """A serializable object that defines a complete selection problem.

    This dataclass encapsulates all components needed to execute a representative
    subset selection workflow: feature engineering, search algorithm, and
    optionally a representation model.

    Attributes:
        feature_engineer: Component that transforms raw time-series into features.
        search_algorithm: Algorithm that finds the optimal subset of k periods.
        representation_model: Model that calculates responsibility weights for
            selected periods. ``None`` when the search algorithm pre-computes
            its own weights (e.g. constructive algorithms like
            ``KMedoidsSearch``).
        k: Number of representative periods to select.

    Examples:
        Define a complete workflow:

        >>> from energy_repset.workflow import Workflow
        >>> from energy_repset.feature_engineering import StandardStatsFeatureEngineer
        >>> from energy_repset.search_algorithms import ObjectiveDrivenCombinatorialSearchAlgorithm
        >>> from energy_repset.representation import UniformRepresentationModel
        >>> from energy_repset.objectives import ObjectiveSet
        >>> from energy_repset.score_components import WassersteinFidelity
        >>> from energy_repset.selection_policies import ParetoMaxMinStrategy
        >>> from energy_repset.combi_gens import ExhaustiveCombiGen
        >>>
        >>> # Create components
        >>> feature_eng = StandardStatsFeatureEngineer()
        >>> objective_set = ObjectiveSet({'wass': (1.0, WassersteinFidelity())})
        >>> policy = ParetoMaxMinStrategy()
        >>> combi_gen = ExhaustiveCombiGen(k=3)
        >>> search_algo = ObjectiveDrivenCombinatorialSearchAlgorithm(
        ...     objective_set, policy, combi_gen
        ... )
        >>> repr_model = UniformRepresentationModel()
        >>>
        >>> # Create workflow
        >>> workflow = Workflow(
        ...     feature_engineer=feature_eng,
        ...     search_algorithm=search_algo,
        ...     representation_model=repr_model,
        ... )
    """
    feature_engineer: FeatureEngineer
    search_algorithm: SearchAlgorithm
    representation_model: Optional[RepresentationModel] = None

    def save(self, filepath: str | Path):
        """Save workflow configuration to file.

        Args:
            filepath: Path where workflow configuration will be saved.

        Raises:
            NotImplementedError: Workflow serialization is not yet implemented.
        """
        raise NotImplementedError("Workflow serialization not yet implemented.")

    @classmethod
    def load(cls, filepath: str | Path) -> "Workflow":
        """Load workflow configuration from file.

        Args:
            filepath: Path to workflow configuration file.

        Returns:
            Workflow: Reconstructed Workflow instance.

        Raises:
            NotImplementedError: Workflow deserialization is not yet implemented.
        """
        raise NotImplementedError("Workflow deserialization not yet implemented.")

save ¶

save(filepath: str | Path)

Save workflow configuration to file.

Parameters:

Name	Type	Description	Default
`filepath`	`str \| Path`	Path where workflow configuration will be saved.	required

Raises:

Type	Description
`NotImplementedError`	Workflow serialization is not yet implemented.

Source code in energy_repset/workflow.py

def save(self, filepath: str | Path):
    """Save workflow configuration to file.

    Args:
        filepath: Path where workflow configuration will be saved.

    Raises:
        NotImplementedError: Workflow serialization is not yet implemented.
    """
    raise NotImplementedError("Workflow serialization not yet implemented.")

load `classmethod` ¶

load(filepath: str | Path) -> 'Workflow'

Load workflow configuration from file.

Parameters:

Name	Type	Description	Default
`filepath`	`str \| Path`	Path to workflow configuration file.	required

Returns:

Name	Type	Description
`Workflow`	`'Workflow'`	Reconstructed Workflow instance.

Raises:

Type	Description
`NotImplementedError`	Workflow deserialization is not yet implemented.

Source code in energy_repset/workflow.py

@classmethod
def load(cls, filepath: str | Path) -> "Workflow":
    """Load workflow configuration from file.

    Args:
        filepath: Path to workflow configuration file.

    Returns:
        Workflow: Reconstructed Workflow instance.

    Raises:
        NotImplementedError: Workflow deserialization is not yet implemented.
    """
    raise NotImplementedError("Workflow deserialization not yet implemented.")
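
Because both `save` and `load` currently raise `NotImplementedError`, calling code may want to guard persistence rather than assume it works. The sketch below is only an illustration: `StubWorkflow` is a hypothetical stand-in that reproduces the raising behaviour shown in the source, and `try_save` is an illustrative helper, not part of the library.

```python
from pathlib import Path


class StubWorkflow:
    """Hypothetical stand-in reproducing the current save() behaviour."""

    def save(self, filepath):
        raise NotImplementedError("Workflow serialization not yet implemented.")


def try_save(workflow, filepath) -> bool:
    """Attempt to persist a workflow; return False if persistence is unsupported."""
    try:
        workflow.save(Path(filepath))
    except NotImplementedError:
        # Serialization is not available yet, so report that instead of crashing.
        return False
    return True


saved = try_save(StubWorkflow(), "workflow.json")
```

With the behaviour shown above, `saved` ends up `False`, and the caller can fall back to rebuilding the workflow from code.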

RepSetExperiment ¶

Orchestrate a complete and self-contained representative subset experiment.

This class manages the execution of a full workflow from raw data to final selection results. It handles feature engineering, search execution, and weight calculation while maintaining references to intermediate states.

Attributes:

Name	Type	Description
`raw_context`	`ProblemContext`	Initial ProblemContext containing raw time-series data.
`workflow`	`Workflow`	Workflow definition containing all algorithm components.
`result`	`RepSetResult \| None`	Final RepSetResult after run() completes (`None` before execution).

Examples:

Run a complete experiment:

>>> import numpy as np
>>> import pandas as pd
>>> from energy_repset.problem import RepSetExperiment
>>> from energy_repset.context import ProblemContext
>>> from energy_repset.workflow import Workflow
>>> from energy_repset.time_slicer import TimeSlicer
>>> # ... (imports for feature engineer, search algo, etc.)
>>>
>>> # Create data and context
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> df = pd.DataFrame({'demand': np.random.rand(8760)}, index=dates)
>>> slicer = TimeSlicer(unit='month')
>>> context = ProblemContext(df_raw=df, slicer=slicer)
>>>
>>> # Create workflow (see Workflow docs for details)
>>> workflow = Workflow(
...     feature_engineer=feature_eng,
...     search_algorithm=search_algo,
...     representation_model=repr_model,
... )
>>>
>>> # Run experiment
>>> experiment = RepSetExperiment(context, workflow)
>>> result = experiment.run()
>>> print(result.selection)  # Selected periods
>>> print(result.weights)    # Responsibility weights

Source code in energy_repset/problem.py

class RepSetExperiment:
    """Orchestrate a complete and self-contained representative subset experiment.

    This class manages the execution of a full workflow from raw data to final
    selection results. It handles feature engineering, search execution, and
    weight calculation while maintaining references to intermediate states.

    Attributes:
        raw_context: Initial ProblemContext containing raw time-series data.
        workflow: Workflow definition containing all algorithm components.
        result: Final RepSetResult after run() completes (None before execution).

    Examples:
        Run a complete experiment:

        >>> import numpy as np
        >>> import pandas as pd
        >>> from energy_repset.problem import RepSetExperiment
        >>> from energy_repset.context import ProblemContext
        >>> from energy_repset.workflow import Workflow
        >>> from energy_repset.time_slicer import TimeSlicer
        >>> # ... (imports for feature engineer, search algo, etc.)
        >>>
        >>> # Create data and context
        >>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
        >>> df = pd.DataFrame({'demand': np.random.rand(8760)}, index=dates)
        >>> slicer = TimeSlicer(unit='month')
        >>> context = ProblemContext(df_raw=df, slicer=slicer)
        >>>
        >>> # Create workflow (see Workflow docs for details)
        >>> workflow = Workflow(
        ...     feature_engineer=feature_eng,
        ...     search_algorithm=search_algo,
        ...     representation_model=repr_model,
        ... )
        >>>
        >>> # Run experiment
        >>> experiment = RepSetExperiment(context, workflow)
        >>> result = experiment.run()
        >>> print(result.selection)  # Selected periods
        >>> print(result.weights)    # Responsibility weights
    """

    def __init__(self, context: ProblemContext, workflow: Workflow):
        """Initialize experiment with raw data context and workflow.

        Args:
            context: ProblemContext containing raw time-series data and metadata.
            workflow: Workflow defining feature engineering, search, and representation.
        """
        self.raw_context = context
        self.workflow = workflow

        # These will be populated after the run
        self._feature_context: ProblemContext = None
        self.result: RepSetResult = None

    @property
    def feature_context(self) -> ProblemContext:
        """Get the context with computed features.

        Returns:
            ProblemContext with df_features populated.

        Raises:
            ValueError: If run() or run_feature_engineer() has not been called yet.
        """
        if self._feature_context is None:
            if self.raw_context._df_features is not None:
                self._feature_context = self.raw_context.copy()
            else:
                raise ValueError('Please call run() or run_feature_engineer() first.')
        return self._feature_context

    def run_feature_engineer(self) -> ProblemContext:
        """Run only the feature engineering step.

        This method allows you to inspect features before running the full workflow.

        Returns:
            ProblemContext with df_features populated.
        """
        self._feature_context = self.workflow.feature_engineer.run(self.raw_context)
        return self._feature_context

    def run(self) -> RepSetResult:
        """Execute the entire workflow from feature engineering to final result.

        This method orchestrates the complete selection process:
        1. Runs the feature engineer to create a new, feature-rich context
        2. Stores this feature_context for user inspection
        3. Runs the search algorithm on the feature_context
        4. Fits the representation model
        5. Calculates the final weights
        6. Stores and returns the final result

        Returns:
            RepSetResult: The selected periods, weights, scores, and diagnostics.
        """
        if (self._feature_context is None) and (self.raw_context._df_features is None):
            self.run_feature_engineer()

        feature_context = self.feature_context
        search_algorithm = self.workflow.search_algorithm
        representation_model = self.workflow.representation_model

        result = search_algorithm.find_selection(feature_context)
        if result.weights is None:
            if representation_model is None:
                raise ValueError(
                    "Search algorithm returned weights=None but no "
                    "RepresentationModel was provided in the Workflow."
                )
            representation_model.fit(feature_context)
            result.weights = representation_model.weigh(result.selection)
        elif representation_model is not None:
            raise ValueError(
                "Search algorithms already set weights, but you still have a RepresentationModel defined. \n"
                "Make sure that either your SearchAlgorithm sets the weights OR you have a "
                "RepresentationModel for post-hoc weighting. \n"
                "You cannot have both."
            )

        self.result = result
        return self.result

feature_context `property` ¶

feature_context: ProblemContext

Get the context with computed features.

Returns:

Type	Description
`ProblemContext`	ProblemContext with df_features populated.

Raises:

Type	Description
`ValueError`	If run() or run_feature_engineer() has not been called yet.
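
The lazy behaviour of this property can be sketched in isolation. The classes below are hypothetical stand-ins (`StubContext`, `StubExperiment`) that only mirror the branching in the property's source; they are not the library's classes.

```python
class StubContext:
    """Hypothetical stand-in for ProblemContext."""

    def __init__(self, df_features=None):
        self._df_features = df_features

    def copy(self):
        return StubContext(self._df_features)


class StubExperiment:
    """Hypothetical stand-in mirroring the feature_context property logic."""

    def __init__(self, context):
        self.raw_context = context
        self._feature_context = None

    @property
    def feature_context(self):
        if self._feature_context is None:
            if self.raw_context._df_features is not None:
                # The raw context already carries features: reuse a copy of it.
                self._feature_context = self.raw_context.copy()
            else:
                raise ValueError("Please call run() or run_feature_engineer() first.")
        return self._feature_context


# No features anywhere: accessing the property raises.
empty = StubExperiment(StubContext())
try:
    empty.feature_context
    raised = False
except ValueError:
    raised = True

# Features already present on the raw context: a copy is cached and reused.
ready = StubExperiment(StubContext(df_features=[[0.1, 0.2]]))
first = ready.feature_context
second = ready.feature_context
```

In this sketch the second access returns the cached copy, so repeated reads do not copy the context again.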

__init__ ¶

__init__(context: ProblemContext, workflow: Workflow)

Initialize experiment with raw data context and workflow.

Parameters:

Name	Type	Description	Default
`context`	`ProblemContext`	ProblemContext containing raw time-series data and metadata.	required
`workflow`	`Workflow`	Workflow defining feature engineering, search, and representation.	required

Source code in energy_repset/problem.py

def __init__(self, context: ProblemContext, workflow: Workflow):
    """Initialize experiment with raw data context and workflow.

    Args:
        context: ProblemContext containing raw time-series data and metadata.
        workflow: Workflow defining feature engineering, search, and representation.
    """
    self.raw_context = context
    self.workflow = workflow

    # These will be populated after the run
    self._feature_context: ProblemContext = None
    self.result: RepSetResult = None

run_feature_engineer ¶

run_feature_engineer() -> ProblemContext

Run only the feature engineering step.

This method allows you to inspect features before running the full workflow.

Returns:

Type	Description
`ProblemContext`	ProblemContext with df_features populated.
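
A staged run (features first, full workflow later) can be sketched as follows. `CountingEngineer` and `StagedExperiment` are hypothetical stand-ins that imitate only the skip-if-present guard visible in the source of `run()`; plain Python objects replace the real context.

```python
class CountingEngineer:
    """Hypothetical feature engineer that counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def run(self, context):
        self.calls += 1
        # Toy "feature": the number of raw observations.
        return {"raw": context, "features": [len(context)]}


class StagedExperiment:
    """Hypothetical stand-in showing the guard used before feature engineering."""

    def __init__(self, raw, engineer):
        self.raw = raw
        self.engineer = engineer
        self._feature_context = None

    def run_feature_engineer(self):
        # Always recomputes and overwrites the stored feature context.
        self._feature_context = self.engineer.run(self.raw)
        return self._feature_context

    def run(self):
        # Feature engineering is skipped when a feature context already exists.
        if self._feature_context is None:
            self.run_feature_engineer()
        return self._feature_context["features"]


engineer = CountingEngineer()
experiment = StagedExperiment([1, 2, 3], engineer)
inspected = experiment.run_feature_engineer()  # inspect features first
features = experiment.run()                    # reuses the stored features
```

Here the engineer runs once in total. Calling `run_feature_engineer()` a second time would recompute the features, since that method always overwrites the stored context.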

Source code in energy_repset/problem.py

def run_feature_engineer(self) -> ProblemContext:
    """Run only the feature engineering step.

    This method allows you to inspect features before running the full workflow.

    Returns:
        ProblemContext with df_features populated.
    """
    self._feature_context = self.workflow.feature_engineer.run(self.raw_context)
    return self._feature_context

run ¶

run() -> RepSetResult

Execute the entire workflow from feature engineering to final result.

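This method orchestrates the complete selection process:

1. Runs the feature engineer to create a new, feature-rich context (skipped if features are already available).
2. Stores this feature_context for user inspection.
3. Runs the search algorithm on the feature_context.
4. If the search algorithm returned no weights, fits the representation model.
5. Calculates the final weights from the fitted representation model.
6. Stores and returns the final result.

A `ValueError` is raised if the search algorithm returns no weights and no representation model is configured, or if the search algorithm sets weights while a representation model is also configured.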

Returns:

Name	Type	Description
`RepSetResult`	`RepSetResult`	The selected periods, weights, scores, and diagnostics.
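
The weight-handling rule can be sketched in isolation. This is a minimal stand-in, not the library's code: `FakeResult`, `UniformModel`, and `resolve_weights` are hypothetical names that only mirror the branching shown in the source of `run()`.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Tuple


@dataclass
class FakeResult:
    """Hypothetical stand-in for RepSetResult (only the fields used here)."""
    selection: Tuple[Hashable, ...]
    weights: Optional[Dict[Hashable, float]] = None


class UniformModel:
    """Hypothetical stand-in for a representation model with fit()/weigh()."""

    def fit(self, context) -> None:
        # A real model would learn from the feature context; nothing to do here.
        pass

    def weigh(self, selection):
        # Equal weight for every selected period.
        return {s: 1.0 / len(selection) for s in selection}


def resolve_weights(result, representation_model, context=None):
    """Mirror the branching in run(): exactly one source of weights is allowed."""
    if result.weights is None:
        if representation_model is None:
            raise ValueError("No weights and no representation model.")
        representation_model.fit(context)
        result.weights = representation_model.weigh(result.selection)
    elif representation_model is not None:
        raise ValueError("Weights already set; drop the representation model.")
    return result


# Case 1: the search returned no weights, so the model supplies them.
r1 = resolve_weights(FakeResult(selection=("jan", "jul")), UniformModel())

# Case 2: the search pre-computed weights and no model is configured.
r2 = resolve_weights(FakeResult(("jan",), {"jan": 1.0}), None)

# Case 3: both sources present is rejected.
try:
    resolve_weights(FakeResult(("jan",), {"jan": 1.0}), UniformModel())
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

The practical consequence is that a workflow built around a search algorithm that pre-computes weights should leave `representation_model` at its default of `None`.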

Source code in energy_repset/problem.py

def run(self) -> RepSetResult:
    """Execute the entire workflow from feature engineering to final result.

    This method orchestrates the complete selection process:
    1. Runs the feature engineer to create a new, feature-rich context
    2. Stores this feature_context for user inspection
    3. Runs the search algorithm on the feature_context
    4. Fits the representation model
    5. Calculates the final weights
    6. Stores and returns the final result

    Returns:
        RepSetResult: The selected periods, weights, scores, and diagnostics.
    """
    if (self._feature_context is None) and (self.raw_context._df_features is None):
        self.run_feature_engineer()

    feature_context = self.feature_context
    search_algorithm = self.workflow.search_algorithm
    representation_model = self.workflow.representation_model

    result = search_algorithm.find_selection(feature_context)
    if result.weights is None:
        if representation_model is None:
            raise ValueError(
                "Search algorithm returned weights=None but no "
                "RepresentationModel was provided in the Workflow."
            )
        representation_model.fit(feature_context)
        result.weights = representation_model.weigh(result.selection)
    elif representation_model is not None:
        raise ValueError(
            "Search algorithms already set weights, but you still have a RepresentationModel defined. \n"
            "Make sure that either your SearchAlgorithm sets the weights OR you have a "
            "RepresentationModel for post-hoc weighting. \n"
            "You cannot have both."
        )

    self.result = result
    return self.result

RepSetResult `dataclass` ¶

The standardized output object produced by a search algorithm's `find_selection()` and returned by `RepSetExperiment.run()`.

Source code in energy_repset/results.py

@dataclass
class RepSetResult:
    """The standardized output object."""
    context: ProblemContext
    selection_space: Literal['subset', 'synthetic', 'chronological']
    selection: SliceCombination
    scores: Dict[str, float]
    representatives: Dict[Hashable, pd.DataFrame]  # The actual data of the representatives
    weights: Union[Dict[Hashable, float], pd.DataFrame] = None  # Populated by RepresentationModel
    diagnostics: Dict[str, Any] = field(default_factory=dict)
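
One way the `weights` and `representatives` fields might be combined is a weighted average over the selected periods. The sketch below covers only the dictionary form of `weights`, uses plain lists as stand-ins for the per-period DataFrames, and works on made-up numbers; the labels `"jan"` and `"jul"` are illustrative.

```python
from statistics import mean

# Illustrative data only: plain lists stand in for each representative's DataFrame.
representatives = {
    "jan": [10.0, 12.0, 14.0],
    "jul": [4.0, 6.0, 8.0],
}
weights = {"jan": 0.25, "jul": 0.75}

# Normalize in case the weights do not sum to one.
total = sum(weights.values())

# Weighted mean demand across the representative periods.
weighted_mean = sum(
    (w / total) * mean(representatives[key]) for key, w in weights.items()
)
```

With these numbers the period means are 12.0 and 6.0, so the weighted mean is 0.25 * 12.0 + 0.75 * 6.0 = 7.5.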
