Context & Slicing

ProblemContext

A data container passed through the entire workflow.

This class holds all data and metadata needed for representative subset selection. It is the central object passed between workflow stages (feature engineering, search algorithms, representation models).

Parameters:

Name	Type	Description	Default
`df_raw`	`DataFrame`	Raw time-series data with datetime index and variable columns.	required
`slicer`	`'TimeSlicer'`	TimeSlicer defining how the time index is divided into candidate periods.	required
`metadata`	`dict[str, Any] | None`	Optional dict for storing arbitrary user data (e.g., default weights, experiment configuration, notes). Not used by the framework itself, but available for user convenience and custom component implementations.	`None`

Examples:

Create a context with monthly slicing:

>>> import numpy as np
>>> import pandas as pd
>>> from energy_repset.context import ProblemContext
>>> from energy_repset.time_slicer import TimeSlicer
>>>
>>> # Create sample data
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> df = pd.DataFrame({
...     'demand': np.random.rand(8760),
...     'solar': np.random.rand(8760)
... }, index=dates)
>>>
>>> # Create context with metadata
>>> slicer = TimeSlicer(unit='month')
>>> context = ProblemContext(
...     df_raw=df,
...     slicer=slicer,
...     metadata={
...         'experiment_name': 'test_run_1',
...         'default_weights': {'demand': 1.5, 'solar': 1.0},
...         'notes': 'Testing seasonal selection'
...     }
... )
>>> len(context.get_unique_slices())  # 12 months
12
>>> context.metadata['experiment_name']
'test_run_1'
>>>
>>> # Create context without metadata
>>> context2 = ProblemContext(df_raw=df, slicer=slicer)
>>> context2.metadata
{}
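
The context only stores `df_features`; computing it is left to a feature-engineering stage. As a minimal sketch of what such a stage could produce, the snippet below groups hourly data by the same monthly labels that `TimeSlicer(unit='month')` yields (`index.to_period("M")`, per the source listing further down) and aggregates each slice into one row. The function name `monthly_summary_features` and the choice of mean/max statistics are illustrative assumptions, not part of energy_repset.

```python
import numpy as np
import pandas as pd


def monthly_summary_features(df_raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature stage: one row per monthly slice, one column per (variable, statistic)."""
    # Same labelling TimeSlicer(unit='month') applies: a PeriodIndex with freq 'M'
    labels = df_raw.index.to_period("M")
    # Aggregate every variable within each slice
    feats = df_raw.groupby(labels).agg(["mean", "max"])
    # Flatten the (variable, statistic) MultiIndex columns into plain strings
    feats.columns = [f"{var}_{stat}" for var, stat in feats.columns]
    return feats


dates = pd.date_range("2024-01-01", periods=8760, freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame({"demand": rng.random(8760), "solar": rng.random(8760)}, index=dates)

features = monthly_summary_features(df)
print(features.shape)  # (12, 4)
```

Because the resulting index contains every monthly label, assigning it to `context.df_features` would satisfy the setter's completeness check.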

Source code in energy_repset/context.py

class ProblemContext:
    """A data container passed through the entire workflow.

    This class holds all data and metadata needed for representative subset selection.
    It is the central object passed between workflow stages (feature engineering,
    search algorithms, representation models).

    Args:
        df_raw: Raw time-series data with datetime index and variable columns.
        slicer: TimeSlicer defining how the time index is divided into candidate periods.
        metadata: Optional dict for storing arbitrary user data (e.g., default weights,
            experiment configuration, notes, etc.). Not used by the framework itself,
            but available for user convenience and custom component implementations.

    Examples:
        Create a context with monthly slicing:

        >>> import numpy as np
        >>> import pandas as pd
        >>> from energy_repset.context import ProblemContext
        >>> from energy_repset.time_slicer import TimeSlicer
        >>>
        >>> # Create sample data
        >>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
        >>> df = pd.DataFrame({
        ...     'demand': np.random.rand(8760),
        ...     'solar': np.random.rand(8760)
        ... }, index=dates)
        >>>
        >>> # Create context with metadata
        >>> slicer = TimeSlicer(unit='month')
        >>> context = ProblemContext(
        ...     df_raw=df,
        ...     slicer=slicer,
        ...     metadata={
        ...         'experiment_name': 'test_run_1',
        ...         'default_weights': {'demand': 1.5, 'solar': 1.0},
        ...         'notes': 'Testing seasonal selection'
        ...     }
        ... )
        >>> len(context.get_unique_slices())  # 12 months
        12
        >>> context.metadata['experiment_name']
        'test_run_1'
        >>>
        >>> # Create context without metadata
        >>> context2 = ProblemContext(df_raw=df, slicer=slicer)
        >>> context2.metadata
        {}
    """

    def __init__(
        self,
        df_raw: pd.DataFrame,
        slicer: 'TimeSlicer',
        metadata: Optional[Dict[str, Any]] = None
    ):
        """Initialize a ProblemContext.

        Args:
            df_raw: Raw time-series data with datetime index and variable columns.
            slicer: TimeSlicer defining how the time index is divided into candidate periods.
            metadata: Optional dict for storing arbitrary user data. Not used by the
                framework itself.
        """
        self.df_raw = df_raw
        self.slicer = slicer
        self.metadata = metadata if metadata is not None else {}
        self._df_features: Optional[pd.DataFrame] = None

    def copy(self) -> 'ProblemContext':
        """Create a deep copy of this ProblemContext instance.

        Returns:
            A new, independent instance of the context with all data copied.
        """
        return copy.deepcopy(self)

    def get_sliced_data(self) -> Dict[Hashable, pd.DataFrame]:
        """Generate sliced raw data on demand.

        Returns:
            Dictionary mapping slice labels to their corresponding DataFrame chunks.

        Raises:
            NotImplementedError: This method is not yet implemented.
        """
        raise NotImplementedError

    @property
    def df_features(self) -> pd.DataFrame:
        """Get the computed feature matrix.

        Returns:
            DataFrame with slice labels as index and engineered features as columns.

        Raises:
            ValueError: If features have not been computed yet. Use a FeatureEngineer
                to populate this field first.
        """
        if self._df_features is None:
            raise ValueError(
                'You tried to retrieve df_features before assigning it. Please set it first using a FeatureEngineer.'
            )
        return self._df_features

    @df_features.setter
    def df_features(self, df_features: pd.DataFrame):
        """Set the feature matrix.

        Args:
            df_features: DataFrame with slice labels as index and features as columns.

        Raises:
            ValueError: If df_features index does not contain all expected slices
                from the slicer.
        """
        self._validate_all_slices_present_in_features_df(df_features)

        self._df_features = df_features

    def _validate_all_slices_present_in_features_df(self, df_features):
        expected_slices = set(self.get_unique_slices())
        actual_slices = set(df_features.index)
        if not expected_slices.issubset(actual_slices):
            missing_slices = expected_slices - actual_slices
            raise ValueError(
                f"df_features is missing {len(missing_slices)} slice(s). "
                f"Expected all slices from slicer but missing: {sorted(missing_slices)[:5]}"
                f"{'...' if len(missing_slices) > 5 else ''}"
            )

    def get_unique_slices(self) -> List[Hashable]:
        """Get list of all unique slice labels from the time index.

        Returns:
            List of slice labels (e.g., Period objects for monthly slicing).
        """
        return self.slicer.unique_slices(self.df_raw.index)

df_features `property` `writable`

df_features: DataFrame

Get the computed feature matrix.

Returns:

Type	Description
`DataFrame`	DataFrame with slice labels as index and engineered features as columns.

Raises:

Type	Description
`ValueError`	If features have not been computed yet. Use a FeatureEngineer to populate this field first.
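
The setter guards the feature matrix with a subset check: every label from `get_unique_slices()` must appear in the new index, while extra rows are tolerated. Below is a standalone sketch of that rule, using plain strings as stand-in slice labels; the function name `check_features_cover_slices` is hypothetical. It reports up to five missing labels in sorted order.

```python
from collections.abc import Hashable, Iterable


def check_features_cover_slices(expected: Iterable[Hashable], feature_index: Iterable[Hashable]) -> None:
    """Hypothetical helper mirroring the subset rule of the df_features setter."""
    missing = set(expected) - set(feature_index)
    # Extra rows in the feature index are tolerated; only absent slices are an error
    if missing:
        preview = sorted(missing)[:5]  # first five missing labels, in sorted order
        suffix = "..." if len(missing) > 5 else ""
        raise ValueError(f"df_features is missing {len(missing)} slice(s): {preview}{suffix}")


months = [f"2024-{m:02d}" for m in range(1, 13)]

# A superset of the expected slices passes silently
check_features_cover_slices(months, months + ["2025-01"])

# Dropping the second half of the year fails
try:
    check_features_cover_slices(months, months[:6])
    error_message = ""
except ValueError as err:
    error_message = str(err)
print(error_message)
# df_features is missing 6 slice(s): ['2024-07', '2024-08', '2024-09', '2024-10', '2024-11']...
```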

__init__

__init__(df_raw: DataFrame, slicer: 'TimeSlicer', metadata: dict[str, Any] | None = None)

Initialize a ProblemContext.

Parameters:

Name	Type	Description	Default
`df_raw`	`DataFrame`	Raw time-series data with datetime index and variable columns.	required
`slicer`	`'TimeSlicer'`	TimeSlicer defining how the time index is divided into candidate periods.	required
`metadata`	`dict[str, Any] | None`	Optional dict for storing arbitrary user data. Not used by the framework itself.	`None`

Source code in energy_repset/context.py

def __init__(
    self,
    df_raw: pd.DataFrame,
    slicer: 'TimeSlicer',
    metadata: Optional[Dict[str, Any]] = None
):
    """Initialize a ProblemContext.

    Args:
        df_raw: Raw time-series data with datetime index and variable columns.
        slicer: TimeSlicer defining how the time index is divided into candidate periods.
        metadata: Optional dict for storing arbitrary user data. Not used by the
            framework itself.
    """
    self.df_raw = df_raw
    self.slicer = slicer
    self.metadata = metadata if metadata is not None else {}
    self._df_features: Optional[pd.DataFrame] = None
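
The `__init__` above takes `metadata=None` and substitutes a fresh dict, rather than declaring `metadata={}` as the default. The sketch below contrasts the two patterns with two hypothetical toy classes (`SharedDefault`, `FreshDefault`): a mutable default is evaluated once, when the function is defined, so every instance built without metadata would share the same dict.

```python
from typing import Any, Dict, Optional


class SharedDefault:
    # Anti-pattern (for illustration only): this {} is created once and reused by every call
    def __init__(self, metadata: Dict[str, Any] = {}):
        self.metadata = metadata


class FreshDefault:
    # Pattern used by ProblemContext: None means "create a new dict for this instance"
    def __init__(self, metadata: Optional[Dict[str, Any]] = None):
        self.metadata = metadata if metadata is not None else {}


a, b = SharedDefault(), SharedDefault()
a.metadata["note"] = "leaks"
print(b.metadata)  # {'note': 'leaks'}

c, d = FreshDefault(), FreshDefault()
c.metadata["note"] = "stays local"
print(d.metadata)  # {}
```

Testing `is not None` (rather than `metadata or {}`) also keeps a caller-supplied empty dict as the very object the caller passed in.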

copy

copy() -> 'ProblemContext'

Create a deep copy of this ProblemContext instance.

Returns:

Type	Description
`'ProblemContext'`	A new, independent instance of the context with all data copied.

Source code in energy_repset/context.py

def copy(self) -> 'ProblemContext':
    """Create a deep copy of this ProblemContext instance.

    Returns:
        A new, independent instance of the context with all data copied.
    """
    return copy.deepcopy(self)
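
Because `copy()` delegates to `copy.deepcopy`, the returned context owns its own `DataFrame` and metadata dict, so a workflow stage can modify a copy without touching the original. A minimal illustration, assuming a hypothetical stand-in class `MiniContext` that holds the same kinds of attributes:

```python
import copy

import pandas as pd


class MiniContext:
    """Hypothetical stand-in with the same attribute kinds as ProblemContext."""

    def __init__(self, df_raw: pd.DataFrame, metadata: dict):
        self.df_raw = df_raw
        self.metadata = metadata

    def copy(self) -> "MiniContext":
        # Inside the method body, `copy` resolves to the module, not to this method
        return copy.deepcopy(self)


original = MiniContext(pd.DataFrame({"demand": [1.0, 2.0]}), {"run": "base"})
clone = original.copy()

clone.df_raw.loc[0, "demand"] = 99.0
clone.metadata["run"] = "variant"

print(original.df_raw.loc[0, "demand"])  # 1.0
print(original.metadata["run"])  # base
```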

get_sliced_data

get_sliced_data() -> dict[Hashable, DataFrame]

Generate sliced raw data on demand.

Returns:

Type	Description
`dict[Hashable, DataFrame]`	Dictionary mapping slice labels to their corresponding DataFrame chunks.

Raises:

Type	Description
`NotImplementedError`	This method is not yet implemented.

Source code in energy_repset/context.py

def get_sliced_data(self) -> Dict[Hashable, pd.DataFrame]:
    """Generate sliced raw data on demand.

    Returns:
        Dictionary mapping slice labels to their corresponding DataFrame chunks.

    Raises:
        NotImplementedError: This method is not yet implemented.
    """
    raise NotImplementedError
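
`get_sliced_data` currently raises `NotImplementedError`. Judging only from its docstring (a dict from slice label to the matching chunk of raw data), one plausible behaviour can be sketched with a pandas `groupby` over the slicer's labels. The standalone function `sliced_data_sketch` is hypothetical and is not the library's implementation; it uses `index.to_period("M")`, the labels `TimeSlicer(unit='month')` produces.

```python
from typing import Dict, Hashable

import pandas as pd


def sliced_data_sketch(df_raw: pd.DataFrame, labels: pd.Index) -> Dict[Hashable, pd.DataFrame]:
    """Hypothetical: map each slice label to the rows of df_raw carrying that label."""
    # labels must be aligned with df_raw.index (one label per timestamp)
    return {label: chunk for label, chunk in df_raw.groupby(labels)}


dates = pd.date_range("2024-01-01", periods=8760, freq="h")
df = pd.DataFrame({"demand": range(8760)}, index=dates)

chunks = sliced_data_sketch(df, dates.to_period("M"))
print(len(chunks))  # 12
print(len(chunks[pd.Period("2024-02", "M")]))  # 696
```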

get_unique_slices

get_unique_slices() -> list[Hashable]

Get list of all unique slice labels from the time index.

Returns:

Type	Description
`list[Hashable]`	List of slice labels (e.g., Period objects for monthly slicing).

Source code in energy_repset/context.py

def get_unique_slices(self) -> List[Hashable]:
    """Get list of all unique slice labels from the time index.

    Returns:
        List of slice labels (e.g., Period objects for monthly slicing).
    """
    return self.slicer.unique_slices(self.df_raw.index)

TimeSlicer

Convert a DatetimeIndex into labeled time slices.

This class defines how the time index is divided into candidate periods for representative subset selection. It converts timestamps into Period objects or floored timestamps based on the specified temporal granularity.

Parameters:

Name	Type	Description	Default
`unit`	`SliceUnit`	Temporal granularity of the slices. One of "year", "month", "week", "day", or "hour".	required

Attributes:

Name	Type	Description
`unit`		The temporal granularity used for slicing.

Note:

The labels are hashable and suitable for set membership and grouping. Period objects are used for year, month, week, and day. Timestamps floored to the hour are used for hourly slicing.

Examples:

Create a slicer for monthly periods:

>>> import pandas as pd
>>> from energy_repset.time_slicer import TimeSlicer
>>>
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> slicer = TimeSlicer(unit='month')
>>> labels = slicer.labels_for_index(dates)
>>> unique_months = slicer.unique_slices(dates)
>>> len(unique_months)  # 12 months in a year
12
>>> unique_months[0]  # First month
Period('2024-01', 'M')

Weekly slicing:

>>> slicer = TimeSlicer(unit='week')
>>> unique_weeks = slicer.unique_slices(dates)
>>> len(unique_weeks)  # 52 full Mon-Sun weeks plus the partial week starting 2024-12-30
53
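
The weekly example returns 53 labels for two reasons: `periods=8760` covers 365 days, which in the leap year 2024 ends on Monday 2024-12-30, and `to_period("W")` uses pandas' default `W-SUN` frequency (Monday-to-Sunday weeks). The check below reproduces this with plain pandas, without going through `TimeSlicer`:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=8760, freq="h")
weeks = dates.to_period("W").unique()

print(dates[-1])  # 2024-12-30 23:00:00
print(len(weeks))  # 53
# First week runs Monday 2024-01-01 through Sunday 2024-01-07
print(weeks[0].start_time, weeks[0].end_time.normalize())
# The last week contains only the 24 hours of Monday 2024-12-30
print((dates.to_period("W") == weeks[-1]).sum())  # 24
```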

Source code in energy_repset/time_slicer.py

class TimeSlicer:
    """Convert a DatetimeIndex into labeled time slices.

    This class defines how the time index is divided into candidate periods
    for representative subset selection. It converts timestamps into Period
    objects or floored timestamps based on the specified temporal granularity.

    Args:
        unit: Temporal granularity of the slices. One of "year", "month",
            "week", "day", or "hour".

    Attributes:
        unit: The temporal granularity used for slicing.

    Note:
        The labels are hashable and suitable for set membership and grouping.
        Period objects are used for year, month, week, and day. Timestamps
        floored to the hour are used for hourly slicing.

    Examples:
        Create a slicer for monthly periods:

        >>> import pandas as pd
        >>> from energy_repset.time_slicer import TimeSlicer
        >>>
        >>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
        >>> slicer = TimeSlicer(unit='month')
        >>> labels = slicer.labels_for_index(dates)
        >>> unique_months = slicer.unique_slices(dates)
        >>> len(unique_months)  # 12 months in a year
        12
        >>> unique_months[0]  # First month
        Period('2024-01', 'M')

        Weekly slicing:

        >>> slicer = TimeSlicer(unit='week')
        >>> unique_weeks = slicer.unique_slices(dates)
        >>> len(unique_weeks)  # 52 full Mon-Sun weeks plus the partial week starting 2024-12-30
        53
    """

    def __init__(self, unit: SliceUnit) -> None:
        """Initialize TimeSlicer with specified temporal granularity.

        Args:
            unit: One of "year", "month", "week", "day", or "hour".
        """
        self.unit = unit

    def labels_for_index(self, index: pd.DatetimeIndex) -> pd.Index:
        """Return a vector of slice labels aligned to the given index.

        Args:
            index: DatetimeIndex for the input data.

        Returns:
            Index of slice labels matching the input index length. Each timestamp
            is mapped to its corresponding period or floored hour.

        Raises:
            ValueError: If unit is not one of the supported values.
        """
        if self.unit == "year":
            return index.to_period("Y")
        if self.unit == "month":
            return index.to_period("M")
        if self.unit == "week":
            return index.to_period("W")
        if self.unit == "day":
            return index.to_period("D")
        if self.unit == "hour":
            return pd.Index(index.floor("h"))
        raise ValueError("Unsupported unit")

    def unique_slices(self, index: pd.DatetimeIndex) -> List[Hashable]:
        """Return the sorted list of unique slice labels present in the index.

        Args:
            index: DatetimeIndex for the input data.

        Returns:
            Sorted list of unique slice labels. The sort order follows the natural
            ordering of Period objects or timestamps.
        """
        labels = self.labels_for_index(index)
        unique = pd.Index(labels).unique().tolist()
        unique.sort()
        return unique

    def get_indices_for_slice_combi(
        self,
        index: pd.DatetimeIndex,
        selection: Union[Hashable, SliceCombination],
    ) -> pd.Index:
        """Return the timestamps belonging to the given slice(s).

        Args:
            index: DatetimeIndex for the input data.
            selection: Either a single slice label or a tuple of slice labels
                (SliceCombination) to extract indices for.

        Returns:
            Index of timestamps that belong to the specified slice(s). If selection
            is a tuple, returns the union of all timestamps from all slices.

        Examples:
            Get indices for a single month:

            >>> import pandas as pd
            >>> from energy_repset.time_slicer import TimeSlicer
            >>>
            >>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
            >>> slicer = TimeSlicer(unit='month')
            >>> jan_slice = slicer.unique_slices(dates)[0]  # Period('2024-01', 'M')
            >>> jan_indices = slicer.get_indices_for_slice_combi(dates, jan_slice)
            >>> len(jan_indices)  # 744 hours in January 2024
            744

            Get indices for multiple months (selection):

            >>> selection = (pd.Period('2024-01', 'M'), pd.Period('2024-06', 'M'))
            >>> selected_indices = slicer.get_indices_for_slice_combi(dates, selection)
            >>> len(selected_indices)  # Jan (744) + Jun (720) = 1464
            1464
        """
        labels = self.labels_for_index(index)

        # Normalize a single label or a tuple of labels into a set
        if isinstance(selection, tuple):
            slice_set = set(selection)
        else:
            slice_set = {selection}

        # Create boolean mask for timestamps in any of the selected slices
        mask = labels.isin(slice_set)

        # Return the timestamps where mask is True
        return index[mask]

__init__

__init__(unit: SliceUnit) -> None

Initialize TimeSlicer with specified temporal granularity.

Parameters:

Name	Type	Description	Default
`unit`	`SliceUnit`	One of "year", "month", "week", "day", or "hour".	required

Source code in energy_repset/time_slicer.py

def __init__(self, unit: SliceUnit) -> None:
    """Initialize TimeSlicer with specified temporal granularity.

    Args:
        unit: One of "year", "month", "week", "day", or "hour".
    """
    self.unit = unit

labels_for_index

labels_for_index(index: DatetimeIndex) -> Index

Return a vector of slice labels aligned to the given index.

Parameters:

Name	Type	Description	Default
`index`	`DatetimeIndex`	DatetimeIndex for the input data.	required

Returns:

Type	Description
`Index`	Index of slice labels matching the input index length. Each timestamp is mapped to its corresponding period or floored hour.

Raises:

Type	Description
`ValueError`	If unit is not one of the supported values.

Source code in energy_repset/time_slicer.py

def labels_for_index(self, index: pd.DatetimeIndex) -> pd.Index:
    """Return a vector of slice labels aligned to the given index.

    Args:
        index: DatetimeIndex for the input data.

    Returns:
        Index of slice labels matching the input index length. Each timestamp
        is mapped to its corresponding period or floored hour.

    Raises:
        ValueError: If unit is not one of the supported values.
    """
    if self.unit == "year":
        return index.to_period("Y")
    if self.unit == "month":
        return index.to_period("M")
    if self.unit == "week":
        return index.to_period("W")
    if self.unit == "day":
        return index.to_period("D")
    if self.unit == "hour":
        return pd.Index(index.floor("h"))
    raise ValueError("Unsupported unit")
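
To see what each `unit` produces, the snippet below applies the same pandas conversions as `labels_for_index` to a single timestamp. It calls pandas directly rather than `TimeSlicer`, so it runs without energy_repset installed; the `conversions` dict is purely illustrative and not part of the library.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2024-03-15 14:30"])  # a Friday afternoon

# Same conversion labels_for_index applies for each supported unit
conversions = {
    "year": lambda i: i.to_period("Y"),
    "month": lambda i: i.to_period("M"),
    "week": lambda i: i.to_period("W"),
    "day": lambda i: i.to_period("D"),
    "hour": lambda i: pd.Index(i.floor("h")),
}

for unit, convert in conversions.items():
    print(f"{unit:>5}: {convert(idx)[0]!r}")
```

The weekly label for this Friday spans Monday 2024-03-11 to Sunday 2024-03-17, and the hourly label is the timestamp floored to 14:00.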

unique_slices

unique_slices(index: DatetimeIndex) -> list[Hashable]

Return the sorted list of unique slice labels present in the index.

Parameters:

Name	Type	Description	Default
`index`	`DatetimeIndex`	DatetimeIndex for the input data.	required

Returns:

Type	Description
`list[Hashable]`	Sorted list of unique slice labels. The sort order follows the natural ordering of Period objects or timestamps.

Source code in energy_repset/time_slicer.py

def unique_slices(self, index: pd.DatetimeIndex) -> List[Hashable]:
    """Return the sorted list of unique slice labels present in the index.

    Args:
        index: DatetimeIndex for the input data.

    Returns:
        Sorted list of unique slice labels. The sort order follows the natural
        ordering of Period objects or timestamps.
    """
    labels = self.labels_for_index(index)
    unique = pd.Index(labels).unique().tolist()
    unique.sort()
    return unique
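
`unique_slices` first de-duplicates in order of first appearance (`Index.unique`) and then sorts, so the result is chronological even when the input index is not. A quick check with pandas alone, replaying the same two steps on a reversed index:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=8760, freq="h")
reversed_dates = dates[::-1]  # newest timestamp first

# Step 1: unique labels in order of first appearance (December comes first here)
unique = pd.Index(reversed_dates.to_period("M")).unique().tolist()
print(unique[0])  # 2024-12

# Step 2: sort into natural Period order, as unique_slices does
unique.sort()
print(unique[0], unique[-1])  # 2024-01 2024-12
```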

get_indices_for_slice_combi

get_indices_for_slice_combi(index: DatetimeIndex, selection: Hashable | SliceCombination) -> Index

Return the timestamps belonging to the given slice(s).

Parameters:

Name	Type	Description	Default
`index`	`DatetimeIndex`	DatetimeIndex for the input data.	required
`selection`	`Hashable | SliceCombination`	Either a single slice label or a tuple of slice labels (SliceCombination) to extract indices for.	required

Returns:

Type	Description
`Index`	Index of timestamps that belong to the specified slice(s). If selection is a tuple, returns the union of all timestamps from all slices.

Examples:

Get indices for a single month:

>>> import pandas as pd
>>> from energy_repset.time_slicer import TimeSlicer
>>>
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> slicer = TimeSlicer(unit='month')
>>> jan_slice = slicer.unique_slices(dates)[0]  # Period('2024-01', 'M')
>>> jan_indices = slicer.get_indices_for_slice_combi(dates, jan_slice)
>>> len(jan_indices)  # 744 hours in January 2024
744

Get indices for multiple months (selection):

>>> selection = (pd.Period('2024-01', 'M'), pd.Period('2024-06', 'M'))
>>> selected_indices = slicer.get_indices_for_slice_combi(dates, selection)
>>> len(selected_indices)  # Jan (744) + Jun (720) = 1464
1464

Source code in energy_repset/time_slicer.py

def get_indices_for_slice_combi(
    self,
    index: pd.DatetimeIndex,
    selection: Union[Hashable, SliceCombination],
) -> pd.Index:
    """Return the timestamps belonging to the given slice(s).

    Args:
        index: DatetimeIndex for the input data.
        selection: Either a single slice label or a tuple of slice labels
            (SliceCombination) to extract indices for.

    Returns:
        Index of timestamps that belong to the specified slice(s). If selection
        is a tuple, returns the union of all timestamps from all slices.

    Examples:
        Get indices for a single month:

        >>> import pandas as pd
        >>> from energy_repset.time_slicer import TimeSlicer
        >>>
        >>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
        >>> slicer = TimeSlicer(unit='month')
        >>> jan_slice = slicer.unique_slices(dates)[0]  # Period('2024-01', 'M')
        >>> jan_indices = slicer.get_indices_for_slice_combi(dates, jan_slice)
        >>> len(jan_indices)  # 744 hours in January 2024
        744

        Get indices for multiple months (selection):

        >>> selection = (pd.Period('2024-01', 'M'), pd.Period('2024-06', 'M'))
        >>> selected_indices = slicer.get_indices_for_slice_combi(dates, selection)
        >>> len(selected_indices)  # Jan (744) + Jun (720) = 1464
        1464
    """
    labels = self.labels_for_index(index)

    # Normalize a single label or a tuple of labels into a set
    if isinstance(selection, tuple):
        slice_set = set(selection)
    else:
        slice_set = {selection}

    # Create boolean mask for timestamps in any of the selected slices
    mask = labels.isin(slice_set)

    # Return the timestamps where mask is True
    return index[mask]
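
The method returns `index[mask]`, i.e. the matching timestamps rather than integer positions. These work directly with `DataFrame.loc`; if integer positions are needed (for `iloc` or raw NumPy arrays), they can be recovered from the same boolean mask. A sketch with plain pandas, replaying the two-month selection from the example above; the `demand` column is illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=8760, freq="h")
df = pd.DataFrame({"demand": np.ones(8760)}, index=dates)

# Same steps as get_indices_for_slice_combi for a two-month selection
labels = dates.to_period("M")
selection = {pd.Period("2024-01", "M"), pd.Period("2024-06", "M")}
mask = labels.isin(selection)
selected_timestamps = dates[mask]

# Label-based access with the returned timestamps
subset = df.loc[selected_timestamps]
print(len(subset))  # 1464

# Integer positions, if needed, come from the mask itself
positions = np.flatnonzero(mask)
print(positions[0], positions[-1])  # 0 4367
```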
