Context & Slicing
ProblemContext
A data container passed through the entire workflow.
This class holds all data and metadata needed for representative subset selection. It is the central object passed between workflow stages (feature engineering, search algorithms, representation models).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df_raw` | `DataFrame` | Raw time-series data with datetime index and variable columns. | required |
| `slicer` | `'TimeSlicer'` | TimeSlicer defining how the time index is divided into candidate periods. | required |
| `metadata` | `dict[str, Any] \| None` | Optional dict for storing arbitrary user data (e.g., default weights, experiment configuration, notes). Not used by the framework itself, but available for user convenience and custom component implementations. | `None` |
Examples:
Create a context with monthly slicing:
>>> import numpy as np
>>> import pandas as pd
>>> from energy_repset.context import ProblemContext
>>> from energy_repset.time_slicer import TimeSlicer
>>>
>>> # Create sample data
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> df = pd.DataFrame({
...     'demand': np.random.rand(8760),
...     'solar': np.random.rand(8760)
... }, index=dates)
>>>
>>> # Create context with metadata
>>> slicer = TimeSlicer(unit='month')
>>> context = ProblemContext(
...     df_raw=df,
...     slicer=slicer,
...     metadata={
...         'experiment_name': 'test_run_1',
...         'default_weights': {'demand': 1.5, 'solar': 1.0},
...         'notes': 'Testing seasonal selection'
...     }
... )
>>> len(context.get_unique_slices())  # 12 months
12
>>> context.metadata['experiment_name']
'test_run_1'
>>>
>>> # Create context without metadata
>>> context2 = ProblemContext(df_raw=df, slicer=slicer)
>>> context2.metadata
{}
Source code in energy_repset/context.py
df_features (property, writable)
df_features: DataFrame
Get the computed feature matrix.
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with slice labels as index and engineered features as columns. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If features have not been computed yet. Use a FeatureEngineer to populate this field first. |
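The behavior described above (a writable property that raises until something assigns to it) can be sketched with a small stand-in class. `FeatureHolder` and its `features` attribute are hypothetical names used only for illustration; they are not part of energy_repset, and the real property may differ in detail.

```python
class FeatureHolder:
    """Hypothetical stand-in: a writable property that raises until it is set."""

    def __init__(self):
        self._features = None  # nothing has been computed yet

    @property
    def features(self):
        if self._features is None:
            raise ValueError("Features have not been computed yet.")
        return self._features

    @features.setter
    def features(self, value):
        self._features = value


holder = FeatureHolder()

try:
    holder.features
    accessed_early = True
except ValueError:
    accessed_early = False  # reading before assignment raises

# A feature-engineering step would assign the result here.
holder.features = {"2024-01": [0.4, 0.7]}
```

Callers that may run before feature engineering can catch `ValueError` in the same way rather than testing for `None`.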
__init__
__init__(df_raw: DataFrame, slicer: 'TimeSlicer', metadata: dict[str, Any] | None = None)
Initialize a ProblemContext.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df_raw` | `DataFrame` | Raw time-series data with datetime index and variable columns. | required |
| `slicer` | `'TimeSlicer'` | TimeSlicer defining how the time index is divided into candidate periods. | required |
| `metadata` | `dict[str, Any] \| None` | Optional dict for storing arbitrary user data. Not used by the framework itself. | `None` |
Source code in energy_repset/context.py
copy
copy() -> 'ProblemContext'
Create a deep copy of this ProblemContext instance.
Returns:
| Type | Description |
|---|---|
| `'ProblemContext'` | A new, independent instance of the context with all data copied. |
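What "independent" means for a deep copy can be illustrated with the standard library alone. The nested dict below is a hypothetical stand-in for a context's metadata; whether `copy()` relies on `copy.deepcopy` internally is an assumption, not something this page states.

```python
import copy

# Hypothetical stand-in for a context holding nested, mutable user metadata.
original = {"metadata": {"default_weights": {"demand": 1.5, "solar": 1.0}}}

shallow = copy.copy(original)    # copies the top level only; nested dicts are shared
deep = copy.deepcopy(original)   # fully independent copy

# Editing the deep copy leaves the original untouched.
deep["metadata"]["default_weights"]["demand"] = 2.0
demand_after_deep_edit = original["metadata"]["default_weights"]["demand"]

# Editing the shallow copy leaks into the original through the shared dict.
shallow["metadata"]["default_weights"]["solar"] = 3.0
solar_after_shallow_edit = original["metadata"]["default_weights"]["solar"]
```

The practical consequence is that a copied context can be modified, for example to try different weights, without affecting the instance it was copied from.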
Source code in energy_repset/context.py
get_sliced_data
get_sliced_data() -> dict[Hashable, DataFrame]
Generate sliced raw data on demand.
Returns:
| Type | Description |
|---|---|
| `dict[Hashable, DataFrame]` | Dictionary mapping slice labels to their corresponding DataFrame chunks. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | This method is not yet implemented. |
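Because the method currently raises `NotImplementedError`, the mapping it describes has to be built by hand if it is needed. The following is a pure-Python sketch under stated assumptions: `"YYYY-MM"` strings stand in for Period labels and lists of `(timestamp, value)` tuples stand in for DataFrame chunks. It is not the library's planned implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# One year of hourly timestamps, mirroring the 8760-hour examples on this page.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(8760)]
rows = [(ts, float(i % 24)) for i, ts in enumerate(timestamps)]  # dummy values

# Group rows by month label; each label maps to its chunk of rows.
grouped = defaultdict(list)
for ts, value in rows:
    grouped[ts.strftime("%Y-%m")].append((ts, value))
sliced = dict(grouped)

n_slices = len(sliced)
hours_in_february = len(sliced["2024-02"])  # 2024 is a leap year: 29 days
```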
Source code in energy_repset/context.py
get_unique_slices
get_unique_slices() -> list[Hashable]
Get list of all unique slice labels from the time index.
Returns:
| Type | Description |
|---|---|
| `list[Hashable]` | List of slice labels (e.g., Period objects for monthly slicing). |
Source code in energy_repset/context.py
TimeSlicer
Convert a DatetimeIndex into labeled time slices.
This class defines how the time index is divided into candidate periods for representative subset selection. It converts timestamps into Period objects or floored timestamps based on the specified temporal granularity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `unit` | `SliceUnit` | Temporal granularity of the slices. One of "year", "month", "week", "day", or "hour". | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `unit` | | The temporal granularity used for slicing. |
Note
The labels are hashable and suitable for set membership and grouping. Period objects are used for year, month, week, and day. Naive timestamps (floored to hour) are used for hourly slicing.
Examples:
Create a slicer for monthly periods:
>>> import pandas as pd
>>> from energy_repset.time_slicer import TimeSlicer
>>>
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> slicer = TimeSlicer(unit='month')
>>> labels = slicer.labels_for_index(dates)
>>> unique_months = slicer.unique_slices(dates)
>>> len(unique_months) # 12 months in a year
12
>>> unique_months[0] # First month
Period('2024-01', 'M')
Weekly slicing:
>>> slicer = TimeSlicer(unit='week')
>>> unique_weeks = slicer.unique_slices(dates)
>>> len(unique_weeks)  # 52 full weeks plus one partial week (2024-12-30)
53
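The count of 53 follows from the calendar: 8760 hours starting on 2024-01-01 (a Monday) cover 52 full weeks plus one extra day, 2024-12-30. A stdlib-only check is sketched below, using `(ISO year, ISO week)` pairs as a stand-in for weekly Period labels. The assumption here is that weekly slices run Monday to Sunday, which matches ISO weeks; the library's own labels are not reproduced.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1)  # a Monday
timestamps = [start + timedelta(hours=h) for h in range(8760)]


def iso_week_label(ts):
    """(ISO year, ISO week) pair, standing in for a weekly Period label."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])


week_labels = sorted({iso_week_label(ts) for ts in timestamps})

n_weeks = len(week_labels)
last_week = week_labels[-1]
hours_in_last_week = sum(1 for ts in timestamps if iso_week_label(ts) == last_week)
```

The final label holds only 24 hours, so a partial slice like this may deserve special handling when weeks are compared as candidate periods.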
Source code in energy_repset/time_slicer.py
__init__
__init__(unit: SliceUnit) -> None
Initialize TimeSlicer with specified temporal granularity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `unit` | `SliceUnit` | One of "year", "month", "week", "day", or "hour". | required |
Source code in energy_repset/time_slicer.py
labels_for_index
labels_for_index(index: DatetimeIndex) -> Index
Return a vector of slice labels aligned to the given index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `DatetimeIndex` | DatetimeIndex for the input data. | required |
Returns:
| Type | Description |
|---|---|
| `Index` | Index of slice labels matching the input index length. Each timestamp is mapped to its corresponding period or floored hour. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If unit is not one of the supported values. |
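The contract above (one label per timestamp, and `ValueError` for an unsupported unit) can be sketched without pandas. `label_for` is a hypothetical helper, and plain Python values stand in for the Period objects and floored timestamps the library returns.

```python
from datetime import datetime, timedelta


def label_for(ts: datetime, unit: str):
    """Map one timestamp to a hashable slice label (stand-in for Period objects)."""
    if unit == "year":
        return ts.year
    if unit == "month":
        return (ts.year, ts.month)
    if unit == "week":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if unit == "day":
        return ts.date()
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
    raise ValueError(f"Unsupported unit: {unit!r}")


# Six half-hourly timestamps: the label list has the same length as the index.
index = [datetime(2024, 1, 1) + timedelta(minutes=30 * i) for i in range(6)]
hour_labels = [label_for(ts, "hour") for ts in index]
month_labels = [label_for(ts, "month") for ts in index]
```

Because the labels stay aligned with the input, they can be used directly as grouping keys for the original data.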
Source code in energy_repset/time_slicer.py
unique_slices
unique_slices(index: DatetimeIndex) -> list[Hashable]
Return the sorted list of unique slice labels present in the index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `DatetimeIndex` | DatetimeIndex for the input data. | required |
Returns:
| Type | Description |
|---|---|
| `list[Hashable]` | Sorted list of unique slice labels. The sort order follows the natural ordering of Period objects or timestamps. |
Source code in energy_repset/time_slicer.py
get_indices_for_slice_combi
get_indices_for_slice_combi(index: DatetimeIndex, selection: Hashable | SliceCombination) -> Index
Return the index positions for timestamps belonging to the given slice(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `DatetimeIndex` | DatetimeIndex for the input data. | required |
| `selection` | `Hashable \| SliceCombination` | Either a single slice label or a tuple of slice labels (SliceCombination) to extract indices for. | required |
Returns:
| Type | Description |
|---|---|
| `Index` | Index of timestamps that belong to the specified slice(s). If selection is a tuple, returns the union of all timestamps from all slices. |
Examples:
Get indices for a single month:
>>> import pandas as pd
>>> from energy_repset.time_slicer import TimeSlicer
>>>
>>> dates = pd.date_range('2024-01-01', periods=8760, freq='h')
>>> slicer = TimeSlicer(unit='month')
>>> jan_slice = slicer.unique_slices(dates)[0] # Period('2024-01', 'M')
>>> jan_indices = slicer.get_indices_for_slice_combi(dates, jan_slice)
>>> len(jan_indices) # 744 hours in January 2024
744
Get indices for multiple months (selection):
>>> selection = (pd.Period('2024-01', 'M'), pd.Period('2024-06', 'M'))
>>> selected_indices = slicer.get_indices_for_slice_combi(dates, selection)
>>> len(selected_indices) # Jan (744) + Jun (720) = 1464
1464
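The single-label versus tuple behavior can be sketched in plain Python as a membership filter. `month_label` and `timestamps_for` are hypothetical helpers, and `"YYYY-MM"` strings stand in for Period labels; this illustrates the described semantics rather than the library's implementation.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(8760)]


def month_label(ts):
    """'YYYY-MM' string standing in for a monthly Period label."""
    return ts.strftime("%Y-%m")


def timestamps_for(index, selection):
    """Return timestamps whose label is in `selection` (one label or a tuple of labels)."""
    wanted = set(selection) if isinstance(selection, tuple) else {selection}
    return [ts for ts in index if month_label(ts) in wanted]


january = timestamps_for(timestamps, "2024-01")
jan_and_jun = timestamps_for(timestamps, ("2024-01", "2024-06"))
```

Filtering the original index in order means the combined result stays chronologically sorted, and the counts match the doctest above: 744 hours for January and 1464 for January plus June.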
Source code in energy_repset/time_slicer.py