Diagnostics
Feature Space
FeatureSpaceScatter2D
2D scatter plot for visualizing feature space.
Creates an interactive scatter plot of any two features from df_features. Can highlight a specific selection of slices. Works with any feature columns including PCA components ('pc_0', 'pc_1'), statistical features ('mean__wind'), or mixed features.
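One way to picture the selection highlight is as a per-slice status label derived from the row index of df_features. The sketch below uses plain Python lists instead of a DataFrame; the helper name `selection_status` and the labels 'selected' and 'other' are illustrative assumptions, not names taken from energy_repset.

```python
def selection_status(slice_ids, selection=None):
    """Label each slice as 'selected' or 'other' (hypothetical labels)."""
    chosen = set(selection or ())
    return ["selected" if s in chosen else "other" for s in slice_ids]


# Hypothetical slice identifiers, standing in for the index of df_features.
slice_ids = ["2024-01", "2024-02", "2024-03", "2024-04"]
status = selection_status(slice_ids, ("2024-01", "2024-04"))
print(status)
```

A label list like this could then drive marker color or symbol in a scatter plot.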
Examples:
>>> # Visualize PCA space
>>> scatter = FeatureSpaceScatter2D()
>>> fig = scatter.plot(context.df_features, x='pc_0', y='pc_1')
>>> fig.update_layout(title='PCA Feature Space')
>>> fig.show()
>>> # Visualize with selection highlighted
>>> fig = scatter.plot(
... context.df_features,
... x='mean__demand',
... y='pc_0',
... selection=('2024-01', '2024-04', '2024-07')
... )
>>> # Color by another feature
>>> fig = scatter.plot(
... context.df_features,
... x='pc_0',
... y='pc_1',
... color='std__wind'
... )
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter.py
__init__
__init__()
Initialize the scatter plot diagnostic.
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter.py
plot
plot(df_features: DataFrame, x: str, y: str, selection: SliceCombination = None, color: str = None) -> Figure
Create a 2D scatter plot of feature space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_features | DataFrame | Feature matrix with slices as rows, features as columns. | required |
| x | str | Column name for x-axis. | required |
| y | str | Column name for y-axis. | required |
| selection | SliceCombination | Optional tuple of slice identifiers to highlight. | None |
| color | str | Optional column name to use for color mapping. | None |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| KeyError | If x, y, or color columns are not in df_features. |
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter.py
FeatureSpaceScatter3D
3D scatter plot for visualizing feature space.
Creates an interactive 3D scatter plot of any three features from df_features. Can highlight a specific selection of slices. Works with any feature columns including PCA components or statistical features.
Examples:
>>> # Visualize 3D PCA space
>>> scatter = FeatureSpaceScatter3D()
>>> fig = scatter.plot(
... context.df_features,
... x='pc_0',
... y='pc_1',
... z='pc_2'
... )
>>> fig.update_layout(title='3D PCA Space')
>>> fig.show()
>>> # Highlight selection
>>> fig = scatter.plot(
... context.df_features,
... x='pc_0',
... y='pc_1',
... z='pc_2',
... selection=('2024-01', '2024-04')
... )
>>> # Color by feature value
>>> fig = scatter.plot(
... context.df_features,
... x='pc_0',
... y='pc_1',
... z='pc_2',
... color='mean__demand'
... )
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter.py
__init__
__init__()
Initialize the 3D scatter plot diagnostic.
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter.py
plot
plot(df_features: DataFrame, x: str, y: str, z: str, selection: SliceCombination = None, color: str = None) -> Figure
Create a 3D scatter plot of feature space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_features | DataFrame | Feature matrix with slices as rows, features as columns. | required |
| x | str | Column name for x-axis. | required |
| y | str | Column name for y-axis. | required |
| z | str | Column name for z-axis. | required |
| selection | SliceCombination | Optional tuple of slice identifiers to highlight. | None |
| color | str | Optional column name to use for color mapping. | None |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| KeyError | If x, y, z, or color columns are not in df_features. |
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter.py
FeatureSpaceScatterMatrix
Scatter matrix (SPLOM) for visualizing relationships between multiple features.
Creates an interactive scatter plot matrix showing pairwise relationships between all specified features. Can highlight a specific selection of slices. Useful for exploring multi-dimensional feature spaces and identifying feature correlations.
Examples:
>>> # Visualize PCA components
>>> scatter_matrix = FeatureSpaceScatterMatrix()
>>> fig = scatter_matrix.plot(
... context.df_features,
... dimensions=['pc_0', 'pc_1', 'pc_2']
... )
>>> fig.update_layout(title='PCA Component Relationships')
>>> fig.show()
>>> # Visualize statistical features with selection
>>> fig = scatter_matrix.plot(
... context.df_features,
... dimensions=['mean__demand', 'std__demand', 'max__wind'],
... selection=('2024-01', '2024-04', '2024-07')
... )
>>> # Color by a feature value
>>> fig = scatter_matrix.plot(
... context.df_features,
... dimensions=['pc_0', 'pc_1', 'pc_2', 'pc_3'],
... color='mean__demand'
... )
>>> # All features
>>> fig = scatter_matrix.plot(context.df_features)
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter_matrix.py
__init__
__init__()
Initialize the scatter matrix diagnostic.
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter_matrix.py
plot
plot(df_features: DataFrame, dimensions: list[str] = None, selection: SliceCombination = None, color: str = None) -> Figure
Create a scatter plot matrix of feature space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_features | DataFrame | Feature matrix with slices as rows, features as columns. | required |
| dimensions | list[str] | List of column names to include in the matrix. If None, uses all columns (may be slow for many features). | None |
| selection | SliceCombination | Optional tuple of slice identifiers to highlight. | None |
| color | str | Optional column name to use for color mapping. If None and selection is provided, colors by selection status. | None |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| KeyError | If any dimension or color column is not in df_features. |
| ValueError | If dimensions list is empty. |
Source code in energy_repset/diagnostics/feature_space/feature_space_scatter_matrix.py
PCAVarianceExplained
Visualize explained variance ratio for PCA components.
Creates a bar chart showing the proportion of variance explained by each principal component, along with cumulative variance. Helps determine how many components are needed to capture most of the data's variance.
This diagnostic requires the fitted PCAFeatureEngineer instance to access the explained_variance_ratio_ attribute.
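The relationship between per-component ratios and the cumulative curve can be sketched in a few lines. The ratios below are made-up values, and `components_needed` is a hypothetical helper for reading a threshold off the cumulative curve; neither comes from energy_repset.

```python
from itertools import accumulate

# Hypothetical explained-variance ratios for five principal components.
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]

# Running total: the value the cumulative line would show at each component.
cumulative = list(accumulate(ratios))


def components_needed(cumulative, threshold):
    """Return the smallest number of components whose cumulative ratio reaches threshold."""
    for count, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return count
    return len(cumulative)


k = components_needed(cumulative, 0.9)
print(cumulative, k)
```

With these illustrative numbers, four components are needed to reach 90% of the variance.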
Examples:
>>> # Get PCA engineer from pipeline
>>> pca_engineer = pipeline.engineers['pca']
>>> variance_plot = PCAVarianceExplained(pca_engineer)
>>> fig = variance_plot.plot()
>>> fig.update_layout(title='PCA Variance Explained')
>>> fig.show()
>>> # With custom number of components shown
>>> fig = variance_plot.plot(n_components=10)
>>> # After running workflow
>>> context_with_features = workflow.feature_engineer.run(context)
>>> pca_eng = workflow.feature_engineer.engineers['pca']
>>> variance_plot = PCAVarianceExplained(pca_eng)
>>> fig = variance_plot.plot()
Source code in energy_repset/diagnostics/feature_space/pca_variance_explained.py
__init__
__init__(pca_engineer: PCAFeatureEngineer)
Initialize the PCA variance explained diagnostic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pca_engineer | PCAFeatureEngineer | A fitted PCAFeatureEngineer instance. Must have been fitted on data (i.e., calc_and_get_features_df has been called). | required |
Source code in energy_repset/diagnostics/feature_space/pca_variance_explained.py
plot
plot(n_components: int = None, show_cumulative: bool = True) -> Figure
Create a bar chart of explained variance ratios.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_components | int | Number of components to show. If None, shows all components. | None |
| show_cumulative | bool | If True, adds a line showing cumulative variance explained. | True |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| AttributeError | If the PCA engineer has not been fitted yet. |
Source code in energy_repset/diagnostics/feature_space/pca_variance_explained.py
FeatureCorrelationHeatmap
Visualize correlation matrix of features.
Creates an interactive heatmap showing Pearson correlations between all features in the feature matrix. Helps identify redundant features and understand feature relationships. Can optionally show only the lower triangle to avoid redundancy.
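Because a correlation matrix is symmetric, the lower triangle carries all of the pairwise information. The sketch below shows one way such a mask could work on a nested list; the helper `lower_triangle` and the sample values are assumptions for illustration, not the library's implementation.

```python
def lower_triangle(matrix):
    """Return a copy with the diagonal and upper triangle replaced by None."""
    n = len(matrix)
    return [[matrix[i][j] if j < i else None for j in range(n)] for i in range(n)]


# Hypothetical 3x3 correlation matrix (symmetric, ones on the diagonal).
corr = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
masked = lower_triangle(corr)
for row in masked:
    print(row)
```

Cells set to None would be left blank in a heatmap, so each feature pair appears once.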
Examples:
>>> # Visualize all feature correlations
>>> heatmap = FeatureCorrelationHeatmap()
>>> fig = heatmap.plot(context.df_features)
>>> fig.update_layout(title='Feature Correlation Matrix')
>>> fig.show()
>>> # Show only lower triangle
>>> fig = heatmap.plot(context.df_features, show_lower_only=True)
>>> # Subset of features
>>> selected_features = context.df_features[['pc_0', 'pc_1', 'mean__demand']]
>>> fig = heatmap.plot(selected_features)
Source code in energy_repset/diagnostics/feature_space/feature_correlation_heatmap.py
__init__
__init__()
Initialize the feature correlation heatmap diagnostic.
Source code in energy_repset/diagnostics/feature_space/feature_correlation_heatmap.py
plot
plot(df_features: DataFrame, method: str = 'pearson', show_lower_only: bool = False) -> Figure
Create a heatmap of feature correlations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_features | DataFrame | Feature matrix with slices as rows, features as columns. | required |
| method | str | Correlation method ('pearson', 'spearman', or 'kendall'). Default is 'pearson'. | 'pearson' |
| show_lower_only | bool | If True, shows only the lower triangle of the correlation matrix (removes redundant upper triangle and diagonal). | False |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| ValueError | If method is not one of the supported correlation methods. |
Source code in energy_repset/diagnostics/feature_space/feature_correlation_heatmap.py
FeatureDistributions
Visualize distributions of all features as histograms.
Creates a grid of histograms showing the distribution of each feature across all slices. Helps identify feature scales, skewness, and potential outliers. Useful for understanding the feature space before selection.
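The grid layout follows from the number of features and the column count. The sketch below is one plausible way to compute the number of rows and the 1-based (row, column) position of each histogram; `grid_positions` is a hypothetical helper, and the actual layout logic in energy_repset may differ.

```python
import math


def grid_positions(n_features, cols=3):
    """Return the row count and 1-based (row, col) cell for each feature."""
    rows = math.ceil(n_features / cols)
    positions = [(i // cols + 1, i % cols + 1) for i in range(n_features)]
    return rows, positions


rows, positions = grid_positions(7, cols=3)
print(rows, positions)
```

Seven features in three columns need three rows, with the last row only partly filled.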
Examples:
>>> # Visualize all feature distributions
>>> dist_plot = FeatureDistributions()
>>> fig = dist_plot.plot(context.df_features)
>>> fig.update_layout(title='Feature Distributions')
>>> fig.show()
>>> # Subset of features
>>> selected_features = context.df_features[['pc_0', 'pc_1', 'mean__demand']]
>>> fig = dist_plot.plot(selected_features)
>>> # With custom bin count
>>> fig = dist_plot.plot(context.df_features, nbins=30)
Source code in energy_repset/diagnostics/feature_space/feature_distributions.py
__init__
__init__()
Initialize the feature distributions diagnostic.
Source code in energy_repset/diagnostics/feature_space/feature_distributions.py
plot
plot(df_features: DataFrame, nbins: int = 20, cols: int = 3) -> Figure
Create a grid of histograms for all features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_features | DataFrame | Feature matrix with slices as rows, features as columns. | required |
| nbins | int | Number of bins for each histogram. Default is 20. | 20 |
| cols | int | Number of columns in the subplot grid. Default is 3. | 3 |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| ValueError | If df_features is empty or nbins/cols are invalid. |
Source code in energy_repset/diagnostics/feature_space/feature_distributions.py
Score Components
DistributionOverlayECDF
Overlay empirical cumulative distribution functions (ECDF) to compare distributions.
Creates a plot showing the ECDF of a variable for both the full dataset and a selection. This helps visualize how well the selection represents the full distribution, which is what WassersteinFidelity measures.
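An ECDF is simple to compute by hand: sort the values and assign each the fraction of observations at or below it. The sketch below uses made-up numbers and a hypothetical `ecdf` helper; it illustrates the quantity being plotted, not the library's code.

```python
def ecdf(values):
    """Return sorted values and the cumulative fraction at each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Hypothetical data: a full series and a smaller selection drawn from it.
full = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
selected = [1.5, 4.0, 6.0, 9.0]

x_full, p_full = ecdf(full)
x_sel, p_sel = ecdf(selected)
print(list(zip(x_sel, p_sel)))
```

For one-dimensional data, the area between two such curves equals the 1-Wasserstein distance, which is why the overlay gives a visual sense of that kind of score.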
Examples:
>>> # Compare demand distribution
>>> ecdf_plot = DistributionOverlayECDF()
>>> full_data = context.df_raw['demand']
>>> selected_indices = context.slicer.get_indices_for_slices(result.selection)
>>> selected_data = context.df_raw.loc[selected_indices, 'demand']
>>> fig = ecdf_plot.plot(full_data, selected_data)
>>> fig.update_layout(title='Demand Distribution: Full vs Selected')
>>> fig.show()
>>> # Alternative: using a boolean mask
>>> selection_mask = context.df_raw.index.isin(selected_indices)
>>> fig = ecdf_plot.plot(
... context.df_raw['wind'],
... context.df_raw.loc[selection_mask, 'wind']
... )
Source code in energy_repset/diagnostics/score_components/distribution_overlay.py
__init__
__init__()
Initialize the ECDF overlay diagnostic.
Source code in energy_repset/diagnostics/score_components/distribution_overlay.py
plot
plot(df_full: Series, df_selection: Series, full_label: str = 'Full', selection_label: str = 'Selection') -> Figure
Create an ECDF overlay plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_full | Series | Series containing values for the full dataset. | required |
| df_selection | Series | Series containing values for the selection. | required |
| full_label | str | Label for the full dataset in the legend. Default 'Full'. | 'Full' |
| selection_label | str | Label for the selection in the legend. Default 'Selection'. | 'Selection' |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |
Source code in energy_repset/diagnostics/score_components/distribution_overlay.py
DistributionOverlayHistogram
Overlay histograms to compare distributions.
Creates a plot showing normalized histograms of a variable for both the full dataset and a selection. An alternative to the ECDF that may be more intuitive for some users. Shows per-bin probability by default (or probability density or percent, depending on histnorm) rather than cumulative probability.
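The three normalization modes differ only in how bin counts are scaled. The sketch below applies the usual definitions (count divided by total, that value times 100, and count divided by total times bin width) to made-up data; `normalized_histogram` is a hypothetical helper written for illustration, with fixed bin edges as an assumption.

```python
def normalized_histogram(values, edges, histnorm="probability"):
    """Bin values into [edges[b], edges[b+1]) and scale the counts."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            last = b == len(counts) - 1
            # The final bin also includes its right edge.
            if edges[b] <= v < edges[b + 1] or (last and v == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)
    if histnorm == "probability":
        return [c / total for c in counts]
    if histnorm == "percent":
        return [100 * c / total for c in counts]
    if histnorm == "probability density":
        return [c / (total * (edges[b + 1] - edges[b])) for b, c in enumerate(counts)]
    raise ValueError(f"unsupported histnorm: {histnorm!r}")


values = [0.5, 1.5, 1.7, 2.5, 3.0, 3.9, 3.2, 0.1]
edges = [0.0, 2.0, 4.0]
prob = normalized_histogram(values, edges, "probability")
pct = normalized_histogram(values, edges, "percent")
dens = normalized_histogram(values, edges, "probability density")
print(prob, pct, dens)
```

Normalizing both series this way is what makes a small selection comparable to the much larger full dataset.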
Examples:
>>> # Compare demand distribution
>>> hist_plot = DistributionOverlayHistogram()
>>> full_data = context.df_raw['demand']
>>> selected_indices = context.slicer.get_indices_for_slices(result.selection)
>>> selected_data = context.df_raw.loc[selected_indices, 'demand']
>>> fig = hist_plot.plot(full_data, selected_data)
>>> fig.update_layout(title='Demand Distribution: Full vs Selected')
>>> fig.show()
>>> # With custom bin count
>>> fig = hist_plot.plot(full_data, selected_data, nbins=50)
>>> # Using density mode
>>> fig = hist_plot.plot(full_data, selected_data, histnorm='probability density')
Source code in energy_repset/diagnostics/score_components/distribution_overlay.py
__init__
__init__()
Initialize the histogram overlay diagnostic.
Source code in energy_repset/diagnostics/score_components/distribution_overlay.py
plot
plot(df_full: Series, df_selection: Series, nbins: int = 30, histnorm: str = 'probability', full_label: str = 'Full', selection_label: str = 'Selection') -> Figure
Create a histogram overlay plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_full | Series | Series containing values for the full dataset. | required |
| df_selection | Series | Series containing values for the selection. | required |
| nbins | int | Number of bins for the histogram. Default is 30. | 30 |
| histnorm | str | Histogram normalization mode. Options: 'probability', 'probability density', 'percent'. Default is 'probability'. | 'probability' |
| full_label | str | Label for the full dataset in the legend. Default 'Full'. | 'Full' |
| selection_label | str | Label for the selection in the legend. Default 'Selection'. | 'Selection' |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| ValueError | If histnorm is not a valid option. |
Source code in energy_repset/diagnostics/score_components/distribution_overlay.py
CorrelationDifferenceHeatmap
Visualize the difference between correlation matrices.
Creates a heatmap showing the difference between the correlation matrix of the selection and that of the full dataset. This helps identify which variable relationships are well preserved or poorly preserved by the selection. Related to the CorrelationFidelity score component.
Positive values (red) indicate that a pair of variables has a higher correlation coefficient in the selection than in the full dataset. Negative values (blue) indicate a lower one.
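The difference matrix can be sketched with a hand-written Pearson correlation. The example assumes the "selection minus full" sign convention suggested by the plot title used in the examples; the helper names and data are hypothetical and do not come from energy_repset.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def corr_matrix(columns):
    """Pairwise correlation matrix for a dict of column name -> values."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]


def corr_difference(full, selection):
    """Element-wise difference: selection correlations minus full correlations."""
    c_full, c_sel = corr_matrix(full), corr_matrix(selection)
    return [[s - f for s, f in zip(row_s, row_f)] for row_s, row_f in zip(c_sel, c_full)]


full = {"demand": [1.0, 2.0, 3.0, 4.0], "wind": [4.0, 1.0, 3.0, 2.0]}
selection = {"demand": [1.0, 4.0], "wind": [4.0, 2.0]}
diff = corr_difference(full, selection)
print(diff)
```

Here the full data has a demand/wind correlation of -0.4 and the two selected points have -1.0, so the off-diagonal difference is about -0.6 while the diagonal is zero.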
Examples:
>>> # Compare correlation structure
>>> corr_diff = CorrelationDifferenceHeatmap()
>>> full_data = context.df_raw[['demand', 'wind', 'solar']]
>>> selected_indices = context.slicer.get_indices_for_slices(result.selection)
>>> selected_data = context.df_raw.loc[selected_indices, ['demand', 'wind', 'solar']]
>>> fig = corr_diff.plot(full_data, selected_data)
>>> fig.update_layout(title='Correlation Difference: Selection - Full')
>>> fig.show()
>>> # With Spearman correlation
>>> fig = corr_diff.plot(full_data, selected_data, method='spearman')
>>> # Show only lower triangle
>>> fig = corr_diff.plot(full_data, selected_data, show_lower_only=True)
Source code in energy_repset/diagnostics/score_components/correlation_difference_heatmap.py
__init__
__init__()
Initialize the correlation difference heatmap diagnostic.
Source code in energy_repset/diagnostics/score_components/correlation_difference_heatmap.py
plot
plot(df_full: DataFrame, df_selection: DataFrame, method: str = 'pearson', show_lower_only: bool = False) -> Figure
Create a heatmap of correlation differences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_full | DataFrame | DataFrame containing variables for the full dataset. | required |
| df_selection | DataFrame | DataFrame containing variables for the selection. Must have the same columns as df_full. | required |
| method | str | Correlation method ('pearson', 'spearman', or 'kendall'). Default is 'pearson'. | 'pearson' |
| show_lower_only | bool | If True, shows only the lower triangle of the difference matrix (removes redundant upper triangle and diagonal). | False |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| ValueError | If method is invalid or columns don't match. |
Source code in energy_repset/diagnostics/score_components/correlation_difference_heatmap.py
DiurnalProfileOverlay
Overlay mean diurnal (hour-of-day) profiles for full vs selected data.
Creates a plot showing the average value by hour of day for each variable, comparing the full dataset to the selection. This helps visualize how well the selection preserves daily patterns, which relates to the DiurnalFidelity score component.
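A mean diurnal profile is a group-by on the hour of each timestamp. The sketch below computes one for a synthetic two-day hourly series using only the standard library; `diurnal_profile` is a hypothetical helper and the data is invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def diurnal_profile(timestamps, values):
    """Return {hour: mean value} across all timestamps falling in that hour."""
    buckets = defaultdict(list)
    for ts, v in zip(timestamps, values):
        buckets[ts.hour].append(v)
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(buckets.items())}


# Two days of hourly data; the second day is shifted up by 10.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(48)]
values = [float(ts.hour) + (10.0 if ts.day == 2 else 0.0) for ts in timestamps]

profile = diurnal_profile(timestamps, values)
print(profile[0], profile[12], profile[23])
```

Computing the same profile for the full data and for the selection gives the two curves that the overlay compares.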
Examples:
>>> # Compare diurnal patterns
>>> diurnal_plot = DiurnalProfileOverlay()
>>> full_data = context.df_raw[['demand', 'wind', 'solar']]
>>> selected_indices = context.slicer.get_indices_for_slices(result.selection)
>>> selected_data = context.df_raw.loc[selected_indices, ['demand', 'wind', 'solar']]
>>> fig = diurnal_plot.plot(full_data, selected_data)
>>> fig.update_layout(title='Diurnal Profiles: Full vs Selected')
>>> fig.show()
>>> # Single variable
>>> fig = diurnal_plot.plot(
... full_data[['demand']],
... selected_data[['demand']]
... )
>>> # Subset of variables
>>> fig = diurnal_plot.plot(
... full_data,
... selected_data,
... variables=['demand', 'wind']
... )
Source code in energy_repset/diagnostics/score_components/diurnal_profile_overlay.py
__init__
__init__()
Initialize the diurnal profile overlay diagnostic.
Source code in energy_repset/diagnostics/score_components/diurnal_profile_overlay.py
plot
plot(df_full: DataFrame, df_selection: DataFrame, variables: list[str] = None, full_label: str = 'Full', selection_label: str = 'Selection') -> Figure
Create a diurnal profile overlay plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_full | DataFrame | DataFrame with DatetimeIndex and variable columns for full dataset. | required |
| df_selection | DataFrame | DataFrame with DatetimeIndex and variable columns for selection. Must have the same columns as df_full. | required |
| variables | list[str] | List of variable names to include. If None, uses all columns. | None |
| full_label | str | Label suffix for full dataset traces. Default 'Full'. | 'Full' |
| selection_label | str | Label suffix for selection traces. Default 'Selection'. | 'Selection' |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure object ready for display or further customization. |

Raises:
| Type | Description |
|---|---|
| ValueError | If DataFrames don't have DatetimeIndex or columns don't match. |
Source code in energy_repset/diagnostics/score_components/diurnal_profile_overlay.py
Results
ResponsibilityBars
Bar chart showing responsibility weights for selected representatives.
Visualizes the weight distribution across selected periods as computed by a RepresentationModel. Each bar shows how much each representative contributes to the full dataset representation.
Optionally displays a reference line showing uniform weights (1/k) for comparison with non-uniform weighting schemes like cluster-size based weights.
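The uniform reference is just 1/k for k representatives. The sketch below uses made-up weights keyed by plain strings (an assumption; real keys may be Period objects) to show how the reference relates to a non-uniform weighting.

```python
# Hypothetical responsibility weights for three selected slices.
weights = {"2024-01": 0.5, "2024-04": 0.25, "2024-07": 0.25}

k = len(weights)
uniform = 1 / k  # height of the reference line

# Slices carrying more than an equal share of the representation.
above = [slice_id for slice_id, w in weights.items() if w > uniform]
total = sum(weights.values())
print(uniform, above, total)
```

Bars above the line represent a larger-than-equal share of the full dataset, and the comparison is only meaningful when the weights sum to 1.0.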
Examples:
>>> from energy_repset.diagnostics.results import ResponsibilityBars
>>>
>>> # After running workflow with result containing weights
>>> weights = result.weights # e.g., {Period('2024-01'): 0.35, ...}
>>> bars = ResponsibilityBars()
>>> fig = bars.plot(weights, show_uniform_reference=True)
>>> fig.update_layout(title='Responsibility Weights')
>>> fig.show()
Source code in energy_repset/diagnostics/results/responsibility_bars.py
__init__
__init__()
Initialize ResponsibilityBars diagnostic.
Source code in energy_repset/diagnostics/results/responsibility_bars.py
plot
plot(weights: dict[Hashable, float], show_uniform_reference: bool = True) -> Figure
Create bar chart of responsibility weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| weights | dict[Hashable, float] | Dictionary mapping slice identifiers to their weights. Weights should sum to 1.0 for meaningful comparison with the uniform reference line. | required |
| show_uniform_reference | bool | If True, adds horizontal dashed line showing uniform weight (1/k) for comparison. | True |

Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure with bar chart. X-axis shows slice labels, Y-axis shows weight values. Text labels show weights to 3 decimal places. |

Raises:
| Type | Description |
|---|---|
| ValueError | If weights dictionary is empty. |
Source code in energy_repset/diagnostics/results/responsibility_bars.py
ParetoScatter2D
2D scatter plot of all evaluated combinations with Pareto front highlighted.
Visualizes the objective space for two objectives, showing:
- All evaluated combinations as scatter points
- Pareto-optimal solutions highlighted
- Selected combination (if provided) marked distinctly
- Feasible vs infeasible solutions (if constraints exist)
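Pareto-optimal points are those not dominated by any other point. The sketch below identifies them for two objectives, assuming both are minimized; the direction of each objective, the helper name, and the scores are assumptions for illustration and may not match how energy_repset defines its objectives.

```python
def pareto_front(points):
    """Return indices of non-dominated points, assuming both objectives are minimized."""
    front = []
    for i, (xi, yi) in enumerate(points):
        dominated = any(
            # j dominates i: no worse in both objectives, strictly better in one.
            (xj <= xi and yj <= yi) and (xj < xi or yj < yi)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (objective_x, objective_y) scores for five combinations.
scores = [(0.10, 0.90), (0.20, 0.40), (0.30, 0.50), (0.50, 0.10), (0.60, 0.60)]
front = pareto_front(scores)
print(front)
```

The quadratic scan is fine for illustration; the third and fifth points are dominated by the second, so they fall off the front.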
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objective_x | str | Name of objective for x-axis. | required |
| objective_y | str | Name of objective for y-axis. | required |
Examples:
>>> from energy_repset.diagnostics.results import ParetoScatter2D
>>> scatter = ParetoScatter2D(objective_x='wasserstein', objective_y='correlation')
>>> fig = scatter.plot(
... search_algorithm=workflow.search_algorithm,
... selected_combination=result.selection
... )
>>> fig.update_layout(title='Pareto Front: Wasserstein vs Correlation')
>>> fig.show()
Source code in energy_repset/diagnostics/results/pareto_scatter.py
__init__
__init__(objective_x: str, objective_y: str)
Initialize Pareto scatter diagnostic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objective_x | str | Name of objective for x-axis. | required |
| objective_y | str | Name of objective for y-axis. | required |
Source code in energy_repset/diagnostics/results/pareto_scatter.py
plot
plot(search_algorithm: ObjectiveDrivenCombinatorialSearchAlgorithm, selected_combination: SliceCombination | None = None) -> Figure
Create 2D scatter plot of Pareto front.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search_algorithm | ObjectiveDrivenCombinatorialSearchAlgorithm | Search algorithm after find_selection() has been called. | required |
| selected_combination | SliceCombination \| None | Optional combination to highlight (e.g., result.selection). | None |
Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure with scatter plot. |
Raises:
| Type | Description |
|---|---|
| ValueError | If find_selection() has not been called or the requested objectives are not found. |
Source code in energy_repset/diagnostics/results/pareto_scatter.py
ParetoScatterMatrix
Scatter matrix of all objectives showing Pareto front.
Creates a scatter plot matrix (SPLOM) showing pairwise relationships between all objectives. Each subplot shows two objectives with Pareto front highlighted.
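When reading the pairwise panels, it helps to remember that a solution can be Pareto-optimal across all objectives and still look dominated in a single two-objective projection. The sketch below illustrates this with made-up scores, assuming every objective is minimized. The helpers `dominates` and `pareto_mask` are hypothetical and are not taken from energy_repset.

```python
from itertools import combinations


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_mask(rows):
    """Flag each row that no other row dominates (all objectives minimized)."""
    return [
        not any(dominates(other, row) for j, other in enumerate(rows) if j != i)
        for i, row in enumerate(rows)
    ]


# Hypothetical objective names and scores for three combinations.
objectives = ["wasserstein", "correlation", "diurnal"]
scores = [(1.0, 2.0, 3.0), (2.0, 1.0, 3.0), (2.0, 2.0, 1.0)]

# Across all three objectives, every combination is non-dominated.
print("all objectives:", pareto_mask(scores))

# One panel per unordered pair of objectives, as in a scatter matrix.
for i, j in combinations(range(len(objectives)), 2):
    projected = [(row[i], row[j]) for row in scores]
    print(objectives[i], "vs", objectives[j], pareto_mask(projected))
```

Each pairwise projection here marks a different combination as dominated, which is why the full matrix is more informative than any single panel.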
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objectives | list[str] \| None | List of objective names to include (None = all objectives). | None |
Examples:
>>> from energy_repset.diagnostics.results import ParetoScatterMatrix
>>> scatter_matrix = ParetoScatterMatrix(
...     objectives=['wasserstein', 'correlation', 'diurnal']
... )
>>> fig = scatter_matrix.plot(
...     search_algorithm=workflow.search_algorithm,
...     selected_combination=result.selection
... )
>>> fig.update_layout(title='Pareto Front: All Objectives')
>>> fig.show()
Source code in energy_repset/diagnostics/results/pareto_scatter.py
__init__
__init__(objectives: list[str] | None = None)
Initialize Pareto scatter matrix diagnostic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objectives | list[str] \| None | List of objective names to include (None = all). | None |
Source code in energy_repset/diagnostics/results/pareto_scatter.py
plot
plot(search_algorithm: ObjectiveDrivenCombinatorialSearchAlgorithm, selected_combination: SliceCombination | None = None) -> Figure
Create scatter matrix of Pareto front.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search_algorithm | ObjectiveDrivenCombinatorialSearchAlgorithm | Search algorithm after find_selection() has been called. | required |
| selected_combination | SliceCombination \| None | Optional combination to highlight. | None |
Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure with scatter matrix. |
Raises:
| Type | Description |
|---|---|
| ValueError | If find_selection() hasn't been called. |
Source code in energy_repset/diagnostics/results/pareto_scatter.py
ParetoParallelCoordinates
Parallel coordinates plot of Pareto front.
Visualizes multi-objective trade-offs using parallel coordinates where each vertical axis represents one objective. Lines connecting axes show individual solutions, with Pareto-optimal solutions highlighted.
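As a rough illustration of the data layout behind such a plot, the sketch below turns per-solution score records into one axis description per objective. The dictionaries use the `label`, `values`, and `range` keys that Plotly's `go.Parcoords` accepts for its `dimensions` argument. The helper `build_dimensions` and the sample records are hypothetical, and whether energy_repset assembles its figure this way is an assumption.

```python
def build_dimensions(records, objectives=None):
    """Build one parallel-coordinates axis description per objective.

    records: list of dicts mapping objective name -> score, one per solution.
    objectives: names to include; None means every key of the first record.
    """
    names = list(objectives) if objectives is not None else list(records[0])
    dimensions = []
    for name in names:
        values = [record[name] for record in records]
        dimensions.append(
            {
                "label": name,
                "values": values,
                # Each vertical axis spans the observed range of its objective.
                "range": [min(values), max(values)],
            }
        )
    return dimensions


# Hypothetical scores for three evaluated combinations.
records = [
    {"wasserstein": 0.10, "correlation": 0.50},
    {"wasserstein": 0.20, "correlation": 0.20},
    {"wasserstein": 0.15, "correlation": 0.60},
]
for dim in build_dimensions(records):
    print(dim["label"], dim["range"])
```

Each solution then corresponds to one polyline crossing every axis at its score for that objective.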
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objectives | list[str] \| None | List of objective names to include (None = all objectives). | None |
Examples:
>>> from energy_repset.diagnostics.results import ParetoParallelCoordinates
>>> parallel = ParetoParallelCoordinates()
>>> fig = parallel.plot(search_algorithm=workflow.search_algorithm)
>>> fig.update_layout(title='Pareto Front: Parallel Coordinates')
>>> fig.show()
Source code in energy_repset/diagnostics/results/pareto_parallel_coords.py
__init__
__init__(objectives: list[str] | None = None)
Initialize parallel coordinates diagnostic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objectives | list[str] \| None | List of objective names to include (None = all). | None |
Source code in energy_repset/diagnostics/results/pareto_parallel_coords.py
plot
plot(search_algorithm: ObjectiveDrivenCombinatorialSearchAlgorithm) -> Figure
Create parallel coordinates plot of Pareto front.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search_algorithm | ObjectiveDrivenCombinatorialSearchAlgorithm | Search algorithm after find_selection() has been called. | required |
Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure with parallel coordinates plot. |
Raises:
| Type | Description |
|---|---|
| ValueError | If find_selection() hasn't been called. |
Source code in energy_repset/diagnostics/results/pareto_parallel_coords.py
ScoreContributionBars
Bar chart showing final scores from each objective component.
Visualizes the contribution of each score component to show which objectives were most influential in the final selection. Scores can be displayed as absolute values or normalized as fractions of the total.
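The normalization can be pictured as dividing each component by the sum of all components. The sketch below is an assumption about that arithmetic, not the library's code: `normalize_scores` is a hypothetical helper, the sample scores are made up, and the zero-total fallback is an illustrative choice.

```python
def normalize_scores(scores: dict[str, float]) -> dict[str, float]:
    """Express each score as a fraction of the total of all scores."""
    total = sum(scores.values())
    if total == 0:
        # Assumed fallback: leave scores unchanged rather than divide by zero.
        return dict(scores)
    return {name: value / total for name, value in scores.items()}


# Hypothetical component scores (illustration only).
scores = {"wasserstein": 1.0, "correlation": 3.0, "diurnal": 4.0}
print(normalize_scores(scores))
```

With these values the fractions are 0.125, 0.375, and 0.5, so the bars sum to 1 and can be compared across runs with different score magnitudes.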
Examples:
>>> from energy_repset.diagnostics.results import ScoreContributionBars
>>> contrib = ScoreContributionBars()
>>> fig = contrib.plot(result.scores, normalize=True)
>>> fig.update_layout(title='Score Component Contributions')
>>> fig.show()
Source code in energy_repset/diagnostics/results/score_contribution_bars.py
plot
plot(scores: dict[str, float], normalize: bool = False) -> Figure
Create bar chart of score component contributions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | dict[str, float] | Dictionary of scores from each component (from result.scores). | required |
| normalize | bool | If True, show as fractions of total score. | False |
Returns:
| Type | Description |
|---|---|
| Figure | Plotly figure with bar chart. |
Source code in energy_repset/diagnostics/results/score_contribution_bars.py