Skip to content

Selection Policies

SelectionPolicy

Bases: ABC

Base class for selection policies that choose the best combination.

Selection policies define the strategy for choosing the winning combination from a set of scored candidates. Different policies implement different trade-offs between competing objectives (e.g., weighted sum vs. Pareto).

This is a key component of the Generate-and-Test workflow where the SearchAlgorithm generates candidates, the ObjectiveSet scores them, and the SelectionPolicy picks the winner.

Examples:

>>> # See WeightedSumPolicy and ParetoUtopiaPolicy for concrete examples
>>> class SimpleMinPolicy(SelectionPolicy):
...     def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet):
...         # Just pick the row with minimum of first objective
...         first_obj = list(objective_set.component_meta().keys())[0]
...         best_row = evaluations_df.loc[evaluations_df[first_obj].idxmin()]
...         return tuple(best_row['slices'])
Source code in energy_repset/selection_policies/policy.py
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
class SelectionPolicy(ABC):
    """Abstract interface for picking the winning combination.

    A selection policy encodes the strategy used to choose one combination
    out of a set of scored candidates. Concrete subclasses realise different
    trade-offs between competing objectives (e.g. weighted sum vs. Pareto).

    Within the Generate-and-Test workflow, the SearchAlgorithm proposes
    candidates, the ObjectiveSet scores them, and this component decides
    which candidate wins.

    Examples:
        >>> # See WeightedSumPolicy and ParetoUtopiaPolicy for concrete examples
        >>> class SimpleMinPolicy(SelectionPolicy):
        ...     def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet):
        ...         # Just pick the row with minimum of first objective
        ...         first_obj = list(objective_set.component_meta().keys())[0]
        ...         best_row = evaluations_df.loc[evaluations_df[first_obj].idxmin()]
        ...         return tuple(best_row['slices'])
    """

    @abstractmethod
    def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
        """Pick the winning combination among scored candidates.

        Args:
            evaluations_df: One candidate per row, holding a 'slices' column
                (the combination tuple) plus one score column per objective
                component.
            objective_set: Source of per-component metadata (direction,
                weights, etc.) that the selection logic may consult.

        Returns:
            Tuple of slice identifiers identifying the winning combination.
        """
        ...

select_best abstractmethod

select_best(evaluations_df: DataFrame, objective_set: ObjectiveSet) -> tuple[Hashable, ...]

Select the best combination from scored candidates.

Parameters:

Name Type Description Default
evaluations_df DataFrame

DataFrame where each row is a candidate combination with columns 'slices' (the combination tuple) and score columns for each objective component.

required
objective_set ObjectiveSet

Provides metadata about score components (direction, weights, etc.) needed for selection logic.

required

Returns:

Type Description
tuple[Hashable, ...]

Tuple of slice identifiers representing the winning combination.

Source code in energy_repset/selection_policies/policy.py
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
@abstractmethod
def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
    """Select the best combination from scored candidates.

    Args:
        evaluations_df: DataFrame where each row is a candidate combination
            with columns 'slices' (the combination tuple) and score columns
            for each objective component.
        objective_set: Provides metadata about score components (direction,
            weights, etc.) needed for selection logic.

    Returns:
        Tuple of slice identifiers representing the winning combination.
    """
    ...  # abstract: concrete policies must override

PolicyOutcome dataclass

Source code in energy_repset/selection_policies/policy.py
15
16
17
18
19
@dataclass(frozen=True)
class PolicyOutcome:
    """Immutable record of one selection run.

    Bundles the search algorithm that produced the candidates, the selected
    result, and the score table annotated by the policy.
    """

    # Search algorithm that generated the candidate combinations.
    algorithm: SearchAlgorithm
    # Winning result chosen by the selection policy.
    selected: RepSetResult
    # Candidate scores, annotated with policy diagnostics (e.g. score
    # columns added during selection) — exact columns depend on the policy.
    scores_annotated: pd.DataFrame

WeightedSumPolicy

Bases: SelectionPolicy

Selects the combination minimizing a weighted sum of objectives.

Combines multiple objectives into a single scalar score using weighted averaging. Objectives are oriented for minimization (max objectives are negated), optionally normalized, then combined using weights from the ObjectiveSet (which can be overridden).

This is the simplest multi-objective selection strategy and works well when relative importance of objectives is known.

Examples:

>>> from energy_repset import ObjectiveSet, ObjectiveSpec
>>> from energy_repset.score_components import WassersteinFidelity, CorrelationFidelity
>>> # Default: use weights from ObjectiveSet
>>> policy = WeightedSumPolicy()
>>> objectives = ObjectiveSet([
...     ObjectiveSpec('wasserstein', WassersteinFidelity(), weight=1.0),
...     ObjectiveSpec('correlation', CorrelationFidelity(), weight=0.5)
... ])
>>> # Final score = 1.0*wasserstein + 0.5*correlation
>>> # Override weights in policy
>>> policy = WeightedSumPolicy(
...     overrides={'wasserstein': 2.0, 'correlation': 1.0}
... )
>>> # Final score = 2.0*wasserstein + 1.0*correlation
>>> # With normalization to make objectives comparable
>>> policy = WeightedSumPolicy(
...     normalization='robust_minmax',  # Scale to [0, 1] using 5th-95th percentiles
...     tie_breakers=('wasserstein',),  # Break ties by wasserstein
...     tie_dirs=('min',)
... )
Source code in energy_repset/selection_policies/weighted_sum.py
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
class WeightedSumPolicy(SelectionPolicy):
    """Selects the combination minimizing a weighted sum of objectives.

    Combines multiple objectives into a single scalar score using weighted
    averaging. Objectives are oriented for minimization (max objectives are
    negated), optionally normalized, then combined using weights from the
    ObjectiveSet (which can be overridden).

    This is the simplest multi-objective selection strategy and works well
    when relative importance of objectives is known.

    Examples:
        >>> from energy_repset import ObjectiveSet, ObjectiveSpec
        >>> from energy_repset.score_components import WassersteinFidelity, CorrelationFidelity
        >>> # Default: use weights from ObjectiveSet
        >>> policy = WeightedSumPolicy()
        >>> objectives = ObjectiveSet([
        ...     ObjectiveSpec('wasserstein', WassersteinFidelity(), weight=1.0),
        ...     ObjectiveSpec('correlation', CorrelationFidelity(), weight=0.5)
        ... ])
        >>> # Final score = 1.0*wasserstein + 0.5*correlation

        >>> # Override weights in policy
        >>> policy = WeightedSumPolicy(
        ...     overrides={'wasserstein': 2.0, 'correlation': 1.0}
        ... )
        >>> # Final score = 2.0*wasserstein + 1.0*correlation

        >>> # With normalization to make objectives comparable
        >>> policy = WeightedSumPolicy(
        ...     normalization='robust_minmax',  # Scale to [0, 1] using 5th-95th percentiles
        ...     tie_breakers=('wasserstein',),  # Break ties by wasserstein
        ...     tie_dirs=('min',)
        ... )
    """
    def __init__(
            self,
            overrides: Optional[Dict[str, float]] = None,
            normalization: Normalization = "none",
            tie_breakers: Tuple[str, ...] = (),
            tie_dirs: Tuple[ScoreComponentDirection, ...] = (),
    ) -> None:
        """Initialize weighted sum policy.

        Args:
            overrides: Optional dict mapping objective names to weights,
                overriding weights from ObjectiveSet.
            normalization: How to normalize objectives before weighting:
                - "none": No normalization
                - "robust_minmax": Scale to [0, 1] using 5th-95th percentiles
                - "zscore_iqr": Z-score using median and IQR
            tie_breakers: Tuple of objective names to use for tie-breaking.
            tie_dirs: Corresponding directions ("min" or "max") for tie-breakers.
        """
        self.overrides = overrides or {}
        self.normalization = normalization
        self.tie_breakers = tie_breakers
        self.tie_dirs = tie_dirs

    def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
        """Select combination with minimum weighted sum score.

        Args:
            evaluations_df: DataFrame with 'slices' column and objective scores.
            objective_set: Provides component metadata (direction, weights).

        Returns:
            Tuple of slice identifiers with the lowest weighted sum score.

        Raises:
            ValueError: If evaluations_df is empty, or an override names a
                metric that is not an objective component.
        """
        df = evaluations_df.copy()
        if df.empty:
            raise ValueError("evaluations_df is empty; nothing to select")
        meta = objective_set.component_meta()
        oriented = df[list(meta.keys())].copy()

        # Orient all objectives for minimization (negate "max" objectives)
        for name, m in meta.items():
            if m["direction"] == "max":
                oriented[name] = -oriented[name]

        # Normalize if requested
        Z = self._normalize(oriented, mode=self.normalization)

        # Compute weights (preferences from ObjectiveSet, overrides from strategy)
        weights = {name: float(m["pref"]) for name, m in meta.items()}
        for k, v in self.overrides.items():
            if k not in weights:
                raise ValueError(f"Unknown metric in overrides: {k}")
            weights[k] = float(v)

        # Compute weighted sum scores over the oriented, normalized columns
        df["strategy_score"] = sum(Z[name] * w for name, w in weights.items())

        # BUG FIX: the previous implementation re-sorted the *whole* frame by
        # each tie-breaker, discarding the primary score order entirely (the
        # last tie-breaker decided the winner). Sort once, lexicographically,
        # so tie-breakers only matter among rows with equal scores; mergesort
        # is stable, preserving input order for full ties.
        sort_cols = ["strategy_score"]
        ascending = [True]
        for col, d in zip(self.tie_breakers, self.tie_dirs):
            sort_cols.append(col)
            ascending.append(d == "min")
        best = df.sort_values(sort_cols, ascending=ascending, kind="mergesort")

        return tuple(best.iloc[0]["slices"])

    def _normalize(self, Y: pd.DataFrame, mode: Normalization) -> pd.DataFrame:
        """Normalize oriented objective columns according to *mode*."""
        if mode == "none":
            return Y
        if mode == "robust_minmax":
            # Scale with 5th-95th percentiles to dampen outliers; a zero
            # spread falls back to a unit denominator to avoid div-by-zero.
            q_lo = Y.quantile(0.05)
            q_hi = Y.quantile(0.95)
            denom = (q_hi - q_lo).replace(0, 1.0)
            return ((Y - q_lo) / denom).clip(lower=0.0)
        # "zscore_iqr": robust z-score using median and interquartile range
        med = Y.median()
        iqr = (Y.quantile(0.75) - Y.quantile(0.25)).replace(0, 1.0)
        return (Y - med) / iqr

__init__

__init__(overrides: dict[str, float] | None = None, normalization: Normalization = 'none', tie_breakers: tuple[str, ...] = (), tie_dirs: tuple[ScoreComponentDirection, ...] = ()) -> None

Initialize weighted sum policy.

Parameters:

Name Type Description Default
overrides dict[str, float] | None

Optional dict mapping objective names to weights, overriding weights from ObjectiveSet.

None
normalization Normalization

How to normalize objectives before weighting: - "none": No normalization - "robust_minmax": Scale to [0, 1] using 5th-95th percentiles - "zscore_iqr": Z-score using median and IQR

'none'
tie_breakers tuple[str, ...]

Tuple of objective names to use for tie-breaking.

()
tie_dirs tuple[ScoreComponentDirection, ...]

Corresponding directions ("min" or "max") for tie-breakers.

()
Source code in energy_repset/selection_policies/weighted_sum.py
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def __init__(
        self,
        overrides: Optional[Dict[str, float]] = None,
        normalization: Normalization = "none",
        tie_breakers: Tuple[str, ...] = (),
        tie_dirs: Tuple[ScoreComponentDirection, ...] = (),
) -> None:
    """Configure the weighted-sum selection policy.

    Args:
        overrides: Per-objective weight overrides; entries replace the
            weights taken from the ObjectiveSet.
        normalization: Pre-weighting normalization mode:
            - "none": leave scores untouched
            - "robust_minmax": scale to [0, 1] via 5th-95th percentiles
            - "zscore_iqr": z-score built from median and IQR
        tie_breakers: Objective names consulted to break score ties.
        tie_dirs: Direction ("min" or "max") matching each tie-breaker.
    """
    self.normalization = normalization
    self.tie_breakers = tie_breakers
    self.tie_dirs = tie_dirs
    self.overrides = overrides or {}

select_best

select_best(evaluations_df: DataFrame, objective_set: ObjectiveSet) -> tuple[Hashable, ...]

Select combination with minimum weighted sum score.

Parameters:

Name Type Description Default
evaluations_df DataFrame

DataFrame with 'slices' column and objective scores.

required
objective_set ObjectiveSet

Provides component metadata (direction, weights).

required

Returns:

Type Description
tuple[Hashable, ...]

Tuple of slice identifiers with the lowest weighted sum score.

Source code in energy_repset/selection_policies/weighted_sum.py
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
    """Select combination with minimum weighted sum score.

    Args:
        evaluations_df: DataFrame with 'slices' column and objective scores.
        objective_set: Provides component metadata (direction, weights).

    Returns:
        Tuple of slice identifiers with the lowest weighted sum score.

    Raises:
        ValueError: If evaluations_df is empty, or an override names a
            metric that is not an objective component.
    """
    df = evaluations_df.copy()
    if df.empty:
        raise ValueError("evaluations_df is empty; nothing to select")
    meta = objective_set.component_meta()
    oriented = df[list(meta.keys())].copy()

    # Orient all objectives for minimization (negate "max" objectives)
    for name, m in meta.items():
        if m["direction"] == "max":
            oriented[name] = -oriented[name]

    # Normalize if requested
    Z = self._normalize(oriented, mode=self.normalization)

    # Compute weights (preferences from ObjectiveSet, overrides from strategy)
    weights = {name: float(m["pref"]) for name, m in meta.items()}
    for k, v in self.overrides.items():
        if k not in weights:
            raise ValueError(f"Unknown metric in overrides: {k}")
        weights[k] = float(v)

    # Compute weighted sum scores over the oriented, normalized columns
    df["strategy_score"] = sum(Z[name] * w for name, w in weights.items())

    # BUG FIX: the previous implementation re-sorted the *whole* frame by
    # each tie-breaker, discarding the primary score order entirely (the
    # last tie-breaker decided the winner). Sort once, lexicographically,
    # so tie-breakers only matter among rows with equal scores; mergesort
    # is stable, preserving input order for full ties.
    sort_cols = ["strategy_score"]
    ascending = [True]
    for col, d in zip(self.tie_breakers, self.tie_dirs):
        sort_cols.append(col)
        ascending.append(d == "min")
    best = df.sort_values(sort_cols, ascending=ascending, kind="mergesort")

    return tuple(best.iloc[0]["slices"])

ParetoMaxMinStrategy

Bases: ParetoUtopiaPolicy

Source code in energy_repset/selection_policies/pareto.py
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
class ParetoMaxMinStrategy(ParetoUtopiaPolicy):
    """Pareto policy that maximizes the minimum per-objective slack.

    Instead of ranking Pareto-optimal candidates by distance to the utopia
    point, each candidate is scored by its worst (smallest) slack across
    objectives, and the candidate whose worst objective is best wins.
    """

    def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
        """Select best solution using Pareto max-min approach."""
        df = evaluations_df.copy()
        if df.empty:
            raise ValueError("evaluations_df is empty; nothing to select")
        dirs = self._resolve_objectives_from_meta(objective_set.component_meta(), df)
        feas = self._apply_constraints(df, self.fairness_constraints)
        df["feasible"] = feas
        Y = df[list(dirs.keys())].copy()

        # Orient all objectives for minimization (negate "max" objectives)
        for c, d in dirs.items():
            if d == "max":
                Y[c] = -Y[c]

        # Pareto front is computed over feasible rows only
        pareto_mask = self._pareto_mask(Y[feas])
        df["pareto"] = False
        df.loc[feas.index[feas].tolist(), "pareto"] = pareto_mask.values

        # Store masks for diagnostics
        self.pareto_mask = df["pareto"].copy()
        self.feasible_mask = df["feasible"].copy()

        # Normalize and compute max-min score (slack w.r.t. the ideal point).
        # BUG FIX: when no row is feasible, the ideal was all-NaN and every
        # score became NaN; fall back to the ideal over all rows.
        Z = self._normalize(Y, self.normalization)
        ref = Z[df["feasible"]] if df["feasible"].any() else Z
        ideal = ref.min(axis=0)
        slack = 1.0 - (Z - ideal.values)
        df["maxmin_score"] = slack.min(axis=1)

        # Select from Pareto front; fall back to feasible rows, then all rows
        front = df[(df["feasible"]) & (df["pareto"])]
        if len(front) == 0:
            front = df[df["feasible"]] if df["feasible"].any() else df

        # BUG FIX: tie-breakers previously re-sorted the whole front and
        # discarded the max-min score order; sort once, lexicographically
        # (stable mergesort), so tie-breakers only apply among equal scores.
        sort_cols = ["maxmin_score"]
        ascending = [False]
        for col, d in zip(self.tie_breakers, self.tie_dirs):
            sort_cols.append(col)
            ascending.append(d == "min")
        best = front.sort_values(sort_cols, ascending=ascending, kind="mergesort")

        return tuple(best.iloc[0]["slices"])

select_best

select_best(evaluations_df: DataFrame, objective_set: ObjectiveSet) -> tuple[Hashable, ...]

Select best solution using Pareto max-min approach.

Source code in energy_repset/selection_policies/pareto.py
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
    """Select best solution using Pareto max-min approach."""
    df = evaluations_df.copy()
    if df.empty:
        raise ValueError("evaluations_df is empty; nothing to select")
    dirs = self._resolve_objectives_from_meta(objective_set.component_meta(), df)
    feas = self._apply_constraints(df, self.fairness_constraints)
    df["feasible"] = feas
    Y = df[list(dirs.keys())].copy()

    # Orient all objectives for minimization (negate "max" objectives)
    for c, d in dirs.items():
        if d == "max":
            Y[c] = -Y[c]

    # Pareto front is computed over feasible rows only
    pareto_mask = self._pareto_mask(Y[feas])
    df["pareto"] = False
    df.loc[feas.index[feas].tolist(), "pareto"] = pareto_mask.values

    # Store masks for diagnostics
    self.pareto_mask = df["pareto"].copy()
    self.feasible_mask = df["feasible"].copy()

    # Normalize and compute max-min score (slack w.r.t. the ideal point).
    # BUG FIX: when no row is feasible, the ideal was all-NaN and every
    # score became NaN; fall back to the ideal over all rows.
    Z = self._normalize(Y, self.normalization)
    ref = Z[df["feasible"]] if df["feasible"].any() else Z
    ideal = ref.min(axis=0)
    slack = 1.0 - (Z - ideal.values)
    df["maxmin_score"] = slack.min(axis=1)

    # Select from Pareto front; fall back to feasible rows, then all rows
    front = df[(df["feasible"]) & (df["pareto"])]
    if len(front) == 0:
        front = df[df["feasible"]] if df["feasible"].any() else df

    # BUG FIX: tie-breakers previously re-sorted the whole front and
    # discarded the max-min score order; sort once, lexicographically
    # (stable mergesort), so tie-breakers only apply among equal scores.
    sort_cols = ["maxmin_score"]
    ascending = [False]
    for col, d in zip(self.tie_breakers, self.tie_dirs):
        sort_cols.append(col)
        ascending.append(d == "min")
    best = front.sort_values(sort_cols, ascending=ascending, kind="mergesort")

    return tuple(best.iloc[0]["slices"])

ParetoUtopiaPolicy

Bases: SelectionPolicy

Source code in energy_repset/selection_policies/pareto.py
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
class ParetoUtopiaPolicy(SelectionPolicy):
    """Selects the feasible Pareto-optimal point closest to the utopia point.

    Candidates are filtered by optional fairness constraints, reduced to the
    Pareto front, normalized, and ranked by distance to the per-objective
    ideal ("utopia") point.
    """

    def __init__(
            self,
            objectives: Optional[Dict[str, ScoreComponentDirection]] = None,
            normalization: Normalization = "robust_minmax",
            fairness_constraints: Optional[Dict[str, float]] = None,
            distance: Literal["chebyshev", "euclidean"] = "chebyshev",
            tie_breakers: Tuple[str, ...] = (),
            tie_dirs: Tuple[ScoreComponentDirection, ...] = (),
            eps: float = 1e-9,
    ) -> None:
        """Initialize the Pareto utopia policy.

        Args:
            objectives: Optional explicit mapping of objective name to
                direction ("min"/"max"); when None, directions are taken
                from the ObjectiveSet at selection time.
            normalization: Normalization mode applied before distances.
            fairness_constraints: Optional mapping of column name to upper
                bound; rows exceeding any bound are marked infeasible.
            distance: Distance metric to the utopia point.
            tie_breakers: Objective names used to break distance ties.
            tie_dirs: Direction ("min"/"max") matching each tie-breaker.
            eps: Tolerance used in dominance comparisons.
        """
        self.objectives = objectives
        self.normalization = normalization
        self.fairness_constraints = fairness_constraints or {}
        self.distance = distance
        self.tie_breakers = tie_breakers
        self.tie_dirs = tie_dirs
        self.eps = eps
        # Diagnostic masks populated by select_best
        self.pareto_mask: pd.Series | None = None
        self.feasible_mask: pd.Series | None = None

    def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
        """Select best solution using Pareto utopia approach."""
        df = evaluations_df.copy()
        if df.empty:
            raise ValueError("evaluations_df is empty; nothing to select")
        dirs = self._resolve_objectives_from_meta(objective_set.component_meta(), df)
        feas = self._apply_constraints(df, self.fairness_constraints)
        df["feasible"] = feas
        Y = df[list(dirs.keys())].copy()

        # Orient all objectives for minimization (negate "max" objectives)
        for c, d in dirs.items():
            if d == "max":
                Y[c] = -Y[c]

        # Pareto front is computed over feasible rows only
        pareto_mask = self._pareto_mask(Y[feas])
        df["pareto"] = False
        df.loc[feas.index[feas].tolist(), "pareto"] = pareto_mask.values

        # Store masks for diagnostics
        self.pareto_mask = df["pareto"].copy()
        self.feasible_mask = df["feasible"].copy()

        # Normalize and compute distance to the utopia (ideal) point.
        # BUG FIX: when no row is feasible, the ideal was all-NaN and every
        # distance became NaN; fall back to the ideal over all rows.
        Z = self._normalize(Y, self.normalization)
        ref = Z[df["feasible"]] if df["feasible"].any() else Z
        ideal = ref.min(axis=0)
        df["utopia_distance"] = self._dist(Z, ideal, self.distance)

        # Select from Pareto front; fall back to feasible rows, then all rows
        front = df[(df["feasible"]) & (df["pareto"])]
        if len(front) == 0:
            front = df[df["feasible"]] if df["feasible"].any() else df

        # BUG FIX: tie-breakers previously re-sorted the whole front and
        # discarded the distance order; sort once, lexicographically
        # (stable mergesort), so tie-breakers only apply among equal scores.
        sort_cols = ["utopia_distance"]
        ascending = [True]
        for col, d in zip(self.tie_breakers, self.tie_dirs):
            sort_cols.append(col)
            ascending.append(d == "min")
        best = front.sort_values(sort_cols, ascending=ascending, kind="mergesort")

        return tuple(best.iloc[0]["slices"])

    def _resolve_objectives(self, objective_set: ObjectiveSet, df: pd.DataFrame) -> Dict[str, ScoreComponentDirection]:
        """Legacy method for backward compatibility."""
        if self.objectives is not None:
            return self.objectives
        meta = objective_set.component_meta()
        return {name: info["direction"] for name, info in meta.items() if name in df.columns}

    def _resolve_objectives_from_meta(self, meta: Dict[str, Dict[str, object]], df: pd.DataFrame) -> Dict[str, ScoreComponentDirection]:
        """Resolve objectives from component metadata.

        Explicit `self.objectives` wins; otherwise use directions from
        `meta`, restricted to columns actually present in `df`.
        (FIX: annotation previously used the builtin `any` instead of a type.)
        """
        if self.objectives is not None:
            return self.objectives
        return {name: info["direction"] for name, info in meta.items() if name in df.columns}

    def _apply_constraints(self, df: pd.DataFrame, cons: Dict[str, float]) -> pd.Series:
        """Return a boolean mask of rows satisfying all upper-bound constraints."""
        if not cons:
            return pd.Series(True, index=df.index)
        mask = pd.Series(True, index=df.index)
        for col, thr in cons.items():
            if col not in df.columns:
                raise ValueError(f"Unknown constraint metric: {col}")
            mask &= df[col] <= thr
        return mask

    def _normalize(self, Y: pd.DataFrame, mode: Normalization) -> pd.DataFrame:
        """Normalize oriented objective columns according to *mode*."""
        if mode == "robust_minmax":
            # Scale with 5th-95th percentiles to dampen outliers; a zero
            # spread falls back to a unit denominator to avoid div-by-zero.
            q_lo = Y.quantile(0.05)
            q_hi = Y.quantile(0.95)
            denom = (q_hi - q_lo).replace(0, 1.0)
            return ((Y - q_lo) / denom).clip(lower=0.0)
        if mode == "zscore_iqr":
            med = Y.median()
            iqr = (Y.quantile(0.75) - Y.quantile(0.25)).replace(0, 1.0)
            return (Y - med) / iqr
        # "none" (or unrecognized): leave unchanged
        return Y

    def _dist(self, Z: pd.DataFrame, ideal: pd.Series, kind: str) -> pd.Series:
        """Per-row distance from `Z` to `ideal` (Chebyshev or Euclidean)."""
        D = (Z - ideal.values)
        if kind == "chebyshev":
            return D.abs().max(axis=1)
        return np.sqrt((D.pow(2)).sum(axis=1))

    def _pareto_mask(self, Y: pd.DataFrame) -> pd.Series:
        """Boolean mask of non-dominated rows (O(n^2) pairwise check)."""
        A = Y.values
        n = A.shape[0]
        mask = np.ones(n, dtype=bool)
        for i in range(n):
            if not mask[i]:
                continue
            for j in range(n):
                if i == j:
                    continue
                if self._dominates(A[j], A[i]):
                    mask[i] = False
                    break
        return pd.Series(mask, index=Y.index)

    def _dominates(self, a: np.ndarray, b: np.ndarray) -> bool:
        """True if `a` dominates `b`: no worse everywhere, strictly better somewhere
        (both with tolerance `eps`; objectives are min-oriented here)."""
        return np.all(a <= b + self.eps) and np.any(a < b - self.eps)

select_best

select_best(evaluations_df: DataFrame, objective_set: ObjectiveSet) -> tuple[Hashable, ...]

Select best solution using Pareto utopia approach.

Source code in energy_repset/selection_policies/pareto.py
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
def select_best(self, evaluations_df: pd.DataFrame, objective_set: ObjectiveSet) -> Tuple[Hashable, ...]:
    """Select best solution using Pareto utopia approach."""
    df = evaluations_df.copy()
    if df.empty:
        raise ValueError("evaluations_df is empty; nothing to select")
    dirs = self._resolve_objectives_from_meta(objective_set.component_meta(), df)
    feas = self._apply_constraints(df, self.fairness_constraints)
    df["feasible"] = feas
    Y = df[list(dirs.keys())].copy()

    # Orient all objectives for minimization (negate "max" objectives)
    for c, d in dirs.items():
        if d == "max":
            Y[c] = -Y[c]

    # Pareto front is computed over feasible rows only
    pareto_mask = self._pareto_mask(Y[feas])
    df["pareto"] = False
    df.loc[feas.index[feas].tolist(), "pareto"] = pareto_mask.values

    # Store masks for diagnostics
    self.pareto_mask = df["pareto"].copy()
    self.feasible_mask = df["feasible"].copy()

    # Normalize and compute distance to the utopia (ideal) point.
    # BUG FIX: when no row is feasible, the ideal was all-NaN and every
    # distance became NaN; fall back to the ideal over all rows.
    Z = self._normalize(Y, self.normalization)
    ref = Z[df["feasible"]] if df["feasible"].any() else Z
    ideal = ref.min(axis=0)
    df["utopia_distance"] = self._dist(Z, ideal, self.distance)

    # Select from Pareto front; fall back to feasible rows, then all rows
    front = df[(df["feasible"]) & (df["pareto"])]
    if len(front) == 0:
        front = df[df["feasible"]] if df["feasible"].any() else df

    # BUG FIX: tie-breakers previously re-sorted the whole front and
    # discarded the distance order; sort once, lexicographically
    # (stable mergesort), so tie-breakers only apply among equal scores.
    sort_cols = ["utopia_distance"]
    ascending = [True]
    for col, d in zip(self.tie_breakers, self.tie_dirs):
        sort_cols.append(col)
        ascending.append(d == "min")
    best = front.sort_values(sort_cols, ascending=ascending, kind="mergesort")

    return tuple(best.iloc[0]["slices"])

ParetoOutcome dataclass

Bases: PolicyOutcome

Source code in energy_repset/selection_policies/pareto.py
19
20
21
22
23
24
@dataclass(frozen=True)
class ParetoOutcome(PolicyOutcome):
    """PolicyOutcome extended with Pareto-selection metadata."""

    # Objective name -> optimization direction ("min"/"max").
    objectives: Dict[str, ScoreComponentDirection]
    # Name of the column holding the feasibility mask
    # (presumably in scores_annotated — confirm against the producing policy).
    feasible_mask_col: str
    # Name of the column holding Pareto-front membership.
    pareto_mask_col: str
    # Name of the column holding the policy's scalar score.
    score_col: str