Meta Models

sklego.meta.confusion_balancer.ConfusionBalancer

Bases: BaseEstimator, MetaEstimatorMixin, ClassifierMixin

The ConfusionBalancer estimator attempts to give its child estimator a more balanced output by learning from the confusion matrix during training.

The idea is that the confusion matrix calculates \(P(C_i | M_i)\) where \(C_i\) is the actual class and \(M_i\) is the class that the underlying model gives. We use these probabilities to attempt a more balanced prediction by averaging the correction from the confusion matrix with the original probabilities.

\[P(\text{class}_j) = (1-\alpha) P(\text{model}_j) + \alpha \sum_i P(\text{class}_j | \text{model}_i) P(\text{model}_i)\]
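The correction can be traced by hand with a small NumPy sketch. It follows the arithmetic in the `fit` and `predict_proba` source listed further down this page. The confusion counts and the probabilities are made-up illustration values, not output from a real model.

```python
import numpy as np

# Hypothetical confusion counts: rows = actual class, columns = predicted class
# (the layout returned by sklearn.metrics.confusion_matrix).
counts = np.array([[8, 2],
                   [4, 6]])

cfm_smooth = 0  # additive smoothing, playing the role of the `cfm_smooth` parameter
alpha = 0.5     # blending weight, playing the role of the `alpha` parameter

# Transpose so rows index the predicted class, then normalize each row:
# cfm[m, c] estimates P(actual class c | model predicted class m).
cfm = counts.T + cfm_smooth
cfm = cfm / cfm.sum(axis=1).reshape(-1, 1)

# Hypothetical `predict_proba` output of the wrapped estimator for two samples.
preds = np.array([[0.9, 0.1],
                  [0.3, 0.7]])

# Blend the raw probabilities with the confusion-matrix correction.
balanced = (1 - alpha) * preds + alpha * preds @ cfm
```

Because every row of `cfm` sums to one, each row of `balanced` is still a valid probability vector. With `alpha=0` the raw probabilities are returned unchanged.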

Parameters:

Name Type Description Default
estimator scikit-learn compatible classifier

The estimator to be wrapped. It must implement a predict_proba method.

required
alpha float

Hyperparameter which determines how much smoothing to apply. Must be between 0 and 1.

0.5
cfm_smooth float

Smoothing parameter for the confusion matrices to ensure zeros don't exist.

0

Attributes:

Name Type Description
classes_ array-like of shape (n_classes,)

The target class labels.

cfm_ array-like of shape (n_classes, n_classes)

The confusion matrix used for the correction.

Source code in sklego/meta/confusion_balancer.py
class ConfusionBalancer(BaseEstimator, MetaEstimatorMixin, ClassifierMixin):
    r"""The `ConfusionBalancer` estimator attempts to give its child estimator a more balanced output by learning from
    the confusion matrix during training.

    The idea is that the confusion matrix calculates $P(C_i | M_i)$ where $C_i$ is the actual class and $M_i$ is the
    class that the underlying model gives. We use these probabilities to attempt a more balanced prediction by averaging
    the correction from the confusion matrix with the original probabilities.

    $$P(\text{class}_j) = (1-\alpha) P(\text{model}_j) + \alpha \sum_i P(\text{class}_j | \text{model}_i) P(\text{model}_i)$$

    Parameters
    ----------
    estimator : scikit-learn compatible classifier
        The estimator to be wrapped. It must implement a `predict_proba` method.
    alpha : float, default=0.5
        Hyperparameter which determines how much smoothing to apply. Must be between 0 and 1.
    cfm_smooth : float, default=0
        Smoothing parameter for the confusion matrices to ensure zeros don't exist.

    Attributes
    ----------
    classes_ : array-like of shape (n_classes,)
        The target class labels.
    cfm_ : array-like of shape (n_classes, n_classes)
        The confusion matrix used for the correction.
    """

    _required_parameters = ["estimator"]

    def __init__(self, estimator, alpha: float = 0.5, cfm_smooth=0):
        self.estimator = estimator
        self.alpha = alpha
        self.cfm_smooth = cfm_smooth

    def fit(self, X, y):
        """Fit the underlying estimator on the training data `X` and `y`, then compute the confusion matrix,
        normalize it, and store it for later use.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : ConfusionBalancer
            The fitted estimator.

        Raises
        ------
        ValueError
            If the underlying estimator does not have a `predict_proba` method.
        """

        X, y = check_X_y(X, y, estimator=self.estimator, dtype=FLOAT_DTYPES)
        if not isinstance(self.estimator, ProbabilisticClassifier):
            raise ValueError(
                "The ConfusionBalancer meta model only works on classification models with .predict_proba."
            )
        self.estimator_ = clone(self.estimator).fit(X, y)
        self.classes_ = unique_labels(y)
        cfm = confusion_matrix(y, self.estimator_.predict(X)).T + self.cfm_smooth
        self.cfm_ = cfm / cfm.sum(axis=1).reshape(-1, 1)
        self.n_features_in_ = X.shape[1]
        return self

    def predict_proba(self, X):
        """Predict probabilities for new data `X` using the underlying estimator and then applying the confusion matrix
        correction.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples, n_classes)
            The predicted values.
        """
        check_is_fitted(self, ["cfm_", "classes_", "estimator_"])
        X = check_array(X, dtype=FLOAT_DTYPES)
        preds = self.estimator_.predict_proba(X)
        return (1 - self.alpha) * preds + self.alpha * preds @ self.cfm_

    def predict(self, X):
        """Predict most likely class for new data `X` using the underlying estimator.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted values.
        """
        check_is_fitted(self, ["cfm_", "classes_", "estimator_"])
        X = check_array(X, dtype=FLOAT_DTYPES)
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

fit(X, y)

Fit the underlying estimator on the training data X and y, then compute the confusion matrix, normalize it, and store it for later use.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required

Returns:

Name Type Description
self ConfusionBalancer

The fitted estimator.

Raises:

Type Description
ValueError

If the underlying estimator does not have a predict_proba method.

Source code in sklego/meta/confusion_balancer.py
def fit(self, X, y):
    """Fit the underlying estimator on the training data `X` and `y`, then compute the confusion matrix,
    normalize it, and store it for later use.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.

    Returns
    -------
    self : ConfusionBalancer
        The fitted estimator.

    Raises
    ------
    ValueError
        If the underlying estimator does not have a `predict_proba` method.
    """

    X, y = check_X_y(X, y, estimator=self.estimator, dtype=FLOAT_DTYPES)
    if not isinstance(self.estimator, ProbabilisticClassifier):
        raise ValueError(
            "The ConfusionBalancer meta model only works on classification models with .predict_proba."
        )
    self.estimator_ = clone(self.estimator).fit(X, y)
    self.classes_ = unique_labels(y)
    cfm = confusion_matrix(y, self.estimator_.predict(X)).T + self.cfm_smooth
    self.cfm_ = cfm / cfm.sum(axis=1).reshape(-1, 1)
    self.n_features_in_ = X.shape[1]
    return self

predict(X)

Predict most likely class for new data X using the underlying estimator.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted values.

Source code in sklego/meta/confusion_balancer.py
def predict(self, X):
    """Predict most likely class for new data `X` using the underlying estimator.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted values.
    """
    check_is_fitted(self, ["cfm_", "classes_", "estimator_"])
    X = check_array(X, dtype=FLOAT_DTYPES)
    return self.classes_[self.predict_proba(X).argmax(axis=1)]

predict_proba(X)

Predict probabilities for new data X using the underlying estimator and then applying the confusion matrix correction.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples, n_classes)

The predicted values.

Source code in sklego/meta/confusion_balancer.py
def predict_proba(self, X):
    """Predict probabilities for new data `X` using the underlying estimator and then applying the confusion matrix
    correction.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples, n_classes)
        The predicted values.
    """
    check_is_fitted(self, ["cfm_", "classes_", "estimator_"])
    X = check_array(X, dtype=FLOAT_DTYPES)
    preds = self.estimator_.predict_proba(X)
    return (1 - self.alpha) * preds + self.alpha * preds @ self.cfm_

sklego.meta.decay_estimator.DecayEstimator

Bases: BaseEstimator, MetaEstimatorMixin

Morphs an estimator such that the training weights can be adapted to ensure that points that are far away have less weight.

This meta estimator only works for estimators that accept a sample_weight argument in their .fit() method. The meta estimator's .fit() method computes the weights and passes them to the wrapped estimator's .fit() method.

Warning

It is up to the user to sort the dataset appropriately.

Warning

By default, all checks on the inputs X and y are delegated to the wrapped estimator. To change this behaviour, set check_input to True. Note that if the check is skipped, y must have a shape attribute, which is used to extract the number of training samples and compute the weights.

Parameters:

Name Type Description Default
model scikit-learn compatible estimator

The estimator to be wrapped.

required
decay_func Literal[linear, exponential, stepwise, sigmoid] | Callable[[ndarray, ndarray, ...], ndarray]

The decay function to use. Available built-in decay functions are:

  • "linear": linear decay from max_value to min_value.
  • "exponential": exponential decay with decay rate decay_rate.
  • "stepwise": stepwise decay from max_value to min_value, with n_steps steps or step size step_size.
  • "sigmoid": sigmoid decay from max_value to min_value with decay rate growth_rate.

Otherwise a callable can be passed and it should accept X, y as first two positional arguments and any other keyword argument passed along from decay_kwargs (if any). It should compute the weights and return an array of shape (n_samples,).

"exponential"
check_input bool

Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

False
decay_kwargs dict | None

Keyword arguments to the decay function.

None
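As a hedged illustration of the callable option for decay_func, the sketch below defines a hypothetical `ramp_decay` function (not part of the library). It accepts X and y as the first two positional arguments, takes extra keyword arguments, and returns one weight per sample. It assumes rows are sorted oldest first, so the earliest rows get the smallest weights.

```python
import numpy as np


def ramp_decay(X, y, min_value=0.1, max_value=1.0):
    """Hypothetical custom decay: weights rise linearly from `min_value`
    (first row) to `max_value` (last row)."""
    n_samples = y.shape[0]  # relies on `y` exposing a `shape` attribute
    return np.linspace(min_value, max_value, n_samples)


# Toy data, assumed to be sorted from oldest to newest observation.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

weights = ramp_decay(X, y, min_value=0.2, max_value=1.0)

# Assumed usage, based on the constructor signature shown on this page:
# DecayEstimator(model=..., decay_func=ramp_decay,
#                decay_kwargs={"min_value": 0.2, "max_value": 1.0})
```

The keyword arguments travel through decay_kwargs rather than as direct constructor arguments, which keeps the constructor signature fixed regardless of the decay function chosen.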

Attributes:

Name Type Description
estimator_ scikit-learn compatible estimator

The fitted estimator.

weights_ array-like of shape (n_samples,)

The weights used to train the estimator.

classes_ array-like of shape (n_classes,)

The class labels. Only present if the wrapped estimator is a classifier.

Examples:

from sklearn.linear_model import LinearRegression
from sklego.meta import DecayEstimator

decay_estimator = DecayEstimator(
    model=LinearRegression(),
    decay_func="linear",
    # Arguments for the decay function are passed through `decay_kwargs`
    decay_kwargs={"min_value": 0.1, "max_value": 0.9},
)

X, y = ...

# Fit the DecayEstimator on the data, this will compute the weights
# and pass them to the wrapped estimator
_ = decay_estimator.fit(X, y)

# At prediction time, the weights are not used
predictions = decay_estimator.predict(X)

# The weights are stored in the `weights_` attribute
weights = decay_estimator.weights_
Source code in sklego/meta/decay_estimator.py
class DecayEstimator(BaseEstimator, MetaEstimatorMixin):
    """Morphs an estimator such that the training weights can be adapted to ensure that points that are far away have
    less weight.

    This meta estimator only works for estimators that accept a `sample_weight` argument in their `.fit()` method.
    The meta estimator's `.fit()` method computes the weights and passes them to the wrapped estimator's `.fit()` method.

    !!! warning
        It is up to the user to sort the dataset appropriately.

    !!! warning
        By default, all checks on the inputs `X` and `y` are delegated to the wrapped estimator.
        To change this behaviour, set `check_input` to `True`.
        Note that if the check is skipped, `y` must have a `shape` attribute, which is
        used to extract the number of training samples and compute the weights.

    Parameters
    ----------
    model : scikit-learn compatible estimator
        The estimator to be wrapped.
    decay_func : Literal["linear", "exponential", "stepwise", "sigmoid"] | \
            Callable[[np.ndarray, np.ndarray, ...], np.ndarray], default="exponential"
        The decay function to use. Available built-in decay functions are:

        - `"linear"`: linear decay from `max_value` to `min_value`.
        - `"exponential"`: exponential decay with decay rate `decay_rate`.
        - `"stepwise"`: stepwise decay from `max_value` to `min_value`, with `n_steps` steps or step size `step_size`.
        - `"sigmoid"`: sigmoid decay from `max_value` to `min_value` with decay rate `growth_rate`.

        Otherwise a callable can be passed and it should accept `X`, `y` as first two positional arguments and any other
        keyword argument passed along from `decay_kwargs` (if any). It should compute the weights and return an array
        of shape `(n_samples,)`.
    check_input : bool, default=False
        Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.
    decay_kwargs : dict | None, default=None
        Keyword arguments to the decay function.

    Attributes
    ----------
    estimator_ : scikit-learn compatible estimator
        The fitted estimator.
    weights_ : array-like of shape (n_samples,)
        The weights used to train the estimator.
    classes_ : array-like of shape (n_classes,)
        The class labels. Only present if the wrapped estimator is a classifier.

    Examples
    --------
    ```py
    from sklearn.linear_model import LinearRegression
    from sklego.meta import DecayEstimator

    decay_estimator = DecayEstimator(
        model=LinearRegression(),
        decay_func="linear",
        # Arguments for the decay function are passed through `decay_kwargs`
        decay_kwargs={"min_value": 0.1, "max_value": 0.9},
    )

    X, y = ...

    # Fit the DecayEstimator on the data, this will compute the weights
    # and pass them to the wrapped estimator
    _ = decay_estimator.fit(X, y)

    # At prediction time, the weights are not used
    predictions = decay_estimator.predict(X)

    # The weights are stored in the `weights_` attribute
    weights = decay_estimator.weights_
    ```
    """

    _ALLOWED_DECAYS = {
        "linear": linear_decay,
        "exponential": exponential_decay,
        "stepwise": stepwise_decay,
        "sigmoid": sigmoid_decay,
    }

    _required_parameters = ["model"]

    def __init__(self, model, decay_func="exponential", check_input=False, decay_kwargs=None):
        self.model = model
        self.decay_func = decay_func
        self.check_input = check_input
        self.decay_kwargs = decay_kwargs

    def _is_classifier(self):
        """Checks if the wrapped estimator is a classifier."""
        return any(["ClassifierMixin" in p.__name__ for p in type(self.model).__bases__])

    @property
    def _estimator_type(self):
        """Computes `_estimator_type` dynamically from the wrapped model."""
        return self.model._estimator_type

    def fit(self, X, y):
        """Fit the underlying estimator on the training data `X` and `y` using the calculated sample weights.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : DecayEstimator
            The fitted estimator.
        """

        if self.check_input:
            X, y = check_X_y(X, y, estimator=self, dtype=FLOAT_DTYPES, ensure_min_features=0)

        if self.decay_func in self._ALLOWED_DECAYS.keys():
            self.decay_func_ = self._ALLOWED_DECAYS[self.decay_func]
        elif callable(self.decay_func):
            self.decay_func_ = self.decay_func
        else:
            raise ValueError(f"`decay_func` should be one of {self._ALLOWED_DECAYS.keys()} or a callable")

        self.weights_ = self.decay_func_(X, y, **(self.decay_kwargs or {}))
        self.estimator_ = clone(self.model)

        try:
            self.estimator_.fit(X, y, sample_weight=self.weights_)
        except TypeError as e:
            if "sample_weight" in str(e):
                raise TypeError(f"Model {type(self.model).__name__}.fit() does not have 'sample_weight'") from e
            raise  # do not silently swallow unrelated TypeErrors

        if self._is_classifier():
            self.classes_ = self.estimator_.classes_

        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        """Predict target values for `X` using trained underlying estimator.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted values.
        """
        if self._is_classifier():
            check_is_fitted(self, ["classes_"])

        check_is_fitted(self, ["weights_", "estimator_"])
        return self.estimator_.predict(X)

    def score(self, X, y):
        """Alias for `.score()` method of the underlying estimator."""
        return self.estimator_.score(X, y)

fit(X, y)

Fit the underlying estimator on the training data X and y using the calculated sample weights.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required

Returns:

Name Type Description
self DecayEstimator

The fitted estimator.

Source code in sklego/meta/decay_estimator.py
def fit(self, X, y):
    """Fit the underlying estimator on the training data `X` and `y` using the calculated sample weights.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.

    Returns
    -------
    self : DecayEstimator
        The fitted estimator.
    """

    if self.check_input:
        X, y = check_X_y(X, y, estimator=self, dtype=FLOAT_DTYPES, ensure_min_features=0)

    if self.decay_func in self._ALLOWED_DECAYS.keys():
        self.decay_func_ = self._ALLOWED_DECAYS[self.decay_func]
    elif callable(self.decay_func):
        self.decay_func_ = self.decay_func
    else:
        raise ValueError(f"`decay_func` should be one of {self._ALLOWED_DECAYS.keys()} or a callable")

    self.weights_ = self.decay_func_(X, y, **(self.decay_kwargs or {}))
    self.estimator_ = clone(self.model)

    try:
        self.estimator_.fit(X, y, sample_weight=self.weights_)
    except TypeError as e:
        if "sample_weight" in str(e):
            raise TypeError(f"Model {type(self.model).__name__}.fit() does not have 'sample_weight'") from e
        raise  # do not silently swallow unrelated TypeErrors

    if self._is_classifier():
        self.classes_ = self.estimator_.classes_

    self.n_features_in_ = X.shape[1]
    return self

predict(X)

Predict target values for X using trained underlying estimator.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted values.

Source code in sklego/meta/decay_estimator.py
def predict(self, X):
    """Predict target values for `X` using trained underlying estimator.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted values.
    """
    if self._is_classifier():
        check_is_fitted(self, ["classes_"])

    check_is_fitted(self, ["weights_", "estimator_"])
    return self.estimator_.predict(X)

score(X, y)

Alias for .score() method of the underlying estimator.

Source code in sklego/meta/decay_estimator.py
def score(self, X, y):
    """Alias for `.score()` method of the underlying estimator."""
    return self.estimator_.score(X, y)

sklego.meta.estimator_transformer.EstimatorTransformer

Bases: TransformerMixin, MetaEstimatorMixin, BaseEstimator

Allow using an estimator as a transformer in an earlier step of a pipeline.

Warning

By default all the checks on the inputs X and y are delegated to the wrapped estimator.

To change such behaviour, set check_input to True.

Parameters:

Name Type Description Default
estimator scikit-learn compatible estimator

The estimator to be applied to the data, used as transformer.

required
predict_func str

The method called on the estimator when transforming, e.g. "predict" or "predict_proba".

"predict"
check_input bool

Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

False

Attributes:

Name Type Description
estimator_ scikit-learn compatible estimator

The fitted underlying estimator.

multi_output_ bool

Whether or not the estimator is multi output.
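The idea can be sketched without the library. In the block below, `MeanRegressor` and `estimator_as_features` are hypothetical stand-ins written for illustration only. The helper mirrors the `transform` logic shown in the source listing on this page: call the configured method, then turn 1-D output into a single column so that later pipeline steps receive a 2-D array.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical stand-in for a fitted estimator: predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = np.mean(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def estimator_as_features(estimator, X, predict_func="predict", multi_output=False):
    """Call the chosen method and reshape 1-D output into one column."""
    output = getattr(estimator, predict_func)(X)
    return output if multi_output else output.reshape(-1, 1)


X = np.array([[1.0], [2.0], [3.0]])
y = np.array([10.0, 20.0, 30.0])

est = MeanRegressor().fit(X, y)
features = estimator_as_features(est, X)

# The prediction column can be stacked next to the original features
# and handed to a later step.
stacked = np.hstack([X, features])
```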

Source code in sklego/meta/estimator_transformer.py
class EstimatorTransformer(TransformerMixin, MetaEstimatorMixin, BaseEstimator):
    """Allow using an estimator as a transformer in an earlier step of a pipeline.

    !!! warning
        By default all the checks on the inputs `X` and `y` are delegated to the wrapped estimator.

        To change such behaviour, set `check_input` to `True`.

    Parameters
    ----------
    estimator : scikit-learn compatible estimator
        The estimator to be applied to the data, used as transformer.
    predict_func : str, default="predict"
        The method called on the estimator when transforming, e.g. `"predict"` or `"predict_proba"`.
    check_input : bool, default=False
        Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

    Attributes
    ----------
    estimator_ : scikit-learn compatible estimator
        The fitted underlying estimator.
    multi_output_ : bool
        Whether or not the estimator is multi output.
    """

    def __init__(self, estimator, predict_func="predict", check_input=False):
        self.estimator = estimator
        self.predict_func = predict_func
        self.check_input = check_input

    def fit(self, X, y, **kwargs):
        """Fit the underlying estimator on training data `X` and `y`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.
        **kwargs : dict
            Additional keyword arguments passed to the `fit` method of the underlying estimator.

        Returns
        -------
        self : EstimatorTransformer
            The fitted transformer.
        """

        if self.check_input:
            X, y = check_X_y(X, y, estimator=self, dtype=FLOAT_DTYPES, multi_output=True)

        self.multi_output_ = len(y.shape) > 1
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y, **kwargs)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        """Transform the data by applying the `predict_func` of the fitted estimator.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to be transformed.

        Returns
        -------
        output : array-like of shape (n_samples, 1) | (n_samples, n_outputs)
            The transformed data. Array will be of shape `(X.shape[0], 1)` if estimator is not multi output.
            For multi output estimators an array of shape `(X.shape[0], y.shape[1])` is returned.
        """

        check_is_fitted(self, "estimator_")
        output = getattr(self.estimator_, self.predict_func)(X)
        return output if self.multi_output_ else output.reshape(-1, 1)

fit(X, y, **kwargs)

Fit the underlying estimator on training data X and y.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required
**kwargs dict

Additional keyword arguments passed to the fit method of the underlying estimator.

{}

Returns:

Name Type Description
self EstimatorTransformer

The fitted transformer.

Source code in sklego/meta/estimator_transformer.py
def fit(self, X, y, **kwargs):
    """Fit the underlying estimator on training data `X` and `y`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.
    **kwargs : dict
        Additional keyword arguments passed to the `fit` method of the underlying estimator.

    Returns
    -------
    self : EstimatorTransformer
        The fitted transformer.
    """

    if self.check_input:
        X, y = check_X_y(X, y, estimator=self, dtype=FLOAT_DTYPES, multi_output=True)

    self.multi_output_ = len(y.shape) > 1
    self.estimator_ = clone(self.estimator)
    self.estimator_.fit(X, y, **kwargs)
    self.n_features_in_ = X.shape[1]
    return self

transform(X)

Transform the data by applying the predict_func of the fitted estimator.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to be transformed.

required

Returns:

Name Type Description
output array-like of shape (n_samples, 1) | (n_samples, n_outputs)

The transformed data. Array will be of shape (X.shape[0], 1) if estimator is not multi output. For multi output estimators an array of shape (X.shape[0], y.shape[1]) is returned.

Source code in sklego/meta/estimator_transformer.py
def transform(self, X):
    """Transform the data by applying the `predict_func` of the fitted estimator.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to be transformed.

    Returns
    -------
    output : array-like of shape (n_samples, 1) | (n_samples, n_outputs)
        The transformed data. Array will be of shape `(X.shape[0], 1)` if estimator is not multi output.
        For multi output estimators an array of shape `(X.shape[0], y.shape[1])` is returned.
    """

    check_is_fitted(self, "estimator_")
    output = getattr(self.estimator_, self.predict_func)(X)
    return output if self.multi_output_ else output.reshape(-1, 1)

sklego.meta.grouped_predictor.GroupedPredictor

Bases: ShrinkageMixin, MetaEstimatorMixin, BaseEstimator

GroupedPredictor is a meta-estimator that fits a separate estimator for each group in the input data.

The input data is split into a group and a value part: for each unique combination of the group columns, a separate estimator is fitted to the corresponding value rows. The group columns are specified by the groups parameter.

If use_global_model=True a fallback estimator will be fitted on the entire dataset in case a group is not found during .predict().

If shrinkage is not None, the predictions of the group-level models are combined using a shrinkage method. The shrinkage method can be one of the predefined methods "constant", "equal", "min_n_obs", "relative" or a custom shrinkage function. The shrinkage method is specified by the shrinkage parameter.

Shrinkage

Shrinkage is only available for regression models.
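The group-wise fitting and the global fallback described above can be illustrated with a small NumPy sketch. The helper functions are hypothetical and the per-group "model" is just the mean of y. This is not the library's implementation, only the general idea under those assumptions.

```python
import numpy as np


def fit_group_means(groups, y):
    """Fit one trivial 'model' (the mean of y) per unique group label."""
    return {g: y[groups == g].mean() for g in np.unique(groups)}


def predict_group_means(models, groups, fallback=None):
    """Look up each row's group model; use `fallback` for unseen groups."""
    out = []
    for g in groups:
        if g in models:
            out.append(models[g])
        elif fallback is not None:
            out.append(fallback)
        else:
            # In this sketch an unseen group without a fallback is an error.
            raise ValueError(f"Unknown group {g!r} and no global fallback")
    return np.array(out)


groups = np.array(["a", "a", "b", "b", "b"])
y = np.array([1.0, 3.0, 10.0, 20.0, 30.0])

models = fit_group_means(groups, y)  # group means: a -> 2.0, b -> 20.0
global_mean = y.mean()               # fallback fitted on all rows: 12.8

# Group "c" was never seen during fitting, so the global fallback is used.
preds = predict_group_means(models, np.array(["a", "b", "c"]), fallback=global_mean)
```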

Parameters:

Name Type Description Default
estimator scikit-learn compatible estimator/pipeline

The estimator/pipeline to be applied per group.

required
groups int | str | List[int] | List[str]

The column(s) of the array/dataframe to select as a grouping parameter set.

required
shrinkage Literal[constant, equal, min_n_obs, relative] | Callable | None

How to perform shrinkage:

  • None: No shrinkage (default)
  • "constant": the augmented prediction for each level is the weighted average between its prediction and the augmented prediction for its parent.
  • "equal": each group is weighed equally.
  • "min_n_obs": use only the smallest group with a certain amount of observations.
  • "relative": weigh each group according to its size.
  • Callable: a function that takes a list of group lengths and returns an array of the same size with the weights for each group.

None
use_global_model bool

  • With shrinkage: whether to have a model over the entire input as first group
  • Without shrinkage: whether or not to fall back to a general model in case the group parameter is not found during .predict()

True
check_X bool

Whether to validate X to be non-empty 2D array of finite values and attempt to cast X to float. If disabled, the model/pipeline is expected to handle e.g. missing, non-numeric, or non-finite values.

True
shrinkage_kwargs dict | None

Keyword arguments to the shrinkage function

None
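A custom shrinkage callable, as described for the shrinkage parameter, receives the group lengths and returns one weight per level. The sketch below uses a hypothetical `size_proportional_shrinkage` function. Combining the per-level predictions as a weighted sum is an assumption made for illustration, not a statement about the library's internals.

```python
import numpy as np


def size_proportional_shrinkage(group_sizes):
    """Hypothetical custom shrinkage: weight each level by its share of observations."""
    sizes = np.asarray(group_sizes, dtype=float)
    return sizes / sizes.sum()


# Hypothetical hierarchy: global level (100 rows), group (30 rows), subgroup (10 rows).
weights = size_proportional_shrinkage([100, 30, 10])

# Hypothetical predictions made by the model at each level for one sample.
level_predictions = np.array([5.0, 7.0, 12.0])

# Assumed combination rule for this sketch: a weighted sum over the levels.
shrunk = float(weights @ level_predictions)
```

Levels with more observations dominate the result here, so a prediction from a small subgroup is pulled towards the estimates of its larger parents.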

Attributes:

Name Type Description
estimators_ dict

A dictionary with the fitted estimators per group

groups_ list

A list of all the groups that were found during fitting

fallback_ estimator

A fallback estimator that is used when use_global_model=True and a group is not found during .predict()

shrinkage_function_ callable

The shrinkage function that is used to calculate the shrinkage factors

shrinkage_factors_ dict

A dictionary with the shrinkage factors per group

Source code in sklego/meta/grouped_predictor.py
class GroupedPredictor(ShrinkageMixin, MetaEstimatorMixin, BaseEstimator):
    """`GroupedPredictor` is a meta-estimator that fits a separate estimator for each group in the input data.

    The input data is split into a group and a value part: for each unique combination of the group columns, a separate
    estimator is fitted to the corresponding value rows. The group columns are specified by the `groups` parameter.

    If `use_global_model=True` a fallback estimator will be fitted on the entire dataset in case a group is not found
    during `.predict()`.

    If `shrinkage` is not `None`, the predictions of the group-level models are combined using a shrinkage method. The
    shrinkage method can be one of the predefined methods `"constant"`, `"equal"`, `"min_n_obs"`, `"relative"` or a
    custom shrinkage function. The shrinkage method is specified by the `shrinkage` parameter.

    !!! warning "Shrinkage"
        Shrinkage is only available for regression models.

    Parameters
    ----------
    estimator : scikit-learn compatible estimator/pipeline
        The estimator/pipeline to be applied per group.
    groups : int | str | List[int] | List[str]
        The column(s) of the array/dataframe to select as a grouping parameter set.
    shrinkage : Literal["constant", "equal", "min_n_obs", "relative"] | Callable | None, default=None
        How to perform shrinkage:

        - `None`: No shrinkage (default)
        - `"constant"`: the augmented prediction for each level is the weighted average between its prediction and the
            augmented prediction for its parent.
        - `"equal"`: each group is weighed equally.
        - `"min_n_obs"`: use only the smallest group with a certain amount of observations.
        - `"relative"`: weigh each group according to its size.
        - `Callable`: a function that takes a list of group lengths and returns an array of the same size with the
            weights for each group.
    use_global_model : bool, default=True

        - With shrinkage: whether to have a model over the entire input as first group
        - Without shrinkage: whether or not to fall back to a general model in case the group parameter is not found
            during `.predict()`
    check_X : bool, default=True
        Whether to validate `X` to be non-empty 2D array of finite values and attempt to cast `X` to float.
        If disabled, the model/pipeline is expected to handle e.g. missing, non-numeric, or non-finite values.
    shrinkage_kwargs : dict | None, default=None
        Keyword arguments to the shrinkage function

    Attributes
    ----------
    estimators_ : dict
        A dictionary with the fitted estimators per group
    groups_ : list
        A list of all the groups that were found during fitting
    fallback_ : estimator
        A fallback estimator that is used when `use_global_model=True` and a group is not found during `.predict()`
    shrinkage_function_ : callable
        The shrinkage function that is used to calculate the shrinkage factors
    shrinkage_factors_ : dict
        A dictionary with the shrinkage factors per group
    """

    # Number of features in value df can be 0, e.g. for dummy models
    _check_kwargs = {"ensure_min_features": 0, "accept_large_sparse": False}
    _global_col_name = "a-column-that-is-constant-for-all-data"
    _global_col_value = "global"

    _ALLOWED_SHRINKAGE = {
        "constant": constant_shrinkage,
        "relative": relative_shrinkage,
        "min_n_obs": min_n_obs_shrinkage,
        "equal": equal_shrinkage,
    }

    _required_parameters = ["estimator", "groups"]

    def __init__(
        self,
        estimator,
        groups,
        shrinkage=None,
        use_global_model=True,
        check_X=True,
        shrinkage_kwargs=None,
    ):
        self.estimator = estimator
        self.groups = groups
        self.shrinkage = shrinkage
        self.use_global_model = use_global_model
        self.shrinkage_kwargs = shrinkage_kwargs
        self.check_X = check_X

    def __fit_single_group(self, group, X, y=None):
        """Fit estimator to the given group."""
        try:
            return clone(self.estimator).fit(X, y)
        except Exception as e:
            raise type(e)(f"Exception for group {group}: {e}")

    def __fit_grouped_estimator(
        self, frame: nw.DataFrame, y: Union[np.ndarray, None] = None, columns: Union[List[int], List[str], None] = None
    ):
        """Fit an estimator to each group"""

        if columns is None:
            columns = self._groups

        to_drop = list(set(["__sklego_target__", *columns, *as_list(self.groups)]))
        grouped_estimators = {
            # Fit a clone of the estimator to each group
            (group_name[0] if len(group_name) == 1 else group_name): self.__fit_single_group(
                group=(group_name[0] if len(group_name) == 1 else group_name),
                X=nw.to_native(X_grp.drop(to_drop)),
                y=(X_grp.select("__sklego_target__").to_numpy().reshape(-1) if y is not None else None),
            )
            for group_name, X_grp in frame.group_by(columns)
        }

        return grouped_estimators

    def __fit_shrinkage_groups(self, frame, y):
        estimators = {}

        for grouping_colnames in self.group_colnames_hierarchical_:
            # Fit a grouped estimator to each (sub)group hierarchically
            estimators.update(self.__fit_grouped_estimator(frame, y, columns=grouping_colnames))

        return estimators

    def __add_shrinkage_column(self, frame, groups=None):
        """Add global group as first column if needed for shrinkage"""

        if self.shrinkage is not None and self.use_global_model:
            n_samples = frame.shape[0]

            frame = frame.select(
                nw.from_dict(
                    data={self._global_col_name: np.full(shape=n_samples, fill_value=self._global_col_value)},
                    native_namespace=nw.get_native_namespace(frame),
                )[self._global_col_name],
                nw.all(),
            )
            groups = [self._global_col_name] if groups is None else [self._global_col_name, *groups]

        return frame, groups

    def fit(self, X, y=None):
        """Fit one estimator for each group of training data `X` and `y`.

        Will also learn the groups that exist within the dataset.

        If `use_global_model=True` a fallback estimator will be fitted on the entire dataset in case a group is not
        found during `.predict()`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,), default=None
            Target values.

        Returns
        -------
        self : GroupedPredictor
            The fitted estimator.
        """
        if self.shrinkage is not None and not is_regressor(self.estimator):
            raise ValueError("Shrinkage is only available for regression models")

        _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None

        if (
            self.shrinkage is not None
            and _group_cols is not None
            and len(_group_cols) == 1
            and not self.use_global_model
        ):
            raise ValueError("Shrinkage is not null, but found a total of 1 groups")

        X = nw.from_native(X, strict=False, eager_only=True)

        frame = parse_X_y(X, y, _group_cols, check_X=self.check_X, **self._check_kwargs)
        frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)
        self.n_features_in_ = frame.shape[1] - 1
        self.n_fitted_levels_ = 1 + self.use_global_model

        self.shrinkage_function_ = self._set_shrinkage_function()

        # List of all hierarchical subsets of columns
        self.group_colnames_hierarchical_ = expanding_list(_group_cols, list)
        self.fallback_ = None

        if self.shrinkage is None and self.use_global_model:
            X_ = nw.to_native(frame.drop([*_group_cols, "__sklego_target__"]))
            y_ = nw.to_native(frame["__sklego_target__"])

            self.fallback_ = clone(self.estimator).fit(X_, y_)

        if self.shrinkage is not None:
            self.estimators_ = self.__fit_shrinkage_groups(frame, y)
        else:
            self.estimators_ = self.__fit_grouped_estimator(frame, y, columns=_group_cols)

        self.groups_ = as_list(self.estimators_.keys())

        if self.shrinkage is not None:
            _groups = (
                [self._global_col_name, *as_list(deepcopy(self.groups))]
                if self.use_global_model
                else as_list(deepcopy(self.groups))
            )

            self.shrinkage_factors_ = self._fit_shrinkage_factors(frame, groups=_groups, most_granular_only=True)
            self.shrinkage_factors_ = {(k[0] if len(k) == 1 else k): v for k, v in self.shrinkage_factors_.items()}

        return self

    def __predict_shrinkage_groups(self, frame, method="predict", groups=None):
        """Make predictions for all shrinkage groups"""
        # DataFrame with predictions for each hierarchy level, per row. Missing groups errors are thrown here.
        hierarchical_predictions = pd.concat(
            [
                pd.Series(self.__predict_groups(frame, method=method, groups=level_columns))
                for level_columns in self.group_colnames_hierarchical_
            ],
            axis=1,
        )

        # This is a Series with values the tuples of hierarchical grouping
        prediction_groups = pd.Series([tuple(_) for _ in frame.select(groups).to_pandas().itertuples(index=False)])

        # This is a Series of arrays
        shrinkage_factors = prediction_groups.map(self.shrinkage_factors_)

        # Convert the Series of arrays to a DataFrame
        shrinkage_factors = pd.DataFrame.from_dict(shrinkage_factors.to_dict()).T

        return (hierarchical_predictions * shrinkage_factors).sum(axis=1)

    def __predict_single_group(self, group, X, method="predict"):
        """Predict a single group by getting its estimator from the fitted dict"""

        try:
            group_predictor = self.estimators_[group]
        except KeyError:
            if self.fallback_:
                group_predictor = self.fallback_
            else:
                raise ValueError(f"Found new group {group} during predict with use_global_model = False")

        is_predict_proba = is_classifier(group_predictor) and method == "predict_proba"
        # Label the DataFrame columns with the class names when predicting probabilities
        extra_kwargs = {"columns": group_predictor.classes_} if is_predict_proba else {}

        # getattr(group_predictor, method) looks up the requested method by name,
        # e.g. `predict`, `predict_proba` or `decision_function`
        return pd.DataFrame(getattr(group_predictor, method)(X), **extra_kwargs)

    def __predict_groups(self, frame: nw.DataFrame, method="predict", groups=None):
        """Predict for all groups"""

        n_samples = frame.shape[0]
        frame = frame.with_columns(__sklego_index__=np.arange(n_samples))
        return (
            pd.concat(
                [
                    self.__predict_single_group(
                        (group_value[0] if len(group_value) == 1 else group_value),
                        nw.to_native(X_grp.drop(["__sklego_index__", *groups, *as_list(self.groups)])),
                        method=method,
                    ).set_index(X_grp["__sklego_index__"].to_numpy().reshape(-1).astype(int))
                    for group_value, X_grp in frame.group_by(groups)
                ],
                axis=0,
            )
            .fillna(0)
            .sort_index()
            .to_numpy()
            .squeeze()
        )

    def predict(self, X):
        """Predict target values on new data `X` by predicting on each group. If a group is not found during
        `.predict()` and `use_global_model=True` the fallback estimator will be used. If `use_global_model=False` a
        `ValueError` will be raised.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            Predicted target values.
        """
        check_is_fitted(self, ["estimators_", "groups_", "fallback_"])

        _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None
        X = nw.from_native(X, strict=False, eager_only=True)
        frame = parse_X_y(X, y=None, groups=_group_cols, check_X=self.check_X, **self._check_kwargs).drop(
            "__sklego_target__"
        )
        frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)

        if self.shrinkage is None:
            return self.__predict_groups(frame, method="predict", groups=_group_cols)
        else:
            return self.__predict_shrinkage_groups(frame, method="predict", groups=_group_cols)

    # This ensures that the meta-estimator only has the predict_proba method if the estimator has it
    @available_if(lambda self: hasattr(self.estimator, "predict_proba"))
    def predict_proba(self, X):
        """Predict probabilities on new data `X`.

        !!! warning
            Available only if the underlying estimator implements `.predict_proba()` method.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Data to predict.

        Returns
        -------
        array-like of shape (n_samples, n_classes)
            Predicted probabilities per class.
        """
        check_is_fitted(self, ["estimators_", "groups_", "fallback_"])

        _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None
        X = nw.from_native(X, strict=False, eager_only=True)
        frame = parse_X_y(X, y=None, groups=_group_cols, check_X=self.check_X, **self._check_kwargs).drop(
            "__sklego_target__"
        )
        frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)

        if self.shrinkage is None:
            return self.__predict_groups(frame, method="predict_proba", groups=_group_cols)
        else:
            return self.__predict_shrinkage_groups(frame, method="predict_proba", groups=_group_cols)

    # This ensures that the meta-estimator only has the decision_function method if the estimator has it
    @available_if(lambda self: hasattr(self.estimator, "decision_function"))
    def decision_function(self, X):
        """Predict confidence scores for samples in `X`.

        !!! warning
            Available only if the underlying estimator implements `.decision_function()` method.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Data to predict.

        Returns
        -------
        array-like of shape (n_samples,) or (n_samples, n_classes)
            Confidence scores per (n_samples, n_classes) combination.
            In the binary case, confidence score for self.classes_[1] where > 0 means this class would be
            predicted.
        """
        check_is_fitted(self, ["estimators_", "groups_", "fallback_"])

        _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None
        X = nw.from_native(X, strict=False, eager_only=True)

        frame = parse_X_y(X, y=None, groups=_group_cols, check_X=self.check_X, **self._check_kwargs).drop(
            "__sklego_target__"
        )
        frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)

        if self.shrinkage is None:
            return self.__predict_groups(frame, method="decision_function", groups=_group_cols)
        else:
            return self.__predict_shrinkage_groups(frame, method="decision_function", groups=_group_cols)

    @property
    def _estimator_type(self):
        """Computes `_estimator_type` dynamically from the wrapped model."""
        return self.estimator._estimator_type

    def _more_tags(self):
        return {"allow_nan": True}
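To make the shrinkage options from the class docstring above more concrete, here is a minimal pure-Python sketch of weight functions in the shape described for a custom `Callable`: each takes a list of group sizes and returns one weight per level. The function names, the explicit normalization, the `combine` helper, and the ordering of levels from least to most granular are assumptions made for illustration; this is not the library's implementation.

```python
def size_proportional_weights(group_sizes):
    """Hypothetical custom shrinkage: weight each level by its number of observations."""
    total = sum(group_sizes)
    return [n / total for n in group_sizes]


def equal_weights(group_sizes):
    """Hypothetical custom shrinkage: every level gets the same weight."""
    return [1 / len(group_sizes)] * len(group_sizes)


def combine(level_predictions, weights):
    """Weighted sum of the per-level predictions for a single row."""
    return sum(p * w for p, w in zip(level_predictions, weights))


# One row that belongs to a global level (100 rows), a middle level (30 rows)
# and a most granular level (10 rows), with one made-up prediction per level.
sizes = [100, 30, 10]
predictions = [2.0, 3.0, 5.0]

relative = size_proportional_weights(sizes)
shrunk_relative = combine(predictions, relative)
shrunk_equal = combine(predictions, equal_weights(sizes))
```

With size-proportional weights the combined prediction is pulled towards the larger, less granular levels, while equal weights give each level the same say.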

decision_function(X)

Predict confidence scores for samples in X.

Warning

Available only if the underlying estimator implements .decision_function() method.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,) or (n_samples, n_classes)

Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1] where > 0 means this class would be predicted.
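For the binary case described above, the sign of each score decides the predicted class. A minimal sketch, where `classes` and `scores` are made-up values rather than the output of a fitted model:

```python
classes = ["no", "yes"]          # stands in for classes_ of a fitted classifier
scores = [-1.2, 0.3, 2.5, -0.1]  # stands in for the output of decision_function(X)

# A score > 0 selects classes[1]; otherwise classes[0] is predicted.
predicted = [classes[1] if s > 0 else classes[0] for s in scores]
```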

Source code in sklego/meta/grouped_predictor.py
@available_if(lambda self: hasattr(self.estimator, "decision_function"))
def decision_function(self, X):
    """Predict confidence scores for samples in `X`.

    !!! warning
        Available only if the underlying estimator implements `.decision_function()` method.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Data to predict.

    Returns
    -------
    array-like of shape (n_samples,) or (n_samples, n_classes)
        Confidence scores per (n_samples, n_classes) combination.
        In the binary case, confidence score for self.classes_[1] where > 0 means this class would be
        predicted.
    """
    check_is_fitted(self, ["estimators_", "groups_", "fallback_"])

    _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None
    X = nw.from_native(X, strict=False, eager_only=True)

    frame = parse_X_y(X, y=None, groups=_group_cols, check_X=self.check_X, **self._check_kwargs).drop(
        "__sklego_target__"
    )
    frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)

    if self.shrinkage is None:
        return self.__predict_groups(frame, method="decision_function", groups=_group_cols)
    else:
        return self.__predict_shrinkage_groups(frame, method="decision_function", groups=_group_cols)

fit(X, y=None)

Fit one estimator for each group of training data X and y.

Will also learn the groups that exist within the dataset.

If use_global_model=True a fallback estimator will be fitted on the entire dataset in case a group is not found during .predict().

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

None

Returns:

Name Type Description
self GroupedPredictor

The fitted estimator.
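The fitting behaviour described above can be sketched in plain Python: split the rows by group, fit one model per group, and optionally fit a fallback on all rows. `MeanModel` and `fit_per_group` are hypothetical names invented for this sketch, and the toy model only predicts the mean of its targets.

```python
from collections import defaultdict
from statistics import mean


class MeanModel:
    """Toy stand-in for an estimator: predicts the mean of the targets it saw."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_per_group(rows, y, use_global_model=True):
    """rows are (group, feature) pairs; returns (per-group models, fallback or None)."""
    X_by_group, y_by_group = defaultdict(list), defaultdict(list)
    for (group, feature), target in zip(rows, y):
        X_by_group[group].append([feature])
        y_by_group[group].append(target)

    # One freshly created model per group, fitted only on that group's rows
    estimators = {g: MeanModel().fit(X_by_group[g], y_by_group[g]) for g in X_by_group}
    # The fallback sees every row, regardless of group
    fallback = MeanModel().fit([[f] for _, f in rows], y) if use_global_model else None
    return estimators, fallback


rows = [("a", 1.0), ("a", 2.0), ("b", 3.0), ("b", 4.0)]
y = [10.0, 20.0, 30.0, 50.0]
estimators, fallback = fit_per_group(rows, y)
```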

Source code in sklego/meta/grouped_predictor.py
def fit(self, X, y=None):
    """Fit one estimator for each group of training data `X` and `y`.

    Will also learn the groups that exist within the dataset.

    If `use_global_model=True` a fallback estimator will be fitted on the entire dataset in case a group is not
    found during `.predict()`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,), default=None
        Target values.

    Returns
    -------
    self : GroupedPredictor
        The fitted estimator.
    """
    if self.shrinkage is not None and not is_regressor(self.estimator):
        raise ValueError("Shrinkage is only available for regression models")

    _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None

    if (
        self.shrinkage is not None
        and _group_cols is not None
        and len(_group_cols) == 1
        and not self.use_global_model
    ):
        raise ValueError("Shrinkage is not null, but found a total of 1 groups")

    X = nw.from_native(X, strict=False, eager_only=True)

    frame = parse_X_y(X, y, _group_cols, check_X=self.check_X, **self._check_kwargs)
    frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)
    self.n_features_in_ = frame.shape[1] - 1
    self.n_fitted_levels_ = 1 + self.use_global_model

    self.shrinkage_function_ = self._set_shrinkage_function()

    # List of all hierarchical subsets of columns
    self.group_colnames_hierarchical_ = expanding_list(_group_cols, list)
    self.fallback_ = None

    if self.shrinkage is None and self.use_global_model:
        X_ = nw.to_native(frame.drop([*_group_cols, "__sklego_target__"]))
        y_ = nw.to_native(frame["__sklego_target__"])

        self.fallback_ = clone(self.estimator).fit(X_, y_)

    if self.shrinkage is not None:
        self.estimators_ = self.__fit_shrinkage_groups(frame, y)
    else:
        self.estimators_ = self.__fit_grouped_estimator(frame, y, columns=_group_cols)

    self.groups_ = as_list(self.estimators_.keys())

    if self.shrinkage is not None:
        _groups = (
            [self._global_col_name, *as_list(deepcopy(self.groups))]
            if self.use_global_model
            else as_list(deepcopy(self.groups))
        )

        self.shrinkage_factors_ = self._fit_shrinkage_factors(frame, groups=_groups, most_granular_only=True)
        self.shrinkage_factors_ = {(k[0] if len(k) == 1 else k): v for k, v in self.shrinkage_factors_.items()}

    return self

predict(X)

Predict target values on new data X by predicting on each group. If a group is not found during .predict() and use_global_model=True the fallback estimator will be used. If use_global_model=False a ValueError will be raised.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

Predicted target values.
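The lookup rule described above (use the group's model, fall back to the global model for an unseen group, otherwise raise) can be sketched as follows. `pick_model` is a hypothetical helper, and plain strings stand in for fitted models to keep the example short.

```python
def pick_model(group, estimators, fallback=None):
    """Return the model for `group`, the fallback if the group is new, or raise."""
    try:
        return estimators[group]
    except KeyError:
        if fallback is not None:
            return fallback
        raise ValueError(f"Found new group {group!r} and no fallback model is available")


# Strings stand in for fitted models.
estimators = {"a": "model-a", "b": "model-b"}

known = pick_model("a", estimators, fallback="global-model")
unseen = pick_model("z", estimators, fallback="global-model")

# Without a fallback, an unseen group is an error.
try:
    pick_model("z", estimators, fallback=None)
    raised = False
except ValueError:
    raised = True
```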

Source code in sklego/meta/grouped_predictor.py
def predict(self, X):
    """Predict target values on new data `X` by predicting on each group. If a group is not found during
    `.predict()` and `use_global_model=True` the fallback estimator will be used. If `use_global_model=False` a
    `ValueError` will be raised.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        Predicted target values.
    """
    check_is_fitted(self, ["estimators_", "groups_", "fallback_"])

    _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None
    X = nw.from_native(X, strict=False, eager_only=True)
    frame = parse_X_y(X, y=None, groups=_group_cols, check_X=self.check_X, **self._check_kwargs).drop(
        "__sklego_target__"
    )
    frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)

    if self.shrinkage is None:
        return self.__predict_groups(frame, method="predict", groups=_group_cols)
    else:
        return self.__predict_shrinkage_groups(frame, method="predict", groups=_group_cols)

predict_proba(X)

Predict probabilities on new data X.

Warning

Available only if the underlying estimator implements .predict_proba() method.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Data to predict.

required

Returns:

Type Description
array-like of shape (n_samples, n_classes)

Predicted probabilities per class.
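Each group's model may have seen a different subset of classes. Judging by the class source listed earlier, per-group outputs are labelled with that model's `classes_` and missing entries are filled with 0 before the rows are stacked. The pure-Python sketch below illustrates that idea only; `align_probabilities` and the dictionary-based rows are assumptions, not the library's data structures.

```python
def align_probabilities(per_group_rows, all_classes):
    """Give every row one probability per class, using 0.0 for classes a group never saw."""
    return [[row.get(c, 0.0) for c in all_classes] for row in per_group_rows]


# Group "a" was trained on classes 0 and 1, group "b" only on classes 1 and 2.
rows = [
    {0: 0.7, 1: 0.3},  # a row scored by the model for group "a"
    {1: 0.4, 2: 0.6},  # a row scored by the model for group "b"
]
aligned = align_probabilities(rows, all_classes=[0, 1, 2])
```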

Source code in sklego/meta/grouped_predictor.py
@available_if(lambda self: hasattr(self.estimator, "predict_proba"))
def predict_proba(self, X):
    """Predict probabilities on new data `X`.

    !!! warning
        Available only if the underlying estimator implements `.predict_proba()` method.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Data to predict.

    Returns
    -------
    array-like of shape (n_samples, n_classes)
        Predicted probabilities per class.
    """
    check_is_fitted(self, ["estimators_", "groups_", "fallback_"])

    _group_cols = as_list(deepcopy(self.groups)) if self.groups is not None else None
    X = nw.from_native(X, strict=False, eager_only=True)
    frame = parse_X_y(X, y=None, groups=_group_cols, check_X=self.check_X, **self._check_kwargs).drop(
        "__sklego_target__"
    )
    frame, _group_cols = self.__add_shrinkage_column(frame, _group_cols)

    if self.shrinkage is None:
        return self.__predict_groups(frame, method="predict_proba", groups=_group_cols)
    else:
        return self.__predict_shrinkage_groups(frame, method="predict_proba", groups=_group_cols)

sklego.meta.grouped_predictor.GroupedClassifier

Bases: GroupedPredictor, ClassifierMixin

GroupedClassifier is a meta-estimator that fits a separate classifier for each group in the input data.

It is equivalent to GroupedPredictor with shrinkage=None, but it is available only for classification models.

New in version 0.8.0

Source code in sklego/meta/grouped_predictor.py
class GroupedClassifier(GroupedPredictor, ClassifierMixin):
    """`GroupedClassifier` is a meta-estimator that fits a separate classifier for each group in the input data.

    It is equivalent to [`GroupedPredictor`][sklego.meta.grouped_predictor.GroupedPredictor] with `shrinkage=None`
    but it is available only for classification models.

    !!! info "New in version 0.8.0"
    """

    def __init__(
        self,
        estimator,
        groups,
        use_global_model=True,
        check_X=True,
        **shrinkage_kwargs,
    ):
        super().__init__(
            estimator=estimator,
            groups=groups,
            shrinkage=None,
            use_global_model=use_global_model,
            check_X=check_X,
        )

    def fit(self, X, y):
        """Fit one classifier for each group of training data `X` and `y`.

        Will also learn the groups that exist within the training dataset.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : GroupedClassifier
            The fitted classifier.

        Raises
        -------
        ValueError
            If the supplied estimator is not a classifier.
        """

        if not is_classifier(self.estimator):
            raise ValueError("GroupedClassifier is only available for classification models")
        self.classes_ = np.unique(y)
        return super().fit(X, y)

fit(X, y)

Fit one classifier for each group of training data X and y.

Will also learn the groups that exist within the training dataset.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required

Returns:

Name Type Description
self GroupedClassifier

The fitted classifier.

Raises:

Type Description
ValueError

If the supplied estimator is not a classifier.

Source code in sklego/meta/grouped_predictor.py
def fit(self, X, y):
    """Fit one classifier for each group of training data `X` and `y`.

    Will also learn the groups that exist within the training dataset.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.

    Returns
    -------
    self : GroupedClassifier
        The fitted classifier.

    Raises
    -------
    ValueError
        If the supplied estimator is not a classifier.
    """

    if not is_classifier(self.estimator):
        raise ValueError("GroupedClassifier is only available for classification models")
    self.classes_ = np.unique(y)
    return super().fit(X, y)

sklego.meta.grouped_predictor.GroupedRegressor

Bases: GroupedPredictor, RegressorMixin

GroupedRegressor is a meta-estimator that fits a separate regressor for each group in the input data.

Its spec is the same as GroupedPredictor but it is available only for regression models.

New in version 0.8.0

Source code in sklego/meta/grouped_predictor.py
class GroupedRegressor(GroupedPredictor, RegressorMixin):
    """`GroupedRegressor` is a meta-estimator that fits a separate regressor for each group in the input data.

    Its spec is the same as [`GroupedPredictor`][sklego.meta.grouped_predictor.GroupedPredictor] but it is available
    only for regression models.

    !!! info "New in version 0.8.0"
    """

    def fit(self, X, y):
        """Fit one regressor for each group of training data `X` and `y`.

        Will also learn the groups that exist within the training dataset.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : GroupedRegressor
            The fitted regressor.

        Raises
        -------
        ValueError
            If the supplied estimator is not a regressor.
        """
        if not is_regressor(self.estimator):
            raise ValueError("GroupedRegressor is only available for regression models")

        return super().fit(X, y)

fit(X, y)

Fit one regressor for each group of training data X and y.

Will also learn the groups that exist within the training dataset.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required

Returns:

Name Type Description
self GroupedRegressor

The fitted regressor.

Raises:

Type Description
ValueError

If the supplied estimator is not a regressor.

Source code in sklego/meta/grouped_predictor.py
def fit(self, X, y):
    """Fit one regressor for each group of training data `X` and `y`.

    Will also learn the groups that exist within the training dataset.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.

    Returns
    -------
    self : GroupedRegressor
        The fitted regressor.

    Raises
    -------
    ValueError
        If the supplied estimator is not a regressor.
    """
    if not is_regressor(self.estimator):
        raise ValueError("GroupedRegressor is only available for regression models")

    return super().fit(X, y)

sklego.meta.grouped_transformer.GroupedTransformer

Bases: TransformerMixin, MetaEstimatorMixin, BaseEstimator

Construct a transformer per data group. Splits data by groups from single or multiple columns and transforms remaining columns using the transformers corresponding to the groups.

Parameters:

Name Type Description Default
transformer scikit-learn compatible transformer

The transformer to be applied per group.

required
groups int | str | List[int] | List[str] | None

The column(s) of the array/dataframe to select as a grouping parameter set. If None, the transformer will be applied to the entire input without grouping.

required
use_global_model bool

Whether or not to fall back to a general transformation in case a group is not found during .transform().

True
check_X bool

Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

True

Attributes:

Name Type Description
transformers_ scikit-learn compatible transformer | dict[..., scikit-learn compatible transformer]

The fitted transformers per group or a single fitted transformer if groups is None.

fallback_ scikit-learn compatible transformer | None

The fitted transformer to fall back to in case a group is not found during .transform(). Only present if use_global_model is True.
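As an illustration of the per-group idea with a fallback, the sketch below centers values by their group's mean and uses the global mean for groups that were not seen during fitting. The helper names and the mean-centering transformation are assumptions chosen for brevity; the real class wraps any scikit-learn compatible transformer.

```python
from collections import defaultdict
from statistics import mean


def fit_group_means(rows):
    """rows are (group, value) pairs; returns per-group means and a global mean."""
    values_by_group = defaultdict(list)
    for group, value in rows:
        values_by_group[group].append(value)
    group_means = {g: mean(v) for g, v in values_by_group.items()}
    global_mean = mean([value for _, value in rows])
    return group_means, global_mean


def center(rows, group_means, global_mean, use_global_model=True):
    """Subtract the group's mean; unseen groups use the global mean or raise."""
    out = []
    for group, value in rows:
        if group in group_means:
            out.append(value - group_means[group])
        elif use_global_model:
            out.append(value - global_mean)
        else:
            raise ValueError(f"Found new group {group!r} during transform")
    return out


train = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)]
group_means, global_mean = fit_group_means(train)

# Group "z" was never seen during fitting, so it is centered with the global mean.
centered = center([("a", 2.0), ("b", 10.0), ("z", 7.0)], group_means, global_mean)
```

Note that the output contains only the transformed values: the group column is used for routing and is not part of the result.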

Source code in sklego/meta/grouped_transformer.py
class GroupedTransformer(TransformerMixin, MetaEstimatorMixin, BaseEstimator):
    """Construct a transformer per data group. Splits data by groups from single or multiple columns and transforms
    remaining columns using the transformers corresponding to the groups.

    Parameters
    ----------
    transformer : scikit-learn compatible transformer
        The transformer to be applied per group.
    groups : int | str | List[int] | List[str] | None
        The column(s) of the array/dataframe to select as a grouping parameter set. If `None`, the transformer will be
        applied to the entire input without grouping.
    use_global_model : bool, default=True
        Whether or not to fall back to a general transformation in case a group is not found during `.transform()`.
    check_X : bool, default=True
        Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

    Attributes
    ----------
    transformers_ : scikit-learn compatible transformer | dict[..., scikit-learn compatible transformer]
        The fitted transformers per group or a single fitted transformer if `groups` is `None`.
    fallback_ : scikit-learn compatible transformer | None
        The fitted transformer to fall back to in case a group is not found during `.transform()`. Only present if
        `use_global_model` is `True`.
    """

    _check_kwargs = {"accept_large_sparse": False}
    _required_parameters = ["transformer", "groups"]

    def __init__(self, transformer, groups, use_global_model=True, check_X=True):
        self.transformer = transformer
        self.groups = groups
        self.use_global_model = use_global_model
        self.check_X = check_X

    def __fit_single_group(self, group, X, y=None):
        """Fit transformer to the given group.

        Parameters
        ----------
        group : tuple
            The group to fit the transformer to.
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,), default=None
            Target values.

        Returns
        -------
        transformer : scikit-learn compatible transformer
            The fitted transformer for the group.
        """
        try:
            return clone(self.transformer).fit(X, y)
        except Exception as e:
            raise type(e)(f"Exception for group {group}: {e}")

    def __fit_grouped_transformer(self, frame: nw.DataFrame, y: Union[np.ndarray, None]):
        """Fit a transformer to each group"""

        grouped_transformers = {
            # Fit a clone of the transformer to each group
            group_name: self.__fit_single_group(
                group_name,
                X=nw.to_native(X_grp.drop(["__sklego_target__", *self.groups_])),
                y=(nw.to_native(X_grp["__sklego_target__"]) if y is not None else None),
            )
            for group_name, X_grp in frame.group_by(self.groups_)
        }

        return grouped_transformers

    def __check_transformer(self):
        """Check if the supplied transformer has a `transform` method and raise a `ValueError` if not."""
        if not hasattr(self.transformer, "transform"):
            raise ValueError("The supplied transformer should have a 'transform' method")

    def fit(self, X, y=None):
        """Fit one transformer for each group of training data `X`.

        Will also learn the groups that exist within the dataset.

        If `use_global_model=True` a fallback transformer will be fitted on the entire dataset in case a group is not
        found during `.transform()`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data. If `groups` is not `None`, X should have at least two columns, of which at least one
            corresponds to groups defined in `groups`, and the remaining columns represent the values to transform.
        y : array-like of shape (n_samples,), default=None
            Target values.

        Returns
        -------
        self : GroupedTransformer
            The fitted transformer.
        """
        self.__check_transformer()
        self.fallback_ = None
        self.groups_ = as_list(self.groups) if self.groups is not None else []

        X = nw.from_native(X, strict=False, eager_only=True)

        if isinstance(X, nw.DataFrame):
            self.feature_names_out_ = [c for c in X.columns if c not in self.groups_]

        else:
            # Accounts for negative indices if X is an array
            self.groups_ = [
                X.shape[1] + group if isinstance(group, int) and group < 0 else group for group in self.groups_
            ]
            self.feature_names_out_ = [f"x{i}" for i in range(X.shape[1] - len(self.groups_))]

        frame = parse_X_y(X, y, self.groups_, check_X=self.check_X, **self._check_kwargs)

        if self.groups is None:
            X_, y_ = (
                nw.to_native(frame.drop("__sklego_target__")),
                nw.to_native(frame["__sklego_target__"]) if y is not None else None,
            )
            self.transformers_ = clone(self.transformer).fit(X_, y=y_)
            return self

        self.transformers_ = self.__fit_grouped_transformer(frame, y)

        if self.use_global_model:
            X_, y_ = (
                nw.to_native(frame.drop(["__sklego_target__", *self.groups_])),
                nw.to_native(frame["__sklego_target__"]) if y is not None else None,
            )
            self.fallback_ = clone(self.transformer).fit(X_, y_)

        self.n_features_in_ = X.shape[1]
        return self

    def __transform_single_group(self, group, X):
        """Transform a single group by getting its transformer from the fitted dict"""
        try:
            group_transformer = self.transformers_[group]
        except KeyError:
            if self.fallback_:
                group_transformer = self.fallback_
            else:
                raise ValueError(f"Found new group {group} during transform with use_global_model = False")

        return np.asarray(group_transformer.transform(X))

    def __transform_groups(self, frame: nw.DataFrame):
        """Transform all groups"""

        n_samples = frame.shape[0]
        frame = frame.with_columns(__sklego_index__=np.arange(n_samples))

        results = [
            (
                X_grp.select("__sklego_index__").to_numpy().squeeze().astype(int),
                self.__transform_single_group(
                    group_name, nw.to_native(X_grp.drop(["__sklego_index__", *self.groups_]))
                ),
            )
            for group_name, X_grp in frame.group_by(self.groups_)
        ]

        output = np.zeros(shape=(n_samples, results[0][1].shape[1]))
        for grp_index, grp_result in results:
            output[grp_index, :] = grp_result

        return output

    def transform(self, X):
        """Transform new data `X` by transforming on each group. If a group is not found during `.transform()` and
        `use_global_model=True` the fallback transformer will be used. If `use_global_model=False` a `ValueError` will
        be raised.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Data to transform.

        Returns
        -------
        array-like of shape (n_samples, n_features)
            Data transformed per group.
        """
        check_is_fitted(self, ["fallback_", "transformers_"])

        X = nw.from_native(X, strict=False, eager_only=True)
        frame = parse_X_y(X, y=None, groups=self.groups_, check_X=self.check_X, **self._check_kwargs).drop(
            "__sklego_target__"
        )

        if self.groups is None:
            X_ = nw.to_native(frame)
            return self.transformers_.transform(X_)

        return self.__transform_groups(frame)

    def _more_tags(self):
        return {"allow_nan": True}

    def get_feature_names_out(self) -> List[str]:
        "Alias for the `feature_names_out_` attribute defined during fit."
        return self.feature_names_out_

fit(X, y=None)

Fit one transformer for each group of training data X.

Will also learn the groups that exist within the dataset.

If use_global_model=True a fallback transformer will be fitted on the entire dataset in case a group is not found during .transform().

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data. If groups is not None, X should have at least two columns, of which at least one corresponds to groups defined in groups, and the remaining columns represent the values to transform.

required
y array-like of shape (n_samples,)

Target values.

None

Returns:

Name Type Description
self GroupedTransformer

The fitted transformer.

Source code in sklego/meta/grouped_transformer.py
def fit(self, X, y=None):
    """Fit one transformer for each group of training data `X`.

    Will also learn the groups that exist within the dataset.

    If `use_global_model=True` a fallback transformer will be fitted on the entire dataset in case a group is not
    found during `.transform()`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data. If `groups` is not `None`, X should have at least two columns, of which at least one
        corresponds to groups defined in `groups`, and the remaining columns represent the values to transform.
    y : array-like of shape (n_samples,), default=None
        Target values.

    Returns
    -------
    self : GroupedTransformer
        The fitted transformer.
    """
    self.__check_transformer()
    self.fallback_ = None
    self.groups_ = as_list(self.groups) if self.groups is not None else []

    X = nw.from_native(X, strict=False, eager_only=True)

    if isinstance(X, nw.DataFrame):
        self.feature_names_out_ = [c for c in X.columns if c not in self.groups_]

    else:
        # Accounts for negative indices if X is an array
        self.groups_ = [
            X.shape[1] + group if isinstance(group, int) and group < 0 else group for group in self.groups_
        ]
        self.feature_names_out_ = [f"x{i}" for i in range(X.shape[1] - len(self.groups_))]

    frame = parse_X_y(X, y, self.groups_, check_X=self.check_X, **self._check_kwargs)

    if self.groups is None:
        X_, y_ = (
            nw.to_native(frame.drop("__sklego_target__")),
            nw.to_native(frame["__sklego_target__"]) if y is not None else None,
        )
        self.transformers_ = clone(self.transformer).fit(X_, y=y_)
        return self

    self.transformers_ = self.__fit_grouped_transformer(frame, y)

    if self.use_global_model:
        X_, y_ = (
            nw.to_native(frame.drop(["__sklego_target__", *self.groups_])),
            nw.to_native(frame["__sklego_target__"]) if y is not None else None,
        )
        self.fallback_ = clone(self.transformer).fit(X_, y_)

    self.n_features_in_ = X.shape[1]
    return self

get_feature_names_out()

Alias for the feature_names_out_ attribute defined during fit.

Source code in sklego/meta/grouped_transformer.py
def get_feature_names_out(self) -> List[str]:
    "Alias for the `feature_names_out_` attribute defined during fit."
    return self.feature_names_out_

transform(X)

Transform new data X by transforming on each group. If a group is not found during .transform() and use_global_model=True the fallback transformer will be used. If use_global_model=False a ValueError will be raised.
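The per-group lookup with a global fallback can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the "transformer" here is mean-centering, and the helper names `fit_group_means` and `center` are hypothetical.

```python
from collections import defaultdict


def fit_group_means(rows, use_global_model=True):
    """Fit one mean per group, plus an optional global mean used as fallback.

    `rows` is a list of (group, value) pairs (a hypothetical, simplified input).
    """
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    group_means = {g: sum(v) / len(v) for g, v in by_group.items()}
    fallback = sum(v for _, v in rows) / len(rows) if use_global_model else None
    return group_means, fallback


def center(rows, group_means, fallback):
    """Center each value with its group's mean, or with the fallback for unseen groups."""
    out = []
    for group, value in rows:
        if group in group_means:
            mean = group_means[group]
        elif fallback is not None:
            mean = fallback
        else:
            raise ValueError(f"Found new group {group} during transform with use_global_model = False")
        out.append(value - mean)
    return out


train = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)]
group_means, fallback = fit_group_means(train)

# Group "c" was not seen during fit, so it is centered with the global mean.
centered = center([("a", 2.5), ("b", 11.0), ("c", 8.0)], group_means, fallback)
print(centered)
```

Passing `fallback=None` to `center` mimics `use_global_model=False`: the unseen group then raises a `ValueError` instead of being transformed.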

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Data to transform.

required

Returns:

Type Description
array-like of shape (n_samples, n_features)

Data transformed per group.

Source code in sklego/meta/grouped_transformer.py
def transform(self, X):
    """Transform new data `X` by transforming on each group. If a group is not found during `.transform()` and
    `use_global_model=True` the fallback transformer will be used. If `use_global_model=False` a `ValueError` will
    be raised.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Data to transform.

    Returns
    -------
    array-like of shape (n_samples, n_features)
        Data transformed per group.
    """
    check_is_fitted(self, ["fallback_", "transformers_"])

    X = nw.from_native(X, strict=False, eager_only=True)
    frame = parse_X_y(X, y=None, groups=self.groups_, check_X=self.check_X, **self._check_kwargs).drop(
        "__sklego_target__"
    )

    if self.groups is None:
        X_ = nw.to_native(frame)
        return self.transformers_.transform(X_)

    return self.__transform_groups(frame)

sklego.meta.ordinal_classification.OrdinalClassifier

Bases: MultiOutputMixin, ClassifierMixin, MetaEstimatorMixin, BaseEstimator

The OrdinalClassifier allows a binary classifier to be used for an ordinal classification problem.

Suppose we have N ordinal classes to predict. The original binary classifier is then fitted on N-1 training sets, one for each y_label in y except y.max(), where the binary target marks the samples with y <= y_label (y.max() is skipped because every sample satisfies y <= y.max()).

The binary classifiers are then used to predict the probability of each sample being in each new class y <= y_label, and finally the probability of each original class is computed as the difference between two consecutive cumulative probabilities:

\[ P(y = \text{class}_i) = P(\text{class}_{i-1} < y \leq \text{class}_i) = P(y \leq \text{class}_i) - P(y \leq \text{class}_{i-1}) \]
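As a worked instance of the formula, the sketch below turns the N-1 cumulative probabilities of a single sample into N class probabilities. It is an illustration in plain Python, and the helper name `class_probabilities` is hypothetical; the boundaries 0 and 1 are added on either side, as in the formula's edge cases.

```python
def class_probabilities(p_y_le):
    """Convert cumulative probabilities P(y <= class_i) into per-class probabilities.

    `p_y_le` holds the N-1 cumulative probabilities of one sample, ordered by class.
    """
    # P(y <= nothing) = 0 on the left, P(y <= class_N) = 1 on the right.
    cumulative = [0.0, *p_y_le, 1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


probs = class_probabilities([0.25, 0.75])
print(probs)  # [0.25, 0.5, 0.25]

# Cumulative estimates that are not increasing yield a negative "probability".
inconsistent = class_probabilities([0.5, 0.25])
print(inconsistent)  # [0.5, -0.25, 0.75]
```

The second call shows why the quality of the binary probabilities matters: nothing in the differencing step forces the cumulative estimates to be monotone.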

About scikit-learn predict_probas

As you can see from the formula above, it is of utmost importance to use proper probabilities to compute the results. However, not every scikit-learn classifier's .predict_proba() method outputs calibrated probabilities (some outputs are closer to a confidence score).

We recommend using CalibratedClassifierCV to calibrate the probabilities of the binary classifiers.

You can enable this by setting use_calibration=True and passing an uncalibrated classifier to the OrdinalClassifier or by passing a calibrated classifier to the OrdinalClassifier constructor.

More on this topic can be found in the scikit-learn documentation.

Computation time

The OrdinalClassifier is a meta-estimator that fits N-1 binary classifiers. This can be computationally expensive, especially when using a large number of samples and/or features or a complex classifier.

Parameters:

Name Type Description Default
estimator scikit-learn compatible classifier

The estimator to be applied to the data, used as binary classifier.

required
n_jobs int

The number of jobs to run in parallel. The same convention of joblib.Parallel holds:

  • n_jobs = None: interpreted as n_jobs=1.
  • n_jobs > 0: n_cpus=n_jobs are used.
  • n_jobs < 0: (n_cpus + 1 + n_jobs) are used.
None
use_calibration bool

Whether or not to calibrate the binary classifiers using CalibratedClassifierCV.

False
calibration_kwargs dict | None

Keyword arguments to the CalibratedClassifierCV class, used only if use_calibration=True.

None

Attributes:

Name Type Description
estimators_ dict[int, scikit-learn compatible classifier]

The fitted underlying binary classifiers.

classes_ np.ndarray of shape (n_classes,)

The classes seen during fit.

n_features_in_ int

The number of features seen during fit.

Examples:

import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from sklego.meta import OrdinalClassifier

url = "https://stats.idre.ucla.edu/stat/data/ologit.dta"
df = pd.read_stata(url).assign(apply_codes = lambda t: t["apply"].cat.codes)

target = "apply_codes"
features = [c for c in df.columns if c not in {target, "apply"}]

X, y = df[features].to_numpy(), df[target].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = OrdinalClassifier(LogisticRegression(), n_jobs=-1)
_ = clf.fit(X_train, y_train)
clf.predict_proba(X_test)
Notes

The implementation is based on the paper A simple approach to ordinal classification by Eibe Frank and Mark Hall.

Source code in sklego/meta/ordinal_classification.py
class OrdinalClassifier(MultiOutputMixin, ClassifierMixin, MetaEstimatorMixin, BaseEstimator):
    r"""The `OrdinalClassifier` allows a binary classifier to be used for an ordinal classification problem.

    Suppose we have N ordinal classes to predict. The original binary classifier is then fitted on N-1 training sets,
    one for each `y_label` in `y` except `y.max()`, where the binary target marks the samples with `y <= y_label`
    (`y.max()` is skipped because every sample satisfies `y <= y.max()`).

    The binary classifiers are then used to predict the probability of each sample being in each _new_ class
    `y <= y_label`, and finally the probability of each original class is computed as the difference between two
    consecutive cumulative probabilities:

    $$ P(y = \text{class}_i) = P(\text{class}_{i-1} < y \leq \text{class}_i) = P(y \leq \text{class}_i) - P(y \leq \text{class}_{i-1}) $$

    !!! warning "About scikit-learn `predict_proba`s"

        As you can see from the formula above, it is of utmost importance to use _proper_ probabilities to compute the
        results. However, not every scikit-learn classifier's `.predict_proba()` method outputs calibrated
        probabilities (some outputs are closer to a confidence score).

        We recommend using `CalibratedClassifierCV` to calibrate the probabilities of the binary classifiers.

        You can enable this by setting `use_calibration=True` and passing an uncalibrated classifier to the
        `OrdinalClassifier` or by passing a calibrated classifier to the `OrdinalClassifier` constructor.

        More on this topic can be found in the [scikit-learn documentation](https://scikit-learn.org/stable/modules/calibration.html).

    !!! warning "Computation time"

        The `OrdinalClassifier` is a meta-estimator that fits N-1 binary classifiers. This can be computationally
        expensive, especially when using a large number of samples and/or features or a complex classifier.

    Parameters
    ----------
    estimator : scikit-learn compatible classifier
        The estimator to be applied to the data, used as binary classifier.
    n_jobs : int, default=None
        The number of jobs to run in parallel. The same convention of [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html)
        holds:

        - `n_jobs = None`: interpreted as n_jobs=1.
        - `n_jobs > 0`: n_cpus=n_jobs are used.
        - `n_jobs < 0`: (n_cpus + 1 + n_jobs) are used.
    use_calibration : bool, default=False
        Whether or not to calibrate the binary classifiers using `CalibratedClassifierCV`.
    calibration_kwargs : dict | None, default=None
        Keyword arguments to the `CalibratedClassifierCV` class, used only if `use_calibration=True`.

    Attributes
    ----------
    estimators_ : dict[int, scikit-learn compatible classifier]
        The fitted underlying binary classifiers.
    classes_ : np.ndarray of shape (n_classes,)
        The classes seen during `fit`.
    n_features_in_ : int
        The number of features seen during `fit`.

    Examples
    --------
    ```py
    import pandas as pd

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    from sklego.meta import OrdinalClassifier

    url = "https://stats.idre.ucla.edu/stat/data/ologit.dta"
    df = pd.read_stata(url).assign(apply_codes = lambda t: t["apply"].cat.codes)

    target = "apply_codes"
    features = [c for c in df.columns if c not in {target, "apply"}]

    X, y = df[features].to_numpy(), df[target].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    clf = OrdinalClassifier(LogisticRegression(), n_jobs=-1)
    _ = clf.fit(X_train, y_train)
    clf.predict_proba(X_test)
    ```

    Notes
    -----
    The implementation is based on the paper [A simple approach to ordinal classification](https://www.cs.waikato.ac.nz/~eibe/pubs/ordinal_tech_report.pdf)
    by Eibe Frank and Mark Hall.

    """

    is_multiclass = True

    def __init__(self, estimator, *, n_jobs=None, use_calibration=False, calibration_kwargs=None):
        self.estimator = estimator
        self.n_jobs = n_jobs
        self.use_calibration = use_calibration
        self.calibration_kwargs = calibration_kwargs

    def fit(self, X, y):
        """Fit the `OrdinalClassifier` model on training data `X` and `y` by fitting its underlying estimators on
        N-1 datasets `X` and `y` for each class `y_label` in `y` except `y.max()`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training data.
        y : array-like of shape (n_samples,)
            The target values.

        Returns
        -------
        self : OrdinalClassifier
            Fitted model.

        Raises
        ------
        ValueError
            If the estimator is not a classifier or if it does not implement `.predict_proba()`.
        """

        if not is_classifier(self.estimator):
            raise ValueError("The estimator must be a classifier.")

        if not hasattr(self.estimator, "predict_proba"):
            raise ValueError("The estimator must implement `.predict_proba()` method.")

        X, y = check_X_y(X, y, estimator=self, ensure_min_samples=2)

        self.classes_ = np.sort(np.unique(y))
        self.n_features_in_ = X.shape[1]

        if self.n_classes_ < 3:
            raise ValueError("`OrdinalClassifier` can't train when less than 3 classes are present.")

        if self.n_jobs is None or self.n_jobs == 1:
            self.estimators_ = {y_label: self._fit_binary_estimator(X, y, y_label) for y_label in self.classes_[:-1]}
        else:
            self.estimators_ = dict(
                zip(
                    self.classes_[:-1],
                    Parallel(n_jobs=self.n_jobs)(
                        delayed(self._fit_binary_estimator)(X, y, y_label) for y_label in self.classes_[:-1]
                    ),
                )
            )

        return self

    def predict_proba(self, X):
        """Predict class probabilities for samples in `X`. The class probabilities of a sample are computed as the
        difference between the cumulative probability `P(y <= y_label)` and the cumulative probability of the previous
        class label (0 before the first class, and 1 for `P(y <= y.max())`).

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples, n_classes)
            The predicted class probabilities.

        Raises
        ------
        ValueError
            If `X` has a different number of features than the one seen during `fit`.
        """
        check_is_fitted(self, ["estimators_", "classes_"])
        X = check_array(X, ensure_2d=True, estimator=self)

        if X.shape[1] != self.n_features_in_:
            raise ValueError(f"X has {X.shape[1]} features, expected {self.n_features_in_} features.")

        raw_proba = np.array([estimator.predict_proba(X)[:, 1] for estimator in self.estimators_.values()]).T
        p_y_le = np.column_stack((np.zeros(X.shape[0]), raw_proba, np.ones(X.shape[0])))

        # Equivalent to (p_y_le[:, 1:] - p_y_le[:, :-1])
        return np.diff(p_y_le, n=1, axis=1)

    def predict(self, X):
        """Predict class labels for samples in `X` as the class with the highest probability.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted class labels.
        """
        check_is_fitted(self, ["estimators_", "classes_"])
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

    def _fit_binary_estimator(self, X, y, y_label):
        """Utility method to fit a binary classifier on the dataset where `y <= y_label`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training data.
        y : array-like of shape (n_samples,)
            The target values.
        y_label : int
            The label of the class to predict.

        Returns
        -------
        fitted_model : scikit-learn compatible classifier
            The fitted binary classifier.
        """
        y_bin = (y <= y_label).astype(int)
        if self.use_calibration:
            return CalibratedClassifierCV(estimator=clone(self.estimator), **(self.calibration_kwargs or {})).fit(
                X, y_bin
            )
        else:
            return clone(self.estimator).fit(X, y_bin)

    @property
    def n_classes_(self):
        """Number of classes."""
        return len(self.classes_)

n_classes_ property

Number of classes.

fit(X, y)

Fit the OrdinalClassifier model on training data X and y by fitting its underlying estimators on N-1 datasets X and y for each class y_label in y except y.max().
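The N-1 binary targets described above can be sketched as follows. This is only an illustration of how the labels are derived, assuming integer-coded ordinal classes; the variable names are hypothetical and no estimator is fitted here.

```python
y = [0, 2, 1, 1, 2, 0]

# Sorted unique labels; the largest one is skipped because y <= y.max() holds for every sample.
classes = sorted(set(y))

binary_targets = {
    y_label: [int(value <= y_label) for value in y]
    for y_label in classes[:-1]
}

for y_label, target in binary_targets.items():
    print(f"y <= {y_label}: {target}")
```

With three classes this produces two binary problems (`y <= 0` and `y <= 1`), each of which would be handed to its own clone of the wrapped estimator.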

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The training data.

required
y array-like of shape (n_samples,)

The target values.

required

Returns:

Name Type Description
self OrdinalClassifier

Fitted model.

Raises:

Type Description
ValueError

If the estimator is not a classifier or if it does not implement .predict_proba().

Source code in sklego/meta/ordinal_classification.py
def fit(self, X, y):
    """Fit the `OrdinalClassifier` model on training data `X` and `y` by fitting its underlying estimators on
    N-1 datasets `X` and `y` for each class `y_label` in `y` except `y.max()`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The training data.
    y : array-like of shape (n_samples,)
        The target values.

    Returns
    -------
    self : OrdinalClassifier
        Fitted model.

    Raises
    ------
    ValueError
        If the estimator is not a classifier or if it does not implement `.predict_proba()`.
    """

    if not is_classifier(self.estimator):
        raise ValueError("The estimator must be a classifier.")

    if not hasattr(self.estimator, "predict_proba"):
        raise ValueError("The estimator must implement `.predict_proba()` method.")

    X, y = check_X_y(X, y, estimator=self, ensure_min_samples=2)

    self.classes_ = np.sort(np.unique(y))
    self.n_features_in_ = X.shape[1]

    if self.n_classes_ < 3:
        raise ValueError("`OrdinalClassifier` can't train when less than 3 classes are present.")

    if self.n_jobs is None or self.n_jobs == 1:
        self.estimators_ = {y_label: self._fit_binary_estimator(X, y, y_label) for y_label in self.classes_[:-1]}
    else:
        self.estimators_ = dict(
            zip(
                self.classes_[:-1],
                Parallel(n_jobs=self.n_jobs)(
                    delayed(self._fit_binary_estimator)(X, y, y_label) for y_label in self.classes_[:-1]
                ),
            )
        )

    return self

predict(X)

Predict class labels for samples in X as the class with the highest probability.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted class labels.

Source code in sklego/meta/ordinal_classification.py
def predict(self, X):
    """Predict class labels for samples in `X` as the class with the highest probability.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted class labels.
    """
    check_is_fitted(self, ["estimators_", "classes_"])
    return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

predict_proba(X)

Predict class probabilities for samples in X. The probability of each class is computed as the difference between the cumulative probability P(y <= y_label) and the cumulative probability of the previous class label (0 before the first class, and 1 for P(y <= y.max())).

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples, n_classes)

The predicted class probabilities.

Raises:

Type Description
ValueError

If X has a different number of features than the one seen during fit.

Source code in sklego/meta/ordinal_classification.py
def predict_proba(self, X):
    """Predict class probabilities for samples in `X`. The class probabilities of a sample are computed as the
    difference between the cumulative probability `P(y <= y_label)` and the cumulative probability of the previous
    class label (0 before the first class, and 1 for `P(y <= y.max())`).

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples, n_classes)
        The predicted class probabilities.

    Raises
    ------
    ValueError
        If `X` has a different number of features than the one seen during `fit`.
    """
    check_is_fitted(self, ["estimators_", "classes_"])
    X = check_array(X, ensure_2d=True, estimator=self)

    if X.shape[1] != self.n_features_in_:
        raise ValueError(f"X has {X.shape[1]} features, expected {self.n_features_in_} features.")

    raw_proba = np.array([estimator.predict_proba(X)[:, 1] for estimator in self.estimators_.values()]).T
    p_y_le = np.column_stack((np.zeros(X.shape[0]), raw_proba, np.ones(X.shape[0])))

    # Equivalent to (p_y_le[:, 1:] - p_y_le[:, :-1])
    return np.diff(p_y_le, n=1, axis=1)

sklego.meta.outlier_classifier.OutlierClassifier

Bases: BaseEstimator, ClassifierMixin, MetaEstimatorMixin

Morphs an outlier detection model into a classifier.

When an outlier is detected it outputs 1, and 0 otherwise. This lets you evaluate outlier models with familiar classification metrics and use them as, for example, a fraud detector.
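The label conversion can be sketched in a few lines. scikit-learn outlier detectors return -1 for outliers and 1 for inliers from `predict`; the sketch below maps that convention to 0/1 labels and scores them with a hand-written accuracy. The helper name `to_outlier_labels` and the sample predictions are made up for illustration.

```python
def to_outlier_labels(raw_predictions):
    """Map the outlier-detector convention (-1 outlier, 1 inlier) to 1 and 0."""
    return [int(pred == -1) for pred in raw_predictions]


raw = [1, -1, 1, 1]        # hypothetical output of an outlier detector's predict()
y_true = [0, 1, 0, 1]      # hypothetical ground truth: 1 marks a known outlier

labels = to_outlier_labels(raw)
accuracy = sum(int(p == t) for p, t in zip(labels, y_true)) / len(y_true)

print(labels)
print(accuracy)
```

Once the predictions are 0/1 labels, any binary classification metric can be applied in the same way as the accuracy computed here.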

Parameters:

Name Type Description Default
model scikit-learn compatible outlier detection model

An outlier detection model that will be used for prediction.

required

Attributes:

Name Type Description
estimator_ scikit-learn compatible outlier detection model

The fitted underlying outlier detection model.

classes_ array-like of shape (2,)

Classes used for prediction (0 or 1)

Example
from sklearn.ensemble import IsolationForest
from sklego.meta.outlier_classifier import OutlierClassifier

X = [[0], [0.5], [-1], [99]]
y = [0, 0, 0, 1]

isolation_forest = IsolationForest()

outlier_clf = OutlierClassifier(isolation_forest)
_ = outlier_clf.fit(X, y)

preds = outlier_clf.predict([[100], [-0.5], [0.5], [1]])
# array[1. 0. 0. 0.]

proba_preds = outlier_clf.predict_proba([[100], [-0.5], [0.5], [1]])
# [[0.34946567 0.65053433]
#  [0.79707913 0.20292087]
#  [0.80275406 0.19724594]
#  [0.80275406 0.19724594]]
Source code in sklego/meta/outlier_classifier.py
class OutlierClassifier(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    """Morphs an outlier detection model into a classifier.

    When an outlier is detected it outputs 1, and 0 otherwise. This lets you evaluate outlier models with familiar
    classification metrics and use them as, for example, a fraud detector.

    Parameters
    ----------
    model : scikit-learn compatible outlier detection model
        An outlier detection model that will be used for prediction.

    Attributes
    ----------
    estimator_ : scikit-learn compatible outlier detection model
        The fitted underlying outlier detection model.
    classes_ : array-like of shape (2,)
        Classes used for prediction (0 or 1)

    Example
    -------
    ```py
    from sklearn.ensemble import IsolationForest
    from sklego.meta.outlier_classifier import OutlierClassifier

    X = [[0], [0.5], [-1], [99]]
    y = [0, 0, 0, 1]

    isolation_forest = IsolationForest()

    outlier_clf = OutlierClassifier(isolation_forest)
    _ = outlier_clf.fit(X, y)

    preds = outlier_clf.predict([[100], [-0.5], [0.5], [1]])
    # array[1. 0. 0. 0.]

    proba_preds = outlier_clf.predict_proba([[100], [-0.5], [0.5], [1]])
    # [[0.34946567 0.65053433]
    #  [0.79707913 0.20292087]
    #  [0.80275406 0.19724594]
    #  [0.80275406 0.19724594]]
    ```
    """

    _required_parameters = ["model"]

    def __init__(self, model):
        self.model = model

    def _is_outlier_model(self):
        """Check if the underlying model is an outlier detection model."""
        return isinstance(self.model, OutlierModel)

    def fit(self, X, y=None):
        """Fit the underlying estimator to the training data `X` and `y`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,), default=None
            Target values.

        Returns
        -------
        self : OutlierClassifier
            The fitted estimator.

        Raises
        ------
        ValueError
            - If the underlying model is not an outlier detection model.
            - If the underlying model does not have a `decision_function` method.
        """
        if not self._is_outlier_model():
            raise ValueError("Passed model does not detect outliers!")
        if not hasattr(self.model, "decision_function"):
            raise ValueError(
                f"Passed model {self.model} does not have a `decision_function` "
                f"method. This is required for `predict_proba` estimation."
            )
        X, y = check_X_y(X, y)
        self.estimator_ = clone(self.model).fit(X, y)
        self.n_features_in_ = self.estimator_.n_features_in_
        self.classes_ = np.array([0, 1])

        # fit sigmoid function for `predict_proba`
        decision_function_scores = self.estimator_.decision_function(X)
        self._predict_proba_sigmoid = _SigmoidCalibration().fit(decision_function_scores, y)

        return self

    def predict(self, X):
        """Predict new data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        np.ndarray of shape (n_samples,)
            The predicted values. 0 for inliers, 1 for outliers.
        """
        check_is_fitted(self, ["estimator_", "classes_"])
        preds = self.estimator_.predict(X)
        result = (preds == -1).astype(int)
        return result

    def predict_proba(self, X):
        """Predict probability estimates for new data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        np.ndarray of shape (n_samples, 2)
            The predicted probabilities. Column 0 is the inlier probability, column 1 the outlier probability.
        """
        check_is_fitted(self, ["estimator_", "classes_"])
        decision_function_scores = self.estimator_.decision_function(X)
        probabilities = self._predict_proba_sigmoid.predict(decision_function_scores).reshape(-1, 1)
        complement = np.ones_like(probabilities) - probabilities
        return np.hstack((complement, probabilities))

fit(X, y=None)

Fit the underlying estimator to the training data X and y.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

None

Returns:

Name Type Description
self OutlierClassifier

The fitted estimator.

Raises:

Type Description
ValueError
  • If the underlying model is not an outlier detection model.
  • If the underlying model does not have a decision_function method.
Source code in sklego/meta/outlier_classifier.py
def fit(self, X, y=None):
    """Fit the underlying estimator to the training data `X` and `y`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,), default=None
        Target values.

    Returns
    -------
    self : OutlierClassifier
        The fitted estimator.

    Raises
    ------
    ValueError
        - If the underlying model is not an outlier detection model.
        - If the underlying model does not have a `decision_function` method.
    """
    if not self._is_outlier_model():
        raise ValueError("Passed model does not detect outliers!")
    if not hasattr(self.model, "decision_function"):
        raise ValueError(
            f"Passed model {self.model} does not have a `decision_function` "
            f"method. This is required for `predict_proba` estimation."
        )
    X, y = check_X_y(X, y)
    self.estimator_ = clone(self.model).fit(X, y)
    self.n_features_in_ = self.estimator_.n_features_in_
    self.classes_ = np.array([0, 1])

    # fit sigmoid function for `predict_proba`
    decision_function_scores = self.estimator_.decision_function(X)
    self._predict_proba_sigmoid = _SigmoidCalibration().fit(decision_function_scores, y)

    return self

predict(X)

Predict new data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
np.ndarray of shape (n_samples,)

The predicted values. 0 for inliers, 1 for outliers.

Source code in sklego/meta/outlier_classifier.py
def predict(self, X):
    """Predict new data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    np.ndarray of shape (n_samples,)
        The predicted values. 0 for inliers, 1 for outliers.
    """
    check_is_fitted(self, ["estimator_", "classes_"])
    preds = self.estimator_.predict(X)
    result = (preds == -1).astype(int)
    return result
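
The label conversion in `predict` can be illustrated in isolation. scikit-learn outlier detectors return -1 for outliers and 1 for inliers, and the wrapper maps those to 1 and 0. The sketch below uses plain Python lists and a hypothetical helper name, `to_binary_labels`, which is not part of the library.

```python
def to_binary_labels(detector_preds):
    """Map outlier-detector output (-1 = outlier, 1 = inlier) to 1/0 labels."""
    return [1 if p == -1 else 0 for p in detector_preds]


labels = to_binary_labels([1, -1, 1, 1, -1])
print(labels)  # [0, 1, 0, 0, 1]
```

The result follows the usual binary-classification convention, where the "positive" class (here: outlier) is labelled 1.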

predict_proba(X)

Predict probability estimates for new data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
np.ndarray of shape (n_samples, 2)

The predicted probabilities.

Source code in sklego/meta/outlier_classifier.py
def predict_proba(self, X):
    """Predict probability estimates for new data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    np.ndarray of shape (n_samples, 2)
        The predicted probabilities.
    """
    check_is_fitted(self, ["estimator_", "classes_"])
    decision_function_scores = self.estimator_.decision_function(X)
    probabilities = self._predict_proba_sigmoid.predict(decision_function_scores).reshape(-1, 1)
    complement = np.ones_like(probabilities) - probabilities
    return np.hstack((complement, probabilities))
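
To make the shape of the `predict_proba` output concrete, here is a minimal sketch of turning raw decision scores into two probability columns. The internals of `_SigmoidCalibration` are not shown in this section, so the logistic form and the `slope` and `intercept` values below are assumptions chosen for illustration; `sigmoid_proba` is a hypothetical helper.

```python
import math


def sigmoid_proba(scores, slope, intercept):
    """Return [1 - p, p] rows for raw decision scores.

    `slope` and `intercept` are hypothetical calibration parameters; in the
    class above the calibration is learned during `fit`.
    """
    rows = []
    for s in scores:
        # With a positive slope, lower scores give a higher probability p.
        p = 1.0 / (1.0 + math.exp(slope * s + intercept))
        rows.append([1.0 - p, p])
    return rows


proba = sigmoid_proba([-2.0, 0.0, 3.0], slope=1.5, intercept=0.0)
for row in proba:
    print([round(v, 3) for v in row])
```

Each row has two entries that sum to 1, which matches the horizontal stacking of `complement` and `probabilities` in the listing above.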

sklego.meta.regression_outlier_detector.RegressionOutlierDetector

Bases: BaseEstimator, OutlierMixin

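Morphs a regression estimator into one that can detect outliers. The model is trained to predict column from the remaining columns of X; samples whose residuals exceed the configured thresholds are flagged as outliers.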

Parameters:

Name Type Description Default
model scikit-learn compatible regression model

A regression model that will be used for prediction.

required
column int | str

This should be:

- The index of the target column to predict in the input data, when the input is an array.
- The name of the target column to predict in the input data, when the input is a dataframe.
required
lower float

Lower threshold for outlier detection. The method used for detection depends on the method parameter.

2.0
upper float

Upper threshold for outlier detection. The method used for detection depends on the method parameter.

2.0
method Literal[sd, relative, absolute]

The method to use for outlier detection.

  • "sd" uses standard deviation
  • "relative" uses relative difference
  • "absolute" uses absolute difference
"sd"

Attributes:

Name Type Description
estimator_ scikit-learn compatible regression model

The fitted underlying regression model.

sd_ float

The standard deviation of the differences between true and predicted values.

idx_ int

The index of the target column in the input data.

Notes

Native cross-dataframe support is achieved using Narwhals. Supported dataframes are:

  • pandas
  • Polars (eager)
  • Modin

See Narwhals docs for an up-to-date list (and to learn how you can add your dataframe library to it!), though note that only those supported by sklearn.utils.check_X_y will work with this class.

Source code in sklego/meta/regression_outlier_detector.py
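class RegressionOutlierDetector(BaseEstimator, OutlierMixin):
    """Morphs a regression estimator into one that can detect outliers. The model is trained to predict `column` from
    the remaining columns of `X`; samples whose residuals exceed the configured thresholds are flagged as outliers.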

    Parameters
    ----------
    model : scikit-learn compatible regression model
        A regression model that will be used for prediction.
    column : int | str
        This should be:

            - The index of the target column to predict in the input data, when the input is an array.
            - The name of the target column to predict in the input data, when the input is a dataframe.
    lower : float, default=2.0
        Lower threshold for outlier detection. The method used for detection depends on the `method` parameter.
    upper : float, default=2.0
        Upper threshold for outlier detection. The method used for detection depends on the `method` parameter.
    method : Literal["sd", "relative", "absolute"], default="sd"
        The method to use for outlier detection.

        - `"sd"` uses standard deviation
        - `"relative"` uses relative difference
        - `"absolute"` uses absolute difference

    Attributes
    ----------
    estimator_ : scikit-learn compatible regression model
        The fitted underlying regression model.
    sd_ : float
        The standard deviation of the differences between true and predicted values.
    idx_ : int
        The index of the target column in the input data.

    Notes
    -----
    Native cross-dataframe support is achieved using
    [Narwhals](https://narwhals-dev.github.io/narwhals/){:target="_blank"}.
    Supported dataframes are:

    - pandas
    - Polars (eager)
    - Modin

    See [Narwhals docs](https://narwhals-dev.github.io/narwhals/extending/){:target="_blank"} for an up-to-date list
    (and to learn how you can add your dataframe library to it!), though note that only those
    supported by [sklearn.utils.check_X_y](https://scikit-learn.org/stable/modules/generated/sklearn.utils.check_X_y.html)
    will work with this class.
    """

    _required_parameters = ["model", "column"]

    def __init__(self, model, column, lower=2, upper=2, method="sd"):
        self.model = model
        self.column = column
        self.lower = lower
        self.upper = upper
        self.method = method

    def _is_regression_model(self):
        """Check if the underlying model is a regression model."""
        return any(["RegressorMixin" in p.__name__ for p in type(self.model).__bases__])

    def _handle_thresholds(self, y_true, y_pred):
        """Compute if a sample is an outlier based on the `method` parameter."""
        difference = y_true - y_pred
        results = np.ones(difference.shape, dtype=int)
        allowed_methods = ["sd", "relative", "absolute"]
        if self.method not in allowed_methods:
            raise ValueError(f"`method` must be in {allowed_methods} got: {self.method}")
        if self.method == "sd":
            lower_limit_hit = -self.lower * self.sd_ > difference
            upper_limit_hit = self.upper * self.sd_ < difference
        if self.method == "relative":
            lower_limit_hit = -self.lower > difference / y_true
            upper_limit_hit = self.upper < difference / y_true
        if self.method == "absolute":
            lower_limit_hit = -self.lower > difference
            upper_limit_hit = self.upper < difference
        results[lower_limit_hit] = -1
        results[upper_limit_hit] = -1
        return results

    def to_x_y(self, X):
        """Split `X` into two arrays `X_to_use` and `y`.
        `y` is the column we want to predict (specified in the `column` parameter) and `X_to_use` is the rest of the
        data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Data to split.

        Returns
        -------
        X_to_use : array-like of shape (n_samples, n_features-1)
            Data to use for prediction.
        y : array-like of shape (n_samples,)
            The target column.
        """
        y = X[:, self.idx_]
        cols_to_use = [i for i in range(X.shape[1]) if i != self.idx_]  # use the resolved index, not the raw name
        X_to_use = X[:, cols_to_use]
        if len(X_to_use.shape) == 1:
            X_to_use = X_to_use.reshape(-1, 1)
        return X_to_use, y

    def fit(self, X, y=None):
        """Fit the underlying model on `X_to_use` and `y` where:

        - `y` is the column we want to predict (`X[:, self.column]`)
        - `X_to_use` is the rest of the data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,), default=None
            Ignored, present for compatibility.

        Returns
        -------
        self : RegressionOutlierDetector
            The fitted estimator.

        Raises
        ------
        ValueError
            If the `model` is not a regression estimator.
        """
        X = nw.from_native(X, eager_only=True, strict=False)
        self.idx_ = np.argmax([i == self.column for i in X.columns]) if isinstance(X, nw.DataFrame) else self.column
        X = check_array(nw.to_native(X, strict=False), estimator=self)

        self.n_features_in_ = X.shape[1]

        if not self._is_regression_model():
            raise ValueError("Passed model must be regression!")
        X, y = self.to_x_y(X)
        self.estimator_ = clone(self.model).fit(X, y)
        self.sd_ = np.std(self.estimator_.predict(X) - y)
        self.offset_ = 0

        return self

    def predict(self, X, y=None):
        """Predict which samples of `X` are outliers using the underlying estimator and given `method`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.
        y : array-like of shape (n_samples,), default=None
            Ignored, present for compatibility.

        Returns
        -------
        np.ndarray of shape (n_samples,)
            The predicted values. 1 for inliers, -1 for outliers.
        """
        check_is_fitted(self, ["estimator_", "sd_", "idx_"])
        X = check_array(X, estimator=self)
        X, y = self.to_x_y(X)
        preds = self.estimator_.predict(X)
        return self._handle_thresholds(y, preds)

    def score_samples(self, X, y=None):
        """Calculate the outlier scores for the given data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Data for which outlier scores are calculated.
        y : array-like of shape (n_samples,), default=None
            Ignored, present for compatibility.

        Returns
        -------
        np.ndarray of shape (n_samples,)
            The outlier scores for the input data.

        Raises
        ------
        ValueError
            If `method` is not one of "sd", "relative", or "absolute".
        """
        check_is_fitted(self, ["estimator_", "sd_", "idx_"])
        X = check_array(X, estimator=self)
        X, y_true = self.to_x_y(X)
        y_pred = self.estimator_.predict(X)
        difference = y_true - y_pred
        allowed_methods = ["sd", "relative", "absolute"]
        if self.method not in allowed_methods:
            raise ValueError(f"`method` must be in {allowed_methods} got: {self.method}")
        if self.method == "sd":
            return difference
        if self.method == "relative":
            return difference / y_true
        if self.method == "absolute":
            return difference

    def decision_function(self, X):
        return self.score_samples(X) - self.offset_

fit(X, y=None)

Fit the underlying model on X_to_use and y where:

  • y is the column we want to predict (X[:, self.column])
  • X_to_use is the rest of the data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Ignored, present for compatibility.

None

Returns:

Name Type Description
self RegressionOutlierDetector

The fitted estimator.

Raises:

Type Description
ValueError

If the model is not a regression estimator.

Source code in sklego/meta/regression_outlier_detector.py
def fit(self, X, y=None):
    """Fit the underlying model on `X_to_use` and `y` where:

    - `y` is the column we want to predict (`X[:, self.column]`)
    - `X_to_use` is the rest of the data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,), default=None
        Ignored, present for compatibility.

    Returns
    -------
    self : RegressionOutlierDetector
        The fitted estimator.

    Raises
    ------
    ValueError
        If the `model` is not a regression estimator.
    """
    X = nw.from_native(X, eager_only=True, strict=False)
    self.idx_ = np.argmax([i == self.column for i in X.columns]) if isinstance(X, nw.DataFrame) else self.column
    X = check_array(nw.to_native(X, strict=False), estimator=self)

    self.n_features_in_ = X.shape[1]

    if not self._is_regression_model():
        raise ValueError("Passed model must be regression!")
    X, y = self.to_x_y(X)
    self.estimator_ = clone(self.model).fit(X, y)
    self.sd_ = np.std(self.estimator_.predict(X) - y)
    self.offset_ = 0

    return self
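
The first step of `fit` resolves the `column` parameter to a positional index, stored as `idx_`. A minimal sketch of that idea, using a hypothetical helper `resolve_column_index` and a plain list of column names in place of a dataframe:

```python
def resolve_column_index(column, column_names=None):
    """Return the positional index of the target column.

    `column_names` is None for plain arrays, where `column` is already an index.
    """
    if column_names is None:
        return column
    return column_names.index(column)  # raises ValueError if the name is absent


print(resolve_column_index(2))                     # 2
print(resolve_column_index("b", ["a", "b", "c"]))  # 1
```

One design difference worth noting: `np.argmax` over a list of booleans returns 0 when nothing matches, whereas `list.index` in this sketch raises an error for an unknown column name.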

predict(X, y=None)

Predict which samples of X are outliers using the underlying estimator and given method.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required
y array-like of shape (n_samples,)

Ignored, present for compatibility.

None

Returns:

Type Description
np.ndarray of shape (n_samples,)

The predicted values. 1 for inliers, -1 for outliers.

Source code in sklego/meta/regression_outlier_detector.py
def predict(self, X, y=None):
    """Predict which samples of `X` are outliers using the underlying estimator and given `method`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.
    y : array-like of shape (n_samples,), default=None
        Ignored, present for compatibility.

    Returns
    -------
    np.ndarray of shape (n_samples,)
        The predicted values. 1 for inliers, -1 for outliers.
    """
    check_is_fitted(self, ["estimator_", "sd_", "idx_"])
    X = check_array(X, estimator=self)
    X, y = self.to_x_y(X)
    preds = self.estimator_.predict(X)
    return self._handle_thresholds(y, preds)
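
The three threshold rules applied by `predict` can be sketched without NumPy. The helper below, `flag_outliers`, is hypothetical and works on plain lists; it mirrors the comparisons in `_handle_thresholds`, with the standard deviation passed in explicitly as `sd` rather than read from a fitted estimator.

```python
def flag_outliers(y_true, y_pred, lower=2.0, upper=2.0, method="sd", sd=1.0):
    """Return 1 for inliers and -1 for outliers based on the residual y_true - y_pred."""
    if method not in ("sd", "relative", "absolute"):
        raise ValueError(f"unknown method: {method}")
    results = []
    for t, p in zip(y_true, y_pred):
        diff = t - p
        if method == "sd":
            low, high, value = -lower * sd, upper * sd, diff
        elif method == "relative":
            low, high, value = -lower, upper, diff / t
        else:  # "absolute"
            low, high, value = -lower, upper, diff
        results.append(-1 if value < low or value > high else 1)
    return results


y_true = [10.0, 10.0, 10.0]
y_pred = [9.5, 13.0, 4.0]
print(flag_outliers(y_true, y_pred, method="absolute"))    # [1, -1, -1]
print(flag_outliers(y_true, y_pred, method="sd", sd=2.0))  # [1, 1, -1]
```

With `method="relative"` the residual is divided by the true value, so that mode is undefined for targets equal to zero.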

score_samples(X, y=None)

Calculate the outlier scores for the given data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Data for which outlier scores are calculated.

required
y array-like of shape (n_samples,)

Ignored, present for compatibility.

None

Returns:

Type Description
np.ndarray of shape (n_samples,)

The outlier scores for the input data.

Raises:

Type Description
ValueError

If method is not one of "sd", "relative", or "absolute".

Source code in sklego/meta/regression_outlier_detector.py
def score_samples(self, X, y=None):
    """Calculate the outlier scores for the given data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Data for which outlier scores are calculated.
    y : array-like of shape (n_samples,), default=None
        Ignored, present for compatibility.

    Returns
    -------
    np.ndarray of shape (n_samples,)
        The outlier scores for the input data.

    Raises
    ------
    ValueError
        If `method` is not one of "sd", "relative", or "absolute".
    """
    check_is_fitted(self, ["estimator_", "sd_", "idx_"])
    X = check_array(X, estimator=self)
    X, y_true = self.to_x_y(X)
    y_pred = self.estimator_.predict(X)
    difference = y_true - y_pred
    allowed_methods = ["sd", "relative", "absolute"]
    if self.method not in allowed_methods:
        raise ValueError(f"`method` must be in {allowed_methods} got: {self.method}")
    if self.method == "sd":
        return difference
    if self.method == "relative":
        return difference / y_true
    if self.method == "absolute":
        return difference

to_x_y(X)

Split X into two arrays X_to_use and y. y is the column we want to predict (specified in the column parameter) and X_to_use is the rest of the data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Data to split.

required

Returns:

Name Type Description
X_to_use array-like of shape (n_samples, n_features-1)

Data to use for prediction.

y array-like of shape (n_samples,)

The target column.

Source code in sklego/meta/regression_outlier_detector.py
def to_x_y(self, X):
    """Split `X` into two arrays `X_to_use` and `y`.
    `y` is the column we want to predict (specified in the `column` parameter) and `X_to_use` is the rest of the
    data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Data to split.

    Returns
    -------
    X_to_use : array-like of shape (n_samples, n_features-1)
        Data to use for prediction.
    y : array-like of shape (n_samples,)
        The target column.
    """
    y = X[:, self.idx_]
    cols_to_use = [i for i in range(X.shape[1]) if i != self.idx_]  # use the resolved index, not the raw name
    X_to_use = X[:, cols_to_use]
    if len(X_to_use.shape) == 1:
        X_to_use = X_to_use.reshape(-1, 1)
    return X_to_use, y
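
As an illustration of the split performed by `to_x_y`, the sketch below separates one column from a small table stored as a list of rows. The name `split_target` is hypothetical, and plain lists stand in for the NumPy array used by the class.

```python
def split_target(rows, idx):
    """Split a 2D list into (features without column `idx`, target column)."""
    y = [row[idx] for row in rows]
    X_to_use = [[v for j, v in enumerate(row) if j != idx] for row in rows]
    return X_to_use, y


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X_to_use, y = split_target(rows, idx=1)
print(X_to_use)  # [[1.0, 3.0], [4.0, 6.0]]
print(y)         # [2.0, 5.0]
```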

sklego.meta.subjective_classifier.SubjectiveClassifier

Bases: BaseEstimator, ClassifierMixin, MetaEstimatorMixin

Corrects predictions of the inner classifier by taking into account a (subjective) prior distribution of the classes.

This can be useful when there is a difference in class distribution between the training data set and the real world. Using the confusion matrix of the inner classifier and the prior, the posterior probability for a class, given the prediction of the inner classifier, can be computed.

The background for this posterior estimation is given in this article.

Depending on the evidence parameter, this meta estimator's predictions are based on a simple weighting of the inner estimator's predict_proba() results, the posterior probabilities derived from the confusion matrix, or a combination of the two approaches.
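
The posterior computation described above is an application of Bayes' rule: P(actual = i | predicted = j) is proportional to P(predicted = j | actual = i) times the prior of class i. The following sketch works through it on a made-up 2x2 confusion matrix; `posterior_matrix`, the counts, and the prior values are all illustrative assumptions, and the sketch assumes every class has at least one training sample.

```python
def posterior_matrix(cfm, prior):
    """Return post[i][j] = P(actual = i | predicted = j).

    cfm[i][j] counts samples of actual class i predicted as class j;
    prior[i] is the assumed real-world frequency of class i.
    """
    n = len(cfm)
    likelihood = [[cfm[i][j] / sum(cfm[i]) for j in range(n)] for i in range(n)]
    post = [[0.0] * n for _ in range(n)]
    for j in range(n):
        evidence = sum(likelihood[k][j] * prior[k] for k in range(n))
        for i in range(n):
            # Fall back to the prior when class j was never predicted.
            post[i][j] = likelihood[i][j] * prior[i] / evidence if evidence > 0 else prior[i]
    return post


cfm = [[90, 10], [20, 80]]  # hypothetical training confusion matrix
prior = [0.95, 0.05]        # hypothetical real-world class frequencies
post = posterior_matrix(cfm, prior)
print(round(post[1][1], 3))  # about 0.296
```

In this example the inner classifier is right 80% of the time on class 1, yet a prediction of class 1 implies only about a 30% chance that the sample really belongs to class 1, because that class is rare under the prior.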

Parameters:

Name Type Description Default
estimator scikit-learn compatible classifier

Classifier that will be wrapped with SubjectiveClassifier. It should implement the predict_proba method.

required
prior dict[int, float]

A dictionary mapping class -> frequency representing the prior (a.k.a. subjective real-world) class distribution. The class frequencies should sum to 1.

required
evidence Literal[predict_proba, confusion_matrix, both]

A string indicating which evidence should be used to correct the inner estimator's predictions.

  • If "both", the inner estimator's predict_proba() results are multiplied by the posterior probabilities.
  • If "predict_proba", the inner estimator's predict_proba() results are multiplied by the prior distribution.
  • If "confusion_matrix", the inner estimator's discrete predictions are converted to posterior probabilities using the prior and the inner estimator's confusion matrix (obtained from the train data used in fit()).
"both"

Attributes:

Name Type Description
estimator_ scikit-learn compatible classifier

The fitted classifier.

classes_ array-like, shape=(n_classes,)

The class labels.

posterior_matrix_ array-like, shape=(n_classes, n_classes)

The posterior probabilities for each class, given the prediction of the inner classifier.

Source code in sklego/meta/subjective_classifier.py
class SubjectiveClassifier(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    """Corrects predictions of the inner classifier by taking into account a (subjective) prior distribution of the
    classes.

    This can be useful when there is a difference in class distribution between the training data set and the real
    world. Using the confusion matrix of the inner classifier and the prior, the posterior probability for a class,
    given the prediction of the inner classifier, can be computed.

    The background for this posterior estimation is given in
    [this article](https://lucdemortier.github.io/articles/16/PerformanceMetrics).

    Depending on the `evidence` parameter, this meta estimator's predictions are based on a simple weighting of the
    inner estimator's `predict_proba()` results, the posterior probabilities derived from the confusion matrix, or a
    combination of the two approaches.

    Parameters
    ----------
    estimator : scikit-learn compatible classifier
        Classifier that will be wrapped with SubjectiveClassifier. It should implement the `predict_proba` method.
    prior : dict[int, float]
        A dictionary mapping `class -> frequency` representing the prior (a.k.a. subjective real-world) class
        distribution. The class frequencies should sum to 1.
    evidence : Literal["predict_proba", "confusion_matrix", "both"], default="both"
        A string indicating which evidence should be used to correct the inner estimator's predictions.

        - If `"both"`, the inner estimator's `predict_proba()` results are multiplied by the posterior probabilities.
        - If `"predict_proba"`, the inner estimator's `predict_proba()` results are multiplied by the prior
            distribution.
        - If `"confusion_matrix"`, the inner estimator's discrete predictions are converted to posterior probabilities
            using the prior and the inner estimator's confusion matrix (obtained from the train data used in `fit()`).

    Attributes
    ----------
    estimator_ : scikit-learn compatible classifier
        The fitted classifier.
    classes_ : array-like, shape=(n_classes,)
        The class labels.
    posterior_matrix_ : array-like, shape=(n_classes, n_classes)
        The posterior probabilities for each class, given the prediction of the inner classifier.
    """

    _ALLOWED_EVIDENCE = ("predict_proba", "confusion_matrix", "both")
    _required_parameters = ["estimator", "prior"]

    def __init__(self, estimator, prior, evidence="both"):
        self.estimator = estimator
        self.prior = prior
        self.evidence = evidence

    def _likelihood(self, predicted_class, given_class, cfm):
        return cfm[given_class, predicted_class] / cfm[given_class, :].sum()

    def _evidence(self, predicted_class, cfm):
        return sum(
            [
                self._likelihood(predicted_class, given_class, cfm) * self.prior[self.classes_[given_class]]
                for given_class in range(cfm.shape[0])
            ]
        )

    def _posterior(self, y, y_hat, cfm):
        y_hat_evidence = self._evidence(y_hat, cfm)
        return (
            (self._likelihood(y_hat, y, cfm) * self.prior[self.classes_[y]] / y_hat_evidence)
            if y_hat_evidence > 0
            else self.prior[self.classes_[y]]  # in case confusion matrix has all-zero column for y_hat
        )

    def fit(self, X, y):
        """Fit the inner classifier using `X` and `y` as training data by fitting the underlying estimator and computing
        the posterior probabilities.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training data.
        y : array-like of shape (n_samples,)
            The target values.

        Returns
        -------
        self : SubjectiveClassifier
            The fitted estimator.

        Raises
        ------
        ValueError
            - If `estimator` is not a classifier.
            - If `y` contains classes that are not specified in the `prior`.
            - If `prior` is not a valid probability distribution (i.e. does not sum to 1).
            - If `evidence` is not one of "predict_proba", "confusion_matrix", or "both".
        """
        if not isinstance(self.estimator, ClassifierMixin):
            raise ValueError(
                "Invalid inner estimator: the SubjectiveClassifier meta model only works on classification models"
            )

        if not np.isclose(sum(self.prior.values()), 1):
            raise ValueError("Invalid prior: the prior probabilities of all classes should sum to 1")

        if self.evidence not in self._ALLOWED_EVIDENCE:
            raise ValueError(f"Invalid evidence: the provided evidence should be one of {self._ALLOWED_EVIDENCE}")

        X, y = check_X_y(X, y, estimator=self.estimator, dtype=FLOAT_DTYPES)
        if set(y) - set(self.prior.keys()):
            raise ValueError(
                f"Training data is inconsistent with prior: no prior defined for classes "
                f"{set(y) - set(self.prior.keys())}"
            )
        self.estimator_ = clone(self.estimator).fit(X, y)
        cfm = confusion_matrix(y, self.estimator_.predict(X))
        self.posterior_matrix_ = np.array(
            [[self._posterior(y, y_hat, cfm) for y_hat in range(cfm.shape[0])] for y in range(cfm.shape[0])]
        )
        self.n_features_in_ = X.shape[1]
        return self

    @staticmethod
    def _weighted_proba(weights, y_hat_probas):
        return normalize(weights * y_hat_probas, norm="l1")

    @staticmethod
    def _to_discrete(y_hat_probas):
        y_hat_discrete = np.zeros(y_hat_probas.shape)
        y_hat_discrete[np.arange(y_hat_probas.shape[0]), y_hat_probas.argmax(axis=1)] = 1
        return y_hat_discrete

    def predict_proba(self, X):
        """Predict probability distribution of the class, based on the provided data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples, n_classes)
            The predicted probabilities.
        """
        check_is_fitted(self, ["posterior_matrix_"])
        X = check_array(X, estimator=self, dtype=FLOAT_DTYPES)
        y_hats = self.estimator_.predict_proba(X)  # these are ignorant of the prior

        if self.evidence == "predict_proba":
            prior_weights = np.array([self.prior[klass] for klass in self.classes_])
            return self._weighted_proba(prior_weights, y_hats)
        else:
            posterior_probas = self._to_discrete(y_hats) @ self.posterior_matrix_.T
            return self._weighted_proba(posterior_probas, y_hats) if self.evidence == "both" else posterior_probas

    def predict(self, X):
        """Predict target values for `X` using fitted estimator.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted class.
        """
        check_is_fitted(self, ["posterior_matrix_"])
        X = check_array(X, estimator=self, dtype=FLOAT_DTYPES)
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

    @property
    def classes_(self):
        """Alias for `.classes_` attribute of the underlying estimator."""
        return self.estimator_.classes_

classes_ property

Alias for .classes_ attribute of the underlying estimator.

fit(X, y)

Fit the inner classifier using X and y as training data by fitting the underlying estimator and computing the posterior probabilities.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The training data.

required
y array-like of shape (n_samples,)

The target values.

required

Returns:

Name Type Description
self SubjectiveClassifier

The fitted estimator.

Raises:

Type Description
ValueError
  • If estimator is not a classifier.
  • If y contains classes that are not specified in the prior.
  • If prior is not a valid probability distribution (i.e. does not sum to 1).
  • If evidence is not one of "predict_proba", "confusion_matrix", or "both".
Source code in sklego/meta/subjective_classifier.py
def fit(self, X, y):
    """Fit the inner classifier using `X` and `y` as training data by fitting the underlying estimator and computing
    the posterior probabilities.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The training data.
    y : array-like of shape (n_samples,)
        The target values.

    Returns
    -------
    self : SubjectiveClassifier
        The fitted estimator.

    Raises
    ------
    ValueError
        - If `estimator` is not a classifier.
        - If `y` contains classes that are not specified in the `prior`.
        - If `prior` is not a valid probability distribution (i.e. does not sum to 1).
        - If `evidence` is not one of "predict_proba", "confusion_matrix", or "both".
    """
    if not isinstance(self.estimator, ClassifierMixin):
        raise ValueError(
            "Invalid inner estimator: the SubjectiveClassifier meta model only works on classification models"
        )

    if not np.isclose(sum(self.prior.values()), 1):
        raise ValueError("Invalid prior: the prior probabilities of all classes should sum to 1")

    if self.evidence not in self._ALLOWED_EVIDENCE:
        raise ValueError(f"Invalid evidence: the provided evidence should be one of {self._ALLOWED_EVIDENCE}")

    X, y = check_X_y(X, y, estimator=self.estimator, dtype=FLOAT_DTYPES)
    if set(y) - set(self.prior.keys()):
        raise ValueError(
            f"Training data is inconsistent with prior: no prior defined for classes "
            f"{set(y) - set(self.prior.keys())}"
        )
    self.estimator_ = clone(self.estimator).fit(X, y)
    cfm = confusion_matrix(y, self.estimator_.predict(X))
    self.posterior_matrix_ = np.array(
        [[self._posterior(y, y_hat, cfm) for y_hat in range(cfm.shape[0])] for y in range(cfm.shape[0])]
    )
    self.n_features_in_ = X.shape[1]
    return self
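
The two prior-related checks in `fit` can be tried out on their own. The sketch below is a simplified stand-in with a hypothetical name, `check_prior`; it uses `math.isclose` where the library uses `np.isclose`, so the tolerances differ slightly.

```python
import math


def check_prior(prior, y):
    """Raise ValueError if the prior is not a distribution or misses a class in `y`."""
    if not math.isclose(sum(prior.values()), 1.0):
        raise ValueError("Invalid prior: the prior probabilities should sum to 1")
    missing = set(y) - set(prior)
    if missing:
        raise ValueError(f"No prior defined for classes {missing}")


check_prior({0: 0.8, 1: 0.2}, [0, 1, 1, 0])  # valid: passes silently

try:
    check_prior({0: 0.8, 1: 0.2}, [0, 1, 2])  # class 2 has no prior
except ValueError as err:
    print(err)
```

Note that the check is one-directional: the prior may list classes that never occur in `y`, but every class in `y` needs a prior.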

predict(X)

Predict target values for X using fitted estimator.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted class.

Source code in sklego/meta/subjective_classifier.py
def predict(self, X):
    """Predict target values for `X` using fitted estimator.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted class.
    """
    check_is_fitted(self, ["posterior_matrix_"])
    X = check_array(X, estimator=self, dtype=FLOAT_DTYPES)
    return self.classes_[self.predict_proba(X).argmax(axis=1)]

predict_proba(X)

Predict probability distribution of the class, based on the provided data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples, n_classes)

The predicted probabilities.

Source code in sklego/meta/subjective_classifier.py
def predict_proba(self, X):
    """Predict probability distribution of the class, based on the provided data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples, n_classes)
        The predicted probabilities.
    """
    check_is_fitted(self, ["posterior_matrix_"])
    X = check_array(X, estimator=self, dtype=FLOAT_DTYPES)
    y_hats = self.estimator_.predict_proba(X)  # these are ignorant of the prior

    if self.evidence == "predict_proba":
        prior_weights = np.array([self.prior[klass] for klass in self.classes_])
        return self._weighted_proba(prior_weights, y_hats)
    else:
        posterior_probas = self._to_discrete(y_hats) @ self.posterior_matrix_.T
        return self._weighted_proba(posterior_probas, y_hats) if self.evidence == "both" else posterior_probas
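
The `predict_proba` branch above hands the prior and the model's probabilities to `_weighted_proba`, whose body is not shown here. As a rough sketch only, assuming that step multiplies each class probability by its weight and renormalizes each row to sum to one (the helper name `reweight_with_prior` and all numbers are made up for illustration):

```python
import numpy as np


def reweight_with_prior(prior_weights, probas):
    """Scale each class probability by its prior weight, then renormalize rows.

    Assumption: this mirrors a Bayes-style update; it is not taken from the
    library's source.
    """
    weighted = probas * prior_weights  # broadcasts the weights over every row
    return weighted / weighted.sum(axis=1, keepdims=True)


# Hypothetical model output for two samples and two classes.
probas = np.array([[0.5, 0.5], [0.2, 0.8]])
# Hypothetical prior: class 0 is believed to be four times as likely as class 1.
prior = np.array([0.8, 0.2])

adjusted = reweight_with_prior(prior, probas)
```

Under this assumption an undecided model output (0.5 / 0.5) collapses onto the prior, while a confident output for the a-priori unlikely class is pulled back towards even odds.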

sklego.meta.thresholder.Thresholder

Bases: BaseEstimator, ClassifierMixin

Takes a binary classifier and moves the threshold. This way you might design the algorithm to only accept a certain class if the probability for it is larger than, say, 90% instead of 50%.

Info

Please note that this only works for binary classification problems.

Parameters:

Name Type Description Default
model scikit-learn compatible classifier

Classifier that will be wrapped with Thresholder. It should implement a predict_proba method.

required
threshold float

The threshold value to use.

required
refit bool
  • If True, we will always retrain the model even if it is already fitted.
  • If False we only refit if the original model isn't fitted.
False
check_input bool

Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

False

Attributes:

Name Type Description
estimator_ scikit-learn compatible classifier

The fitted classifier.

classes_ array-like, shape=(2,)

The class labels.

Source code in sklego/meta/thresholder.py
class Thresholder(BaseEstimator, ClassifierMixin):
    """Takes a binary classifier and moves the threshold. This way you might design the algorithm to only accept a
    certain class if the probability for it is larger than, say, 90% instead of 50%.

    !!! info
        Please note that this only works for binary classification problems.

    Parameters
    ----------
    model : scikit-learn compatible classifier
        Classifier that will be wrapped with Thresholder. It should implement a `predict_proba` method.
    threshold : float
        The threshold value to use.
    refit : bool, default=False

        - If True, we will always retrain the model even if it is already fitted.
        - If False we only refit if the original model isn't fitted.
    check_input : bool, default=False
        Whether or not to check the input data. If False, the checks are delegated to the wrapped estimator.

    Attributes
    ----------
    estimator_ : scikit-learn compatible classifier
        The fitted classifier.
    classes_ : array-like, shape=(2,)
        The class labels.
    """

    _required_parameters = ["model", "threshold"]

    def __init__(self, model, threshold: float, refit=False, check_input=False):
        self.model = model
        self.threshold = threshold
        self.refit = refit
        self.check_input = check_input

    def _handle_unfitted(self, X, y, sample_weight):
        sample_weight_ = _check_sample_weight(sample_weight, X)

        self.estimator_ = clone(self.model)
        if "sample_weight" in signature(self.estimator_.fit).parameters:
            self.estimator_.fit(X, y, sample_weight=sample_weight_)
        else:
            if sample_weight is not None:
                logging.warning("Estimator ignores sample_weight.")
            self.estimator_.fit(X, y)
        return self

    def _handle_refit(self, X, y, sample_weight=None):
        """Only refit when we need to, unless `refit=True` is present."""
        if self.refit:
            self._handle_unfitted(X, y, sample_weight)
        else:
            try:
                check_is_fitted(self.estimator_)
            except NotFittedError:
                self._handle_unfitted(X, y, sample_weight)

    def fit(self, X, y, sample_weight=None):
        """Fit the underlying estimator using `X` and `y` as training data. If `refit=True` we will always retrain
        (a copy of) the estimator.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features )
            The training data.
        y : array-like of shape (n_samples,)
            The target values.
        sample_weight : array-like of shape (n_samples, ), default=None
            Individual weights for each sample.

        Returns
        -------
        self : Thresholder
            The fitted estimator.

        Raises
        ------
        ValueError
            - If `model` is not a classifier or it does not implement `predict_proba` method.
            - If `model` does not have two classes.
        """
        self.estimator_ = self.model
        if not isinstance(self.estimator_, ProbabilisticClassifier):
            raise ValueError("The Thresholder meta model only works on classification models with .predict_proba.")

        if self.check_input:
            X, y = check_X_y(X, y, force_all_finite=False, ensure_min_features=0, estimator=self)

        self._handle_refit(X, y, sample_weight)

        self.n_features_in_ = X.shape[1]
        self.classes_ = self.estimator_.classes_
        if len(self.classes_) != 2:
            raise ValueError("The `Thresholder` meta model only works on models with two classes.")

        return self

    def predict(self, X):
        """Predict target values for `X` using fitted estimator and the given `threshold`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted values.
        """
        check_is_fitted(self, ["classes_", "estimator_"])
        predicate = self.estimator_.predict_proba(X)[:, 1] > self.threshold
        return np.where(predicate, self.classes_[1], self.classes_[0])

    def predict_proba(self, X):
        """Alias for `.predict_proba()` method of the underlying estimator."""
        check_is_fitted(self, ["classes_", "estimator_"])
        return self.estimator_.predict_proba(X)

    def score(self, X, y):
        """Alias for `.score()` method of the underlying estimator."""
        check_is_fitted(self, ["classes_", "estimator_"])
        return self.estimator_.score(X, y)

    def _more_tags(self):
        return {
            "binary_only": True,
        }

fit(X, y, sample_weight=None)

Fit the underlying estimator using X and y as training data. If refit=True we will always retrain (a copy of) the estimator.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features )

The training data.

required
y array-like of shape (n_samples,)

The target values.

required
sample_weight array-like of shape (n_samples, )

Individual weights for each sample.

None

Returns:

Name Type Description
self Thresholder

The fitted estimator.

Raises:

Type Description
ValueError
  • If model is not a classifier or it does not implement predict_proba method.
  • If model does not have two classes.
Source code in sklego/meta/thresholder.py
def fit(self, X, y, sample_weight=None):
    """Fit the underlying estimator using `X` and `y` as training data. If `refit=True` we will always retrain
    (a copy of) the estimator.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features )
        The training data.
    y : array-like of shape (n_samples,)
        The target values.
    sample_weight : array-like of shape (n_samples, ), default=None
        Individual weights for each sample.

    Returns
    -------
    self : Thresholder
        The fitted estimator.

    Raises
    ------
    ValueError
        - If `model` is not a classifier or it does not implement `predict_proba` method.
        - If `model` does not have two classes.
    """
    self.estimator_ = self.model
    if not isinstance(self.estimator_, ProbabilisticClassifier):
        raise ValueError("The Thresholder meta model only works on classification models with .predict_proba.")

    if self.check_input:
        X, y = check_X_y(X, y, force_all_finite=False, ensure_min_features=0, estimator=self)

    self._handle_refit(X, y, sample_weight)

    self.n_features_in_ = X.shape[1]
    self.classes_ = self.estimator_.classes_
    if len(self.classes_) != 2:
        raise ValueError("The `Thresholder` meta model only works on models with two classes.")

    return self

predict(X)

Predict target values for X using fitted estimator and the given threshold.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted values.

Source code in sklego/meta/thresholder.py
def predict(self, X):
    """Predict target values for `X` using fitted estimator and the given `threshold`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted values.
    """
    check_is_fitted(self, ["classes_", "estimator_"])
    predicate = self.estimator_.predict_proba(X)[:, 1] > self.threshold
    return np.where(predicate, self.classes_[1], self.classes_[0])
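
The thresholding rule in `predict` can be reproduced with plain NumPy. The sketch below uses a made-up probability matrix in place of `estimator_.predict_proba(X)` and made-up class labels in place of `classes_`:

```python
import numpy as np

classes = np.array(["no", "yes"])  # hypothetical stand-in for classes_
proba = np.array(
    [
        [0.30, 0.70],
        [0.05, 0.95],
        [0.60, 0.40],
    ]
)  # hypothetical stand-in for predict_proba output
threshold = 0.9

# The second column holds the probability of classes[1].
predicate = proba[:, 1] > threshold
labels = np.where(predicate, classes[1], classes[0])
```

Only the second sample clears the 0.9 threshold, so it is the only one labelled `"yes"`, even though the first sample would pass a 0.5 cut-off.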

predict_proba(X)

Alias for .predict_proba() method of the underlying estimator.

Source code in sklego/meta/thresholder.py
def predict_proba(self, X):
    """Alias for `.predict_proba()` method of the underlying estimator."""
    check_is_fitted(self, ["classes_", "estimator_"])
    return self.estimator_.predict_proba(X)

score(X, y)

Alias for .score() method of the underlying estimator.

Source code in sklego/meta/thresholder.py
def score(self, X, y):
    """Alias for `.score()` method of the underlying estimator."""
    check_is_fitted(self, ["classes_", "estimator_"])
    return self.estimator_.score(X, y)
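
Because `score` forwards to the wrapped estimator, it reflects that estimator's own predictions rather than the thresholded ones. A small sketch of computing accuracy on thresholded predictions by hand, using invented labels and probabilities and assuming the wrapped classifier itself cuts at 0.5:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])  # hypothetical labels
proba_pos = np.array([0.2, 0.95, 0.7, 0.4])  # hypothetical P(class 1)

# Assumed behaviour of the wrapped classifier's own predict: cut at 0.5.
default_pred = (proba_pos > 0.5).astype(int)
# Behaviour of a Thresholder-style predict with threshold=0.9.
thresholded_pred = (proba_pos > 0.9).astype(int)

default_accuracy = (default_pred == y_true).mean()
thresholded_accuracy = (thresholded_pred == y_true).mean()
```

In this toy case the two accuracies differ (1.0 versus 0.75), which is why a metric computed from `predict` output is the safer choice when evaluating a moved threshold.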

sklego.meta.zero_inflated_regressor.ZeroInflatedRegressor

Bases: BaseEstimator, RegressorMixin, MetaEstimatorMixin

A meta regressor for zero-inflated datasets, i.e. the targets contain a lot of zeroes.

ZeroInflatedRegressor consists of a classifier and a regressor.

  • The classifier's task is to find if the target is zero or not.
  • The regressor's task is to output a (usually positive) prediction whenever the classifier indicates that there should be a non-zero prediction.

The regressor is only trained on examples where the target is non-zero, which makes it easier for it to focus.

At prediction time, the classifier is first asked if the output should be zero. If yes, output zero. Otherwise, ask the regressor for its prediction and output it.

Parameters:

Name Type Description Default
classifier scikit-learn compatible classifier

A classifier that answers the question "Should the output be zero?".

required
regressor scikit-learn compatible regressor

A regressor for predicting the target. Its prediction is only used if classifier says that the output is non-zero.

required
handle_zero Literal[error, ignore]

How to behave in the case that all train set output consists of zero values only.

  • handle_zero = 'error': will raise a ValueError (default).
  • handle_zero = 'ignore': will continue to train the regressor on the entire dataset.
"error"

Attributes:

Name Type Description
classifier_ scikit-learn compatible classifier

The fitted classifier.

regressor_ scikit-learn compatible regressor

The fitted regressor.

Examples:

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, ExtraTreesRegressor
from sklego.meta import ZeroInflatedRegressor

np.random.seed(0)
X = np.random.randn(10000, 4)
y = ((X[:, 0]>0) & (X[:, 1]>0)) * np.abs(X[:, 2] * X[:, 3]**2)

model = ZeroInflatedRegressor(
    classifier=ExtraTreesClassifier(random_state=0, max_depth=10),
    regressor=ExtraTreesRegressor(random_state=0)
).fit(X, y)

model.predict(X[:5])
# array([4.91483294, 0.        , 0.        , 0.04941909, 0.        ])

model.score_samples(X[:5]).round(2)
# array([3.73, 0.  , 0.11, 0.03, 0.06])
Source code in sklego/meta/zero_inflated_regressor.py
class ZeroInflatedRegressor(BaseEstimator, RegressorMixin, MetaEstimatorMixin):
    """A meta regressor for zero-inflated datasets, i.e. the targets contain a lot of zeroes.

    `ZeroInflatedRegressor` consists of a classifier and a regressor.

    - The classifier's task is to find if the target is zero or not.
    - The regressor's task is to output a (usually positive) prediction whenever the classifier indicates that
    there should be a non-zero prediction.

    The regressor is only trained on examples where the target is non-zero, which makes it easier for it to focus.

    At prediction time, the classifier is first asked if the output should be zero. If yes, output zero.
    Otherwise, ask the regressor for its prediction and output it.

    Parameters
    ----------
    classifier : scikit-learn compatible classifier
        A classifier that answers the question "Should the output be zero?".
    regressor : scikit-learn compatible regressor
        A regressor for predicting the target. Its prediction is only used if `classifier` says that the output is
        non-zero.
    handle_zero : Literal["error", "ignore"], default="error"
        How to behave in the case that all train set output consists of zero values only.

        - `handle_zero = 'error'`: will raise a `ValueError` (default).
        - `handle_zero = 'ignore'`: will continue to train the regressor on the entire dataset.

    Attributes
    ----------
    classifier_ : scikit-learn compatible classifier
        The fitted classifier.
    regressor_ : scikit-learn compatible regressor
        The fitted regressor.

    Examples
    --------
    ```py
    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier, ExtraTreesRegressor
    from sklego.meta import ZeroInflatedRegressor

    np.random.seed(0)
    X = np.random.randn(10000, 4)
    y = ((X[:, 0]>0) & (X[:, 1]>0)) * np.abs(X[:, 2] * X[:, 3]**2)

    model = ZeroInflatedRegressor(
        classifier=ExtraTreesClassifier(random_state=0, max_depth=10),
        regressor=ExtraTreesRegressor(random_state=0)
    ).fit(X, y)

    model.predict(X[:5])
    # array([4.91483294, 0.        , 0.        , 0.04941909, 0.        ])

    model.score_samples(X[:5]).round(2)
    # array([3.73, 0.  , 0.11, 0.03, 0.06])
    ```
    """

    _required_parameters = ["classifier", "regressor"]

    def __init__(self, classifier, regressor, handle_zero="error") -> None:
        self.classifier = classifier
        self.regressor = regressor
        self.handle_zero = handle_zero

    def fit(self, X, y, sample_weight=None):
        """Fit the underlying classifier and regressor using `X` and `y` as training data. The regressor is only trained
        on examples where the target is non-zero.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features )
            The training data.
        y : array-like of shape (n_samples,)
            The target values.
        sample_weight : array-like of shape (n_samples, ), default=None
            Individual weights for each sample.

        Returns
        -------
        self : ZeroInflatedRegressor
            The fitted estimator.

        Raises
        ------
        ValueError
            - If `classifier` is not a classifier.
            - If `regressor` is not a regressor.
            - If the training target consists entirely of zeros and `handle_zero="error"`.
        """
        X, y = check_X_y(X, y)
        self._check_n_features(X, reset=True)
        if not is_classifier(self.classifier):
            raise ValueError(
                f"`classifier` has to be a classifier. Received instance of {type(self.classifier)} instead."
            )
        if not is_regressor(self.regressor):
            raise ValueError(f"`regressor` has to be a regressor. Received instance of {type(self.regressor)} instead.")
        if self.handle_zero not in {"ignore", "error"}:
            raise ValueError(
                f"`handle_zero` has to be one of {'ignore', 'error'}. Received '{self.handle_zero}' instead."
            )

        sample_weight = _check_sample_weight(sample_weight, X)
        try:
            check_is_fitted(self.classifier)
            self.classifier_ = self.classifier
        except NotFittedError:
            self.classifier_ = clone(self.classifier)

            if "sample_weight" in signature(self.classifier_.fit).parameters:
                self.classifier_.fit(X, y != 0, sample_weight=sample_weight)
            else:
                logging.warning("Classifier ignores sample_weight.")
                self.classifier_.fit(X, y != 0)

        indices_for_training = np.where(y != 0)[0]  # these are the non-zero indices
        if (self.handle_zero == "ignore") & (
            indices_for_training.size == 0
        ):  # if we choose to ignore that all train set output is 0
            logging.warning("Regressor will be training on `y` consisting of zero values only.")
            indices_for_training = np.where(y == 0)[0]  # use the whole train set

        if indices_for_training.size > 0:
            try:
                check_is_fitted(self.regressor)
                self.regressor_ = self.regressor
            except NotFittedError:
                self.regressor_ = clone(self.regressor)

                if "sample_weight" in signature(self.regressor_.fit).parameters:
                    self.regressor_.fit(
                        X[indices_for_training],
                        y[indices_for_training],
                        sample_weight=sample_weight[indices_for_training] if sample_weight is not None else None,
                    )
                else:
                    logging.warning("Regressor ignores sample_weight.")
                    self.regressor_.fit(
                        X[indices_for_training],
                        y[indices_for_training],
                    )
        else:
            raise ValueError(
                """The predicted training labels are all zero, making the regressor obsolete. Change the classifier
                or use a plain regressor instead. Alternatively, you can choose to ignore that predicted labels are
                all zero by setting flag handle_zero = 'ignore'"""
            )

        return self

    def predict(self, X):
        """Predict target values for `X` using fitted estimator by first asking the classifier if the output should be
        zero. If yes, output zero. Otherwise, ask the regressor for its prediction and output it.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted values.
        """
        check_is_fitted(self)
        X = check_array(X)
        self._check_n_features(X, reset=False)

        output = np.zeros(len(X))
        non_zero_indices = np.where(self.classifier_.predict(X))[0]

        if non_zero_indices.size > 0:
            output[non_zero_indices] = self.regressor_.predict(X[non_zero_indices])

        return output

    @available_if(lambda self: hasattr(self.classifier_, "predict_proba"))
    def score_samples(self, X):
        r"""Predict risk estimate of `X` as the probability of `X` to not be zero times the expected value of `X`:

        $$\text{score\_samples}(X) = (1-P(X=0)) \cdot E[X]$$

        where:

        - $P(X=0)$ is calculated using the `.predict_proba()` method of the underlying classifier.
        - $E[X]$ is the regressor prediction on `X`.

        !!! info

            This method requires the underlying classifier to implement `.predict_proba()` method.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted risk.
        """

        check_is_fitted(self)
        X = check_array(X)
        self._check_n_features(X, reset=False)  # validate against the fitted feature count; do not overwrite it

        non_zero_proba = self.classifier_.predict_proba(X)[:, 1]
        expected_impact = self.regressor_.predict(X)

        return non_zero_proba * expected_impact

fit(X, y, sample_weight=None)

Fit the underlying classifier and regressor using X and y as training data. The regressor is only trained on examples where the target is non-zero.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features )

The training data.

required
y array-like of shape (n_samples,)

The target values.

required
sample_weight array-like of shape (n_samples, )

Individual weights for each sample.

None

Returns:

Name Type Description
self ZeroInflatedRegressor

The fitted estimator.

Raises:

Type Description
ValueError

  • If classifier is not a classifier.
  • If regressor is not a regressor.
  • If the training target consists entirely of zeros and handle_zero="error".

Source code in sklego/meta/zero_inflated_regressor.py
def fit(self, X, y, sample_weight=None):
    """Fit the underlying classifier and regressor using `X` and `y` as training data. The regressor is only trained
    on examples where the target is non-zero.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features )
        The training data.
    y : array-like of shape (n_samples,)
        The target values.
    sample_weight : array-like of shape (n_samples, ), default=None
        Individual weights for each sample.

    Returns
    -------
    self : ZeroInflatedRegressor
        The fitted estimator.

    Raises
    ------
    ValueError
        - If `classifier` is not a classifier.
        - If `regressor` is not a regressor.
        - If the training target consists entirely of zeros and `handle_zero="error"`.
    """
    X, y = check_X_y(X, y)
    self._check_n_features(X, reset=True)
    if not is_classifier(self.classifier):
        raise ValueError(
            f"`classifier` has to be a classifier. Received instance of {type(self.classifier)} instead."
        )
    if not is_regressor(self.regressor):
        raise ValueError(f"`regressor` has to be a regressor. Received instance of {type(self.regressor)} instead.")
    if self.handle_zero not in {"ignore", "error"}:
        raise ValueError(
            f"`handle_zero` has to be one of {'ignore', 'error'}. Received '{self.handle_zero}' instead."
        )

    sample_weight = _check_sample_weight(sample_weight, X)
    try:
        check_is_fitted(self.classifier)
        self.classifier_ = self.classifier
    except NotFittedError:
        self.classifier_ = clone(self.classifier)

        if "sample_weight" in signature(self.classifier_.fit).parameters:
            self.classifier_.fit(X, y != 0, sample_weight=sample_weight)
        else:
            logging.warning("Classifier ignores sample_weight.")
            self.classifier_.fit(X, y != 0)

    indices_for_training = np.where(y != 0)[0]  # these are the non-zero indices
    if (self.handle_zero == "ignore") & (
        indices_for_training.size == 0
    ):  # if we choose to ignore that all train set output is 0
        logging.warning("Regressor will be training on `y` consisting of zero values only.")
        indices_for_training = np.where(y == 0)[0]  # use the whole train set

    if indices_for_training.size > 0:
        try:
            check_is_fitted(self.regressor)
            self.regressor_ = self.regressor
        except NotFittedError:
            self.regressor_ = clone(self.regressor)

            if "sample_weight" in signature(self.regressor_.fit).parameters:
                self.regressor_.fit(
                    X[indices_for_training],
                    y[indices_for_training],
                    sample_weight=sample_weight[indices_for_training] if sample_weight is not None else None,
                )
            else:
                logging.warning("Regressor ignores sample_weight.")
                self.regressor_.fit(
                    X[indices_for_training],
                    y[indices_for_training],
                )
    else:
        raise ValueError(
            """The predicted training labels are all zero, making the regressor obsolete. Change the classifier
            or use a plain regressor instead. Alternatively, you can choose to ignore that predicted labels are
            all zero by setting flag handle_zero = 'ignore'"""
        )

    return self
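
The way `fit` splits the work between the two models can be illustrated with NumPy alone. The arrays below are invented; the two expressions `y != 0` and `np.where(y != 0)[0]` are the ones used in the source above:

```python
import numpy as np

X = np.arange(10, dtype=float).reshape(5, 2)  # hypothetical features
y = np.array([0.0, 2.5, 0.0, 0.0, 1.0])  # hypothetical zero-inflated target

# The classifier learns a boolean target: "is the output non-zero?"
classifier_target = y != 0

# The regressor only sees the rows whose target is non-zero.
nonzero_idx = np.where(y != 0)[0]
X_reg, y_reg = X[nonzero_idx], y[nonzero_idx]
```

Here the classifier is trained on all five rows, while the regressor is trained on rows 1 and 4 only.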

predict(X)

Predict target values for X using fitted estimator by first asking the classifier if the output should be zero. If yes, output zero. Otherwise, ask the regressor for its prediction and output it.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted values.

Source code in sklego/meta/zero_inflated_regressor.py
def predict(self, X):
    """Predict target values for `X` using fitted estimator by first asking the classifier if the output should be
    zero. If yes, output zero. Otherwise, ask the regressor for its prediction and output it.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted values.
    """
    check_is_fitted(self)
    X = check_array(X)
    self._check_n_features(X, reset=False)

    output = np.zeros(len(X))
    non_zero_indices = np.where(self.classifier_.predict(X))[0]

    if non_zero_indices.size > 0:
        output[non_zero_indices] = self.regressor_.predict(X[non_zero_indices])

    return output
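
The two-stage prediction can be sketched without any fitted models. In the snippet below `is_nonzero` stands in for `classifier_.predict(X)` and the hypothetical function `regress` stands in for `regressor_.predict`:

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # hypothetical inputs
is_nonzero = np.array([True, False, False, True])  # stand-in for the classifier


def regress(rows):
    """Hypothetical stand-in for the regressor: ten times the first feature."""
    return 10.0 * rows[:, 0]


# Start from all zeros and fill in only the rows flagged as non-zero.
output = np.zeros(len(X))
idx = np.where(is_nonzero)[0]
if idx.size > 0:
    output[idx] = regress(X[idx])
```

Rows 1 and 2 keep their zero because the regressor is never consulted for them.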

score_samples(X)

Predict risk estimate of X as the probability of X to not be zero times the expected value of X:

\[\text{score\_samples}(X) = (1-P(X=0)) \cdot E[X]\]

where:

  • \(P(X=0)\) is calculated using the .predict_proba() method of the underlying classifier.
  • \(E[X]\) is the regressor prediction on X.

Info

This method requires the underlying classifier to implement .predict_proba() method.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted risk.

Source code in sklego/meta/zero_inflated_regressor.py
@available_if(lambda self: hasattr(self.classifier_, "predict_proba"))
def score_samples(self, X):
    r"""Predict risk estimate of `X` as the probability of `X` to not be zero times the expected value of `X`:

    $$\text{score\_samples}(X) = (1-P(X=0)) \cdot E[X]$$

    where:

    - $P(X=0)$ is calculated using the `.predict_proba()` method of the underlying classifier.
    - $E[X]$ is the regressor prediction on `X`.

    !!! info

        This method requires the underlying classifier to implement `.predict_proba()` method.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted risk.
    """

    check_is_fitted(self)
    X = check_array(X)
    self._check_n_features(X, reset=False)  # validate against the fitted feature count; do not overwrite it

    non_zero_proba = self.classifier_.predict_proba(X)[:, 1]
    expected_impact = self.regressor_.predict(X)

    return non_zero_proba * expected_impact
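
A worked instance of the risk formula, with invented numbers standing in for the classifier's `predict_proba` output and the regressor's predictions:

```python
import numpy as np

# Hypothetical predict_proba output; columns are P(zero) and P(non-zero).
proba = np.array([[0.90, 0.10], [0.25, 0.75]])
# Hypothetical regressor predictions for the same two rows.
expected = np.array([4.0, 2.0])

# (1 - P(X=0)) * E[X], evaluated row by row.
risk = proba[:, 1] * expected
```

Unlike `predict`, this calls the regressor on every row, so a row that `predict` maps to exactly zero can still receive a small positive score. The example output further up shows this: the third sample is predicted as `0.` but scored as `0.11`.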

sklego.meta.hierarchical_predictor.HierarchicalPredictor

Bases: ShrinkageMixin, MetaEstimatorMixin, BaseEstimator

HierarchicalPredictor is a meta-estimator that fits a separate estimator for each group in the input data in a hierarchical manner. This means that an estimator is fitted for each level of the group columns.

The only exception to that is when shrinkage=None and fallback_method="raise", in which case only one estimator per group value is fitted.

If shrinkage is not None, the predictions of the group-level models are combined using a shrinkage method. The shrinkage method can be one of the predefined methods "constant", "equal", "min_n_obs", "relative" or a custom shrinkage function.

Differences with GroupedPredictor

There are two main differences between HierarchicalPredictor and GroupedPredictor:

  1. The first difference is the fallback method: HierarchicalPredictor has a fallback method that can be set to "parent" or "raise". If set to "parent", the estimator will recursively fall back to the parent group in case the group value is not found during .predict().

    As a consequence of this:

    • groups order matters!
    • Potentially a combinatoric number of estimators are fitted, one for each unique combination of group values and each level.
  2. HierarchicalPredictor is meant to properly handle shrinkage in classification tasks. However this requires that the estimator has a .predict_proba() method.

Inheritance

This class is not meant to be used directly, but to be inherited by a specific hierarchical predictor, such as HierarchicalRegressor or HierarchicalClassifier, which properly implement the .predict() and predict-like methods for the specific task.

New in version 0.8.0

Parameters:

Name Type Description Default
estimator scikit-learn compatible estimator/pipeline

The base estimator to be used for each level.

required
groups int | str | List[int] | List[str]

The column(s) of the array/dataframe to select as a grouping parameter set.

required
shrinkage Literal[constant, equal, min_n_obs, relative] | Callable | None

How to perform shrinkage:

  • None: No shrinkage (default)
  • "constant": the augmented prediction for each level is the weighted average between its prediction and the augmented prediction for its parent.
  • "equal": each group is weighed equally.
  • "min_n_obs": use only the smallest group with a certain amount of observations.
  • "relative": weigh each group according to its size.
  • Callable: a function that takes a list of group lengths and returns an array of the same size with the weights for each group.
None
fallback_method Literal[parent, raise]

The fallback strategy to use if a group is not found at prediction time:

  • "parent": recursively fall back to the parent group in case the group value is not found during .predict(). This requires fitting a model on each level, including a global model.
  • "raise": raise a KeyError if the group value is not found during .predict().
"parent"
n_jobs int | None

The number of jobs to run in parallel. The same convention of joblib.Parallel holds:

  • n_jobs = None: interpreted as n_jobs=1.
  • n_jobs > 0: n_cpus=n_jobs are used.
  • n_jobs < 0: (n_cpus + 1 + n_jobs) are used.
None
check_X bool

Whether to validate that X is a non-empty 2D array of finite values and attempt to cast X to float. If disabled, the model/pipeline is expected to handle e.g. missing, non-numeric, or non-finite values.

True
shrinkage_kwargs dict

Keyword arguments to the shrinkage function

None
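
The shrinkage options above boil down to turning a list of group sizes into one weight per level. A hedged sketch of two such functions, written in the shape described for the Callable option (a list of group lengths in, an array of weights out). The function names are hypothetical, and normalizing the weights to sum to one is an assumption that the text does not state:

```python
import numpy as np


def relative_weights(group_sizes):
    """Weigh each level by its share of observations (assumed normalization)."""
    sizes = np.asarray(group_sizes, dtype=float)
    return sizes / sizes.sum()


def equal_weights(group_sizes):
    """Weigh every level the same, regardless of size."""
    return np.full(len(group_sizes), 1.0 / len(group_sizes))


# Hypothetical sizes for the global level, the col_1 level, and the col_1 + col_2 level.
sizes = [100, 30, 10]
# Hypothetical predictions made by the model at each of those levels.
level_preds = np.array([5.0, 7.0, 12.0])

weights = relative_weights(sizes)
shrunk = float(weights @ level_preds)  # weighted average across levels
```

With these numbers the small leaf group contributes little, so the combined prediction stays close to the global model's value.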

Attributes:

Name Type Description
estimators_ dict[tuple[Any,...], scikit-learn compatible estimator/pipeline]

Fitted estimators for each level. The keys are the group values, and the values are the fitted estimators. The group values are tuples of the group columns, including the global column which has a fixed placeholder value of 1.

Let's say we have two group columns, col_1 and col_2. col_1 has values 'A' and 'B', and col_2 has values 'X', ... Then the estimators_ dictionary will look something like this:

{
    # global estimator
    (1,): LinearRegression(),

    # estimator for `col_1 = 'A'`
    (1, 'A'): LinearRegression(),

    # estimator for `col_1 = 'B'`
    (1, 'B'): LinearRegression(),

    # estimator for `col_1 = 'A'`, `col_2 = 'X'`
    (1, 'A', 'X'): LinearRegression(),
    ...
}
shrinkage_function_ callable

The shrinkage function that is used to calculate the shrinkage factors

shrinkage_factors_ dict[tuple[Any, ...], ndarray]

Shrinkage factors applied to each level.

The keys are the group values, and the values are the shrinkage factors. The group values are tuples of the group columns, including the global column which has a fixed placeholder value of 1.

groups_ list

List of all group columns including a global column.

n_groups_ int

Number of unique groups.

n_features_in_ int

Number of features in the training data.

n_features_ int

Number of features used by the estimators.

n_levels_ int

Number of hierarchical levels in the grouping. Accessing it emits a DeprecationWarning; use n_fitted_levels_ instead.
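To make the notion of "levels" concrete, the sketch below builds the nested list of group columns that one estimator is fitted on per level. `expanding_levels` is a hypothetical stand-in whose behaviour is assumed to match the library's `expanding_list` helper; the column names are taken from the examples on this page.

```python
def expanding_levels(columns):
    """Stand-in for the library's `expanding_list` helper (assumed behaviour)."""
    # Level i uses the first i + 1 columns: global only, then global + g_1, and so on.
    return [columns[: i + 1] for i in range(len(columns))]


# The global placeholder column always comes first, followed by the user-supplied groups.
groups_ = ["__sklego_global_estimator__", "g_1", "g_2"]
levels = expanding_levels(groups_)

for level in levels:
    print(level)
```

With two group columns this gives three levels, which is why the order of `groups` matters: each level refines the previous one.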

Source code in sklego/meta/hierarchical_predictor.py
class HierarchicalPredictor(ShrinkageMixin, MetaEstimatorMixin, BaseEstimator):
    """`HierarchicalPredictor` is a meta-estimator that fits a separate estimator for each group in the input data
    in a hierarchical manner. This means that an estimator is fitted for each level of the group columns.

    The only exception to that is when `shrinkage=None` **and** `fallback_method="raise"`, in which case only
    one estimator per group value is fitted.

    If `shrinkage` is not `None`, the predictions of the group-level models are combined using a shrinkage method. The
    shrinkage method can be one of the predefined methods `"constant"`, `"equal"`, `"min_n_obs"`, `"relative"` or a
    custom shrinkage function.

    !!! question "Differences with `GroupedPredictor`"

        There are two main differences between `HierarchicalPredictor` and
        [`GroupedPredictor`][sklego.meta.grouped_predictor.GroupedPredictor]:

        1. The first difference is the fallback method: `HierarchicalPredictor` has a fallback method that can be set to
            `"parent"` or `"raise"`. If set to `"parent"`, the estimator will recursively fall back to the parent group
            in case the group value is not found during `.predict()`.

            As a consequence of this:

            - **`groups` order matters!**
            - Potentially a combinatoric number of estimators are fitted, one for each unique combination of group
                values and each level.

        2. `HierarchicalPredictor` is meant to properly handle shrinkage in classification tasks. However this
            **requires** that the estimator has a `.predict_proba()` method.

    !!! warning "Inheritance"

        This class is not meant to be used directly, but to be inherited by a specific hierarchical predictor, such as
        `HierarchicalRegressor` or `HierarchicalClassifier`, which properly implement the `.predict()` and
        `predict`-like methods for the specific task.

    !!! info "New in version 0.8.0"

    Parameters
    ----------
    estimator : scikit-learn compatible estimator/pipeline
        The base estimator to be used for each level.
    groups : int | str | List[int] | List[str]
        The column(s) of the array/dataframe to select as a grouping parameter set.
    shrinkage : Literal["constant", "equal", "min_n_obs", "relative"] | Callable | None, default=None
        How to perform shrinkage:

        - `None`: No shrinkage (default)
        - `"constant"`: the augmented prediction for each level is the weighted average between its prediction and the
            augmented prediction for its parent.
        - `"equal"`: each group is weighed equally.
        - `"min_n_obs"`: use only the smallest group with a certain amount of observations.
        - `"relative"`: weigh each group according to its size.
        - `Callable`: a function that takes a list of group lengths and returns an array of the same size with the
            weights for each group.
    fallback_method : Literal["parent", "raise"], default="parent"
        The fallback strategy to use if a group is not found at prediction time:

        - "parent": recursively fall back to the parent group if the group value is not found during `.predict()`.
            This requires fitting a model on each level, including a global model.
        - "raise": raise a KeyError if the group value is not found during `.predict()`.
    n_jobs : int | None, default=None
        The number of jobs to run in parallel. The same convention as [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html)
        applies:

        - `n_jobs = None`: interpreted as n_jobs=1.
        - `n_jobs > 0`: n_cpus=n_jobs are used.
        - `n_jobs < 0`: (n_cpus + 1 + n_jobs) are used.
    check_X : bool, default=True
        Whether to validate that `X` is a non-empty 2D array of finite values and attempt to cast `X` to float.
        If disabled, the model/pipeline is expected to handle e.g. missing, non-numeric, or non-finite values.
    shrinkage_kwargs : dict
        Keyword arguments to the shrinkage function

    Attributes
    ----------
    estimators_ : dict[tuple[Any,...], scikit-learn compatible estimator/pipeline]
        Fitted estimators for each level. The keys are the group values, and the values are the fitted estimators.
        The group values are tuples of the group columns, including the global column which has a fixed placeholder
        value of 1.

        Let's say we have two group columns, `col_1` and `col_2`. `col_1` has values 'A' and 'B', and `col_2` has
        values 'X', ... Then the `estimators_` dictionary will look something like this:

        ```py
        {
            # global estimator
            (1,): LinearRegression(),

            # estimator for `col_1 = 'A'`
            (1, 'A'): LinearRegression(),

            # estimator for `col_1 = 'B'`
            (1, 'B'): LinearRegression(),

            # estimator for `col_1 = 'A'`, `col_2 = 'X'`
            (1, 'A', 'X'): LinearRegression(),
            ...
        }
        ```
    shrinkage_function_ : callable
        The shrinkage function that is used to calculate the shrinkage factors
    shrinkage_factors_ : dict[tuple[Any,...], np.ndarray]
        Shrinkage factors applied to each level.

        The keys are the group values, and the values are the shrinkage factors. The group values are tuples of the
        group columns, including the global column which has a fixed placeholder value of 1.
    groups_ : list
        List of all group columns including a global column.
    n_groups_ : int
        Number of unique groups.
    n_features_in_ : int
        Number of features in the training data.
    n_features_ : int
        Number of features used by the estimators.
    n_levels_ : int
        Number of hierarchical levels in the grouping. Accessing it emits a `DeprecationWarning`; use
        `n_fitted_levels_` instead.
    """

    _CHECK_KWARGS = {
        "ensure_min_features": 0,
        "accept_large_sparse": False,
    }
    _ALLOWED_SHRINKAGE = {
        "constant": constant_shrinkage,
        "relative": relative_shrinkage,
        "min_n_obs": min_n_obs_shrinkage,
        "equal": equal_shrinkage,
    }
    _ALLOWED_FALLBACK = {"parent", "raise"}

    _GLOBAL_NAME = "__sklego_global_estimator__"
    _TARGET_NAME = "__sklego_target_value__"
    _INDEX_NAME = "__sklego_index__"

    _required_parameters = ["estimator", "groups"]

    def __init__(
        self,
        estimator,
        groups,
        *,
        shrinkage=None,
        fallback_method="parent",
        n_jobs=None,
        check_X=True,
        shrinkage_kwargs=None,
    ):
        self.estimator = estimator
        self.groups = groups
        self.shrinkage = shrinkage
        self.fallback_method = fallback_method
        self.n_jobs = n_jobs
        self.check_X = check_X
        self.shrinkage_kwargs = shrinkage_kwargs

    @property
    def _estimator_type(self):
        """Computes `_estimator_type` dynamically from the wrapped model."""
        return self.estimator._estimator_type

    def fit(self, X, y=None):
        """Fit one estimator for each hierarchical group of training data `X` and `y`.

        Will also learn the groups that exist within the training dataset.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,), default=None
            Target values, if applicable.

        Returns
        -------
        self : HierarchicalPredictor
            The fitted estimator.

        Raises
        -------
        ValueError
            - If `check_X` is not a boolean.
            - If group columns contain NaN values.
            - If `shrinkage` is not one of `None`, `"constant"`, `"equal"`, `"min_n_obs"`, `"relative"`, or a callable.
            - If `fallback_method` is not `"parent"` or `"raise"`.
        """

        if self.fallback_method not in self._ALLOWED_FALLBACK:
            raise ValueError(f"`fallback_method` should be either `parent` or `raise`. Found {self.fallback_method}")

        if not isinstance(self.check_X, bool):
            raise ValueError(f"`check_X` should be a boolean. Found {type(self.check_X)}")

        self.groups_ = [self._GLOBAL_NAME, *as_list(self.groups)]

        # The only case in which we don't have to fit multiple levels is when shrinkage is None and fallback_method is 'raise'
        self.fitted_levels_ = expanding_list(self.groups_)
        self.n_fitted_levels_ = len(self.fitted_levels_)
        # If invalid shrinkage, will raise ValueError (before fitting all the estimators!)
        self.shrinkage_function_ = self._set_shrinkage_function()

        _data_format_checks(X)

        X = nw.from_native(X, strict=False, eager_only=True)
        if not isinstance(X, nw.DataFrame):
            X = nw.from_native(pd.DataFrame(X))

        n_samples, self.n_features_in_ = X.shape

        if n_samples < 2:
            msg = f"Found {n_samples} sample or less, while a minimum of 2 is required."
            raise ValueError(msg)

        if self.n_features_in_ < 1:
            msg = "Found 0 features, while a minimum of 1 is required."
            raise ValueError(msg)

        native_namespace = nw.get_native_namespace(X)
        target_series = nw.from_dict({self._TARGET_NAME: y}, native_namespace=native_namespace)[self._TARGET_NAME]
        global_series = nw.from_dict({self._GLOBAL_NAME: np.ones(n_samples)}, native_namespace=native_namespace)[
            self._GLOBAL_NAME
        ]
        frame = X.with_columns(
            **{
                self._TARGET_NAME: target_series,
                self._GLOBAL_NAME: global_series,
            }
        ).pipe(self.__validate_frame)

        self.n_groups_ = len(self.groups_)
        self.n_features_ = frame.shape[1] - self.n_groups_ - 1

        self.estimators_ = self._fit_estimators(frame)
        self.shrinkage_factors_ = self._fit_shrinkage_factors(frame, groups=self.groups_)

        return self

    def predict(self, X):
        """Predict the target value for each sample in `X`."""
        raise NotImplementedError("This method should be implemented in the child class")

    def _predict_estimators(self, X, method_name):
        """Calls `method_name` on each level and applies shrinkage if necessary"""

        check_is_fitted(self, ["estimators_", "groups_"])

        if len(X.shape) != 2:
            raise ValueError(f"Reshape your data: X should be 2d, got {len(X.shape)}")

        if X.shape[1] != self.n_features_in_:
            raise ValueError(f"X should have {self.n_features_in_} features, got {X.shape[1]}")

        X = nw.from_native(X, strict=False, eager_only=True)
        if not isinstance(X, nw.DataFrame):
            X = nw.from_native(pd.DataFrame(X))

        n_samples = X.shape[0]
        native_namespace = nw.get_native_namespace(X)
        global_series = nw.from_dict({self._GLOBAL_NAME: np.ones(n_samples)}, native_namespace=native_namespace)[
            self._GLOBAL_NAME
        ]

        frame = X.with_columns(
            **{
                self._GLOBAL_NAME: global_series,
                self._INDEX_NAME: np.arange(n_samples),
            }
        ).pipe(self.__validate_frame)

        if not is_classifier(self.estimator):  # regressor or outlier detector
            n_out = 1
        else:
            if self.n_classes_ > 2 or method_name == "predict_proba":
                n_out = self.n_classes_
            else:  # binary case with `method_name = "decision_function"`
                n_out = 1

        preds = np.zeros((X.shape[0], self.n_levels_, n_out), dtype=float)
        shrinkage = np.zeros((X.shape[0], self.n_levels_), dtype=float)

        for level_idx, grp_names in enumerate(self.fitted_levels_):
            for grp_values, grp_frame in frame.group_by(grp_names):
                grp_idx = grp_frame.select(self._INDEX_NAME).to_numpy().reshape(-1)

                _estimator, _level = _get_estimator(
                    estimators=self.estimators_,
                    grp_values=grp_values,
                    grp_names=grp_names,
                    return_level=len(grp_names),
                    fallback_method=self.fallback_method,
                )
                _shrinkage_factor = self.shrinkage_factors_[grp_values[:_level]]

                last_dim_ix = _estimator.classes_ if is_classifier(self.estimator) else [0]
                X_grp_ = nw.to_native(grp_frame.drop([*self.groups_, self._INDEX_NAME]))
                raw_pred = getattr(_estimator, method_name)(X_grp_)

                preds[np.ix_(grp_idx, [level_idx], last_dim_ix)] = np.atleast_3d(raw_pred[:, None])
                shrinkage[np.ix_(grp_idx)] = np.pad(
                    _shrinkage_factor, (0, self.n_levels_ - len(_shrinkage_factor)), "constant", constant_values=(0)
                )

        return (preds * np.atleast_3d(shrinkage)).sum(axis=1).squeeze()

    def _fit_single_estimator(self, grp_frame):
        """Shortcut to fit an estimator on a single group"""
        _X = nw.to_native(grp_frame.drop([*self.groups_, self._TARGET_NAME]))
        _y = nw.to_native(grp_frame[self._TARGET_NAME])

        return clone(self.estimator).fit(_X, _y)

    def _fit_estimators(self, frame: nw.DataFrame):
        """Fits one estimator per level of the group column(s), and returns a dictionary of the fitted estimators.

        The keys of the dictionary are the group values, and the values are the fitted estimators.
        The if-else block is used to parallelize the fitting process if `n_jobs` is greater than 1.
        """
        # Question: Should the `estimators_` keys be named tuples instead of plain tuples?
        if self.n_jobs is None or self.n_jobs == 1:
            estimators_ = {
                grp_values: self._fit_single_estimator(grp_frame)
                for grp_names in self.fitted_levels_
                for grp_values, grp_frame in frame.group_by(grp_names)
            }
        else:
            fit_func = lambda grp_values, grp_frame: (grp_values, self._fit_single_estimator(grp_frame))

            estimators_ = dict(
                Parallel(n_jobs=self.n_jobs)(
                    delayed(fit_func)(grp_values, grp_frame)
                    for grp_names in self.fitted_levels_
                    for grp_values, grp_frame in frame.group_by(grp_names)
                )
            )

        return estimators_

    def __validate_frame(self, frame):
        """Validate the input arrays"""

        if self.check_X:
            X_values = frame.drop([*self.groups_])
            check_array(X_values, **self._CHECK_KWARGS)

        _validate_groups_values(frame, self.groups_)

        return frame

    @property
    def n_levels_(self):
        warn(
            "Please use `n_fitted_levels_` instead of `n_levels_`, `n_levels_` will be deprecated in future versions",
            DeprecationWarning,
        )
        return self.n_fitted_levels_

    def _more_tags(self):
        return {"allow_nan": True}

fit(X, y=None)

Fit one estimator for each hierarchical group of training data X and y.

Will also learn the groups that exist within the training dataset.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values, if applicable.

None

Returns:

Name Type Description
self HierarchicalPredictor

The fitted estimator.

Raises:

Type Description
ValueError
  • If check_X is not a boolean.
  • If group columns contain NaN values.
  • If shrinkage is not one of None, "constant", "equal", "min_n_obs", "relative", or a callable.
  • If fallback_method is not "parent" or "raise".
Source code in sklego/meta/hierarchical_predictor.py
def fit(self, X, y=None):
    """Fit one estimator for each hierarchical group of training data `X` and `y`.

    Will also learn the groups that exist within the training dataset.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,), default=None
        Target values, if applicable.

    Returns
    -------
    self : HierarchicalPredictor
        The fitted estimator.

    Raises
    -------
    ValueError
        - If `check_X` is not a boolean.
        - If group columns contain NaN values.
        - If `shrinkage` is not one of `None`, `"constant"`, `"equal"`, `"min_n_obs"`, `"relative"`, or a callable.
        - If `fallback_method` is not `"parent"` or `"raise"`.
    """

    if self.fallback_method not in self._ALLOWED_FALLBACK:
        raise ValueError(f"`fallback_method` should be either `parent` or `raise`. Found {self.fallback_method}")

    if not isinstance(self.check_X, bool):
        raise ValueError(f"`check_X` should be a boolean. Found {type(self.check_X)}")

    self.groups_ = [self._GLOBAL_NAME, *as_list(self.groups)]

    # The only case in which we don't have to fit multiple levels is when shrinkage is None and fallback_method is 'raise'
    self.fitted_levels_ = expanding_list(self.groups_)
    self.n_fitted_levels_ = len(self.fitted_levels_)
    # If invalid shrinkage, will raise ValueError (before fitting all the estimators!)
    self.shrinkage_function_ = self._set_shrinkage_function()

    _data_format_checks(X)

    X = nw.from_native(X, strict=False, eager_only=True)
    if not isinstance(X, nw.DataFrame):
        X = nw.from_native(pd.DataFrame(X))

    n_samples, self.n_features_in_ = X.shape

    if n_samples < 2:
        msg = f"Found {n_samples} sample or less, while a minimum of 2 is required."
        raise ValueError(msg)

    if self.n_features_in_ < 1:
        msg = "Found 0 features, while a minimum of 1 is required."
        raise ValueError(msg)

    native_namespace = nw.get_native_namespace(X)
    target_series = nw.from_dict({self._TARGET_NAME: y}, native_namespace=native_namespace)[self._TARGET_NAME]
    global_series = nw.from_dict({self._GLOBAL_NAME: np.ones(n_samples)}, native_namespace=native_namespace)[
        self._GLOBAL_NAME
    ]
    frame = X.with_columns(
        **{
            self._TARGET_NAME: target_series,
            self._GLOBAL_NAME: global_series,
        }
    ).pipe(self.__validate_frame)

    self.n_groups_ = len(self.groups_)
    self.n_features_ = frame.shape[1] - self.n_groups_ - 1

    self.estimators_ = self._fit_estimators(frame)
    self.shrinkage_factors_ = self._fit_shrinkage_factors(frame, groups=self.groups_)

    return self

predict(X)

Predict the target value for each sample in X.

Source code in sklego/meta/hierarchical_predictor.py
def predict(self, X):
    """Predict the target value for each sample in `X`."""
    raise NotImplementedError("This method should be implemented in the child class")

sklego.meta.hierarchical_predictor.HierarchicalClassifier

Bases: HierarchicalPredictor, ClassifierMixin

A hierarchical classifier that predicts labels using hierarchical grouping.

This class extends HierarchicalPredictor and adds functionality specific to classification tasks.

Its spec is the same as HierarchicalPredictor, with additional checks to ensure that the supplied estimator is a classifier that implements the .predict_proba() method.

.predict_proba(..) method required!

In order to use shrinkage with classification tasks, we require the estimator to have a .predict_proba() method. The only way to blend the predictions of the group-level models is by using the probabilities of each class, and not the labels themselves.
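The reason probabilities are needed can be seen in a small numeric sketch. The arrays below are made up for illustration and the weights stand in for shrinkage factors; the weighted sum over the level axis is the same kind of blend that the source listing on this page performs in `_predict_estimators`, shown here in simplified form.

```python
import numpy as np

# Class probabilities from three levels (global, g_1, g_2) for two samples and two classes.
# Shape: (n_samples, n_levels, n_classes). All numbers are made up for illustration.
level_probas = np.array(
    [
        [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]],
        [[0.6, 0.4], [0.7, 0.3], [0.9, 0.1]],
    ]
)
# One weight per level, shared by both samples here; the weights sum to 1.
weights = np.array([0.2, 0.3, 0.5])

# A weighted average over the level axis gives one probability vector per sample.
blended = (level_probas * weights[None, :, None]).sum(axis=1)

# Labels are only derived at the very end, from the blended probabilities.
labels = np.array(["no", "yes"])[blended.argmax(axis=1)]
print(blended)
print(labels)
```

Hard labels cannot be averaged this way, which is why an estimator exposing only `.predict()` is not enough.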

New in version 0.8.0

Examples:

import pandas as pd

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

from sklego.meta import HierarchicalClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=42)
X = pd.DataFrame(X, columns=[f"x_{i}" for i in range(X.shape[1])]).assign(
    g_1 = ['A'] * 500 + ['B'] * 500,
    g_2 = ['X'] * 250 + ['Y'] * 250 + ['Z'] * 250 + ['W'] * 250
)
groups = ["g_1", "g_2"]

hc = HierarchicalClassifier(
    estimator=LogisticRegression(),
    groups=groups
).fit(X, y)

hc.estimators_
{
    (1,): LogisticRegression(),  # global estimator
    (1, 'A'): LogisticRegression(),  # estimator for `g_1 = 'A'`
    (1, 'B'): LogisticRegression(),  # estimator for `g_1 = 'B'`
    (1, 'A', 'X'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('A', 'X')`
    (1, 'A', 'Y'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('A', 'Y')`
    (1, 'B', 'W'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('B', 'W')`
    (1, 'B', 'Z'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('B', 'Z')`
}

As we can see, the estimators are fitted for each level of the group columns. The (1,) key is the global estimator, which is fitted on the entire dataset.

If we try to predict a sample in which (g_1, g_2) = ('B', 'X'), this will fall back to the estimator (1, 'B') when fallback_method="parent", or raise a KeyError when fallback_method="raise".
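The fallback behaviour can be sketched as a plain dictionary lookup. `find_estimator` is a hypothetical helper written for this illustration, not the library's internal function, and the string values stand in for fitted estimators.

```python
def find_estimator(estimators, key, fallback_method="parent"):
    """Hypothetical lookup: return the most specific fitted entry for `key`."""
    if fallback_method == "raise":
        # Exact match only; a missing key raises KeyError.
        return key, estimators[key]
    # Drop the last group value until a fitted key is found.
    while key not in estimators:
        if len(key) == 1:
            raise KeyError(key)
        key = key[:-1]
    return key, estimators[key]


# Stand-ins for fitted estimators, keyed like `estimators_` in the example above.
fitted = {(1,): "global", (1, "A"): "A", (1, "B"): "B", (1, "B", "W"): "B-W"}

used_key, _ = find_estimator(fitted, (1, "B", "X"))
print(used_key)
```

An unseen `g_1` value would walk all the way up to the global key `(1,)`, which is why the "parent" strategy needs a model on every level.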

As one would expect, estimator can be a pipeline, and the pipeline will be fitted on each level of the group:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

hc = HierarchicalClassifier(
    estimator=Pipeline([
        ('scaler', StandardScaler()),
        ('model', LogisticRegression())
        ]),
    groups=groups
).fit(X, y)

Source code in sklego/meta/hierarchical_predictor.py
class HierarchicalClassifier(HierarchicalPredictor, ClassifierMixin):
    """A hierarchical classifier that predicts labels using hierarchical grouping.

    This class extends [`HierarchicalPredictor`][sklego.meta.hierarchical_predictor.HierarchicalPredictor] and adds
    functionality specific to classification tasks.

    Its spec is the same as `HierarchicalPredictor`, with additional checks to ensure that the supplied estimator is a
    classifier that implements the `.predict_proba()` method.

    !!! warning "`.predict_proba(..)` method required!"

        In order to use shrinkage with classification tasks, we require the estimator to have a `.predict_proba()`
        method. The only way to blend the predictions of the group-level models is by using the probabilities of each
        class, and not the labels themselves.

    !!! info "New in version 0.8.0"

    Examples
    --------
    ```py
    import pandas as pd

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    from sklego.meta import HierarchicalClassifier

    X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=42)
    X = pd.DataFrame(X, columns=[f"x_{i}" for i in range(X.shape[1])]).assign(
        g_1 = ['A'] * 500 + ['B'] * 500,
        g_2 = ['X'] * 250 + ['Y'] * 250 + ['Z'] * 250 + ['W'] * 250
    )
    groups = ["g_1", "g_2"]

    hc = HierarchicalClassifier(
        estimator=LogisticRegression(),
        groups=groups
    ).fit(X, y)

    hc.estimators_
    ```

    ```terminal
    {
        (1,): LogisticRegression(),  # global estimator
        (1, 'A'): LogisticRegression(),  # estimator for `g_1 = 'A'`
        (1, 'B'): LogisticRegression(),  # estimator for `g_1 = 'B'`
        (1, 'A', 'X'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('A', 'X')`
        (1, 'A', 'Y'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('A', 'Y')`
        (1, 'B', 'W'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('B', 'W')`
        (1, 'B', 'Z'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('B', 'Z')`
    }
    ```

    As we can see, the estimators are fitted for each level of the group columns. The `(1,)` key is the global
    estimator, which is fitted on the entire dataset.

    If we try to predict a sample in which `(g_1, g_2) = ('B', 'X')`, this will fall back to the estimator `(1, 'B')`
    when `fallback_method="parent"`, or raise a KeyError when `fallback_method="raise"`.

    As one would expect, `estimator` can be a pipeline, and the pipeline will be fitted on each level of the group:
    ```py
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    hc = HierarchicalClassifier(
        estimator=Pipeline([
            ('scaler', StandardScaler()),
            ('model', LogisticRegression())
            ]),
        groups=groups
    ).fit(X, y)
    ```
    """

    def fit(self, X, y):
        """Fit one classifier for each hierarchical group of training data `X` and `y`.

        Will also learn the groups that exist within the training dataset, the classes and the number of classes in the
        target values.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : HierarchicalClassifier
            The fitted classifier.

        Raises
        -------
        ValueError
            If the supplied estimator is not a classifier or does not implement `predict_proba`.
        """
        if not is_classifier(self.estimator):
            raise ValueError("The supplied estimator should be a classifier")

        if not hasattr(self.estimator, "predict_proba"):
            raise ValueError("The supplied estimator should have a 'predict_proba' method")

        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)

        super().fit(X, y)
        return self

    def predict(self, X):
        """Predict class labels for samples in `X` as the class with the highest probability.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            The predicted class labels.
        """

        preds = self._predict_estimators(X, method_name="predict_proba")
        return self.classes_[np.argmax(preds, axis=1)]

    def predict_proba(self, X):
        """Predict probabilities for each class on new data `X`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples, n_classes)
            Predicted probabilities per class.
        """
        return self._predict_estimators(X, method_name="predict_proba")

    @available_if(lambda self: hasattr(self.estimator, "decision_function"))
    def decision_function(self, X):
        """Predict confidence scores for samples in `X`.

        !!! warning
            Available only if the underlying estimator implements `.decision_function()` method.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,) or (n_samples, n_classes)
            Confidence scores per (n_samples, n_classes) combination.
            In the binary case, confidence score for self.classes_[1] where > 0 means this class would be
            predicted.
        """
        warn(
            "`decision_function` will lead to inconsistent results in cases where the estimators are not all fitted "
            "on the same target values.",
            UserWarning,
        )
        return self._predict_estimators(X, method_name="decision_function")

decision_function(X)

Predict confidence scores for samples in X.

Warning

Available only if the underlying estimator implements .decision_function() method.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,) or (n_samples, n_classes)

Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1] where > 0 means this class would be predicted.
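As a small illustration of the binary convention described above, the sketch below maps made-up confidence scores to labels. The class names and scores are assumptions chosen for the example.

```python
import numpy as np

classes_ = np.array(["neg", "pos"])
# Made-up binary confidence scores, one per sample.
scores = np.array([-1.3, 0.2, 2.5, -0.1])

# A score above 0 selects classes_[1]; anything else selects classes_[0].
predicted = classes_[(scores > 0).astype(int)]
print(predicted)
```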

Source code in sklego/meta/hierarchical_predictor.py
@available_if(lambda self: hasattr(self.estimator, "decision_function"))
def decision_function(self, X):
    """Predict confidence scores for samples in `X`.

    !!! warning
        Available only if the underlying estimator implements `.decision_function()` method.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,) or (n_samples, n_classes)
        Confidence scores per (n_samples, n_classes) combination.
        In the binary case, confidence score for self.classes_[1] where > 0 means this class would be
        predicted.
    """
    warn(
        "`decision_function` will lead to inconsistent results in cases where the estimators are not all fitted "
        "on the same target values.",
        UserWarning,
    )
    return self._predict_estimators(X, method_name="decision_function")

fit(X, y)

Fit one classifier for each hierarchical group of training data X and y.

Will also learn the groups that exist within the training dataset, the classes and the number of classes in the target values.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required

Returns:

Name Type Description
self HierarchicalClassifier

The fitted classifier.

Raises:

Type Description
ValueError

If the supplied estimator is not a classifier or does not implement predict_proba.

Source code in sklego/meta/hierarchical_predictor.py
def fit(self, X, y):
    """Fit one classifier for each hierarchical group of training data `X` and `y`.

    Will also learn the groups that exist within the training dataset, the classes and the number of classes in the
    target values.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.

    Returns
    -------
    self : HierarchicalClassifier
        The fitted classifier.

    Raises
    -------
    ValueError
        If the supplied estimator is not a classifier or does not implement `predict_proba`.
    """
    if not is_classifier(self.estimator):
        raise ValueError("The supplied estimator should be a classifier")

    if not hasattr(self.estimator, "predict_proba"):
        raise ValueError("The supplied estimator should have a 'predict_proba' method")

    self.classes_ = np.unique(y)
    self.n_classes_ = len(self.classes_)

    super().fit(X, y)
    return self
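
The two checks at the top of `fit` can be tried in isolation. The sketch below is a standalone illustration, not part of sklego: `validate_classifier` is a hypothetical helper that mirrors the two `if` statements above, and the three scikit-learn estimators are chosen only to exercise each branch.

```python
from sklearn.base import is_classifier
from sklearn.linear_model import LinearRegression, LogisticRegression, RidgeClassifier


def validate_classifier(estimator):
    """Hypothetical helper mirroring the checks performed in `fit` above."""
    if not is_classifier(estimator):
        raise ValueError("The supplied estimator should be a classifier")
    if not hasattr(estimator, "predict_proba"):
        raise ValueError("The supplied estimator should have a 'predict_proba' method")


# A classifier with `predict_proba` passes both checks.
validate_classifier(LogisticRegression())

# A regressor fails the first check; a classifier without
# `predict_proba` (here RidgeClassifier) fails the second.
errors = []
for candidate in (LinearRegression(), RidgeClassifier()):
    try:
        validate_classifier(candidate)
    except ValueError as exc:
        errors.append(str(exc))

print(errors)
```

Both checks inspect the unfitted estimator, so they fail fast before any per-group fitting starts.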

predict(X)

Predict class labels for samples in X as the class with the highest probability.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

The predicted class labels.

Source code in sklego/meta/hierarchical_predictor.py
def predict(self, X):
    """Predict class labels for samples in `X` as the class with the highest probability.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        The predicted class labels.
    """

    preds = self._predict_estimators(X, method_name="predict_proba")
    return self.classes_[np.argmax(preds, axis=1)]
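
The last line of `predict` maps each row's highest-probability column back to a class label. A minimal sketch of that indexing step follows; the `classes_` and `preds` values are made up for illustration and are not output from a fitted `HierarchicalClassifier`.

```python
import numpy as np

# Hypothetical class labels, sorted the way np.unique would return them.
classes_ = np.array(["bird", "cat", "dog"])

# Hypothetical predict_proba output: one row per sample, one column per class.
preds = np.array(
    [
        [0.1, 0.7, 0.2],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6],
    ]
)

# argmax along axis=1 gives the column index of the largest probability in each row;
# indexing classes_ with those indices converts them into labels.
labels = classes_[np.argmax(preds, axis=1)]
print(labels)  # ['cat' 'bird' 'dog']
```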

predict_proba(X)

Predict probabilities for each class on new data X.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples, n_classes)

Predicted probabilities per class.

Source code in sklego/meta/hierarchical_predictor.py
def predict_proba(self, X):
    """Predict probabilities for each class on new data `X`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples, n_classes)
        Predicted probabilities per class.
    """
    return self._predict_estimators(X, method_name="predict_proba")

sklego.meta.hierarchical_predictor.HierarchicalRegressor

Bases: HierarchicalPredictor, RegressorMixin

A hierarchical regressor that predicts values using hierarchical grouping.

This class extends HierarchicalPredictor and adds functionality specific to regression tasks.

Its spec is the same as HierarchicalPredictor, with additional checks to ensure that the supplied estimator is a regressor.

New in version 0.8.0

Examples:

import pandas as pd

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

from sklego.meta import HierarchicalRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=3, random_state=42)
X = pd.DataFrame(X, columns=[f"x_{i}" for i in range(X.shape[1])]).assign(
    g_1 = ['A'] * 500 + ['B'] * 500,
    g_2 = ['X'] * 250 + ['Y'] * 250 + ['Z'] * 250 + ['W'] * 250
)
groups = ["g_1", "g_2"]

hr = HierarchicalRegressor(
    estimator=LinearRegression(),
    groups=groups
).fit(X, y)

hr.estimators_
{
    (1,): LinearRegression(),  # global estimator
    (1, 'A'): LinearRegression(),  # estimator for `g_1 = 'A'`
    (1, 'B'): LinearRegression(),  # estimator for `g_1 = 'B'`
    (1, 'A', 'X'): LinearRegression(),  # estimator for `(g_1, g_2) = ('A', 'X')`
    (1, 'A', 'Y'): LinearRegression(),  # estimator for `(g_1, g_2) = ('A', 'Y')`
    (1, 'B', 'W'): LinearRegression(),  # estimator for `(g_1, g_2) = ('B', 'W')`
    (1, 'B', 'Z'): LinearRegression(),  # estimator for `(g_1, g_2) = ('B', 'Z')`
}

As we can see, the estimators are fitted for each level of the group columns. The key (1,) is the global estimator, which is fitted on the entire dataset.

If we try to predict a sample in which (g_1, g_2) = ('B', 'X'), the prediction falls back to the estimator (1, 'B') when fallback_method="parent", or raises a KeyError when fallback_method="raise".
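
The key structure and the parent fallback described above can be sketched in plain Python. This is an illustration of the idea only, under the assumption that keys are tuples starting with `1`; the helpers `hierarchy_keys` and `resolve_key` are hypothetical names and do not reflect how sklego implements the lookup internally.

```python
def hierarchy_keys(row, groups):
    """Return the keys for every level of one row, from global to most specific."""
    key = (1,)  # the leading 1 marks the global level
    keys = [key]
    for group in groups:
        key = key + (row[group],)
        keys.append(key)
    return keys


def resolve_key(key, fitted_keys, fallback_method="parent"):
    """Pick the fitted key that would serve a prediction (illustrative only)."""
    if key in fitted_keys:
        return key
    if fallback_method == "raise":
        raise KeyError(key)
    # "parent": drop the most specific group value until a fitted key is found.
    while key not in fitted_keys and len(key) > 1:
        key = key[:-1]
    if key not in fitted_keys:
        raise KeyError(key)
    return key


groups = ["g_1", "g_2"]
# Group combinations present in the training data of the example above.
train_rows = [
    {"g_1": "A", "g_2": "X"},
    {"g_1": "A", "g_2": "Y"},
    {"g_1": "B", "g_2": "Z"},
    {"g_1": "B", "g_2": "W"},
]
fitted_keys = {key for row in train_rows for key in hierarchy_keys(row, groups)}

# ('B', 'X') never occurs in training, so its most specific key is missing.
unseen = hierarchy_keys({"g_1": "B", "g_2": "X"}, groups)[-1]
resolved = resolve_key(unseen, fitted_keys, fallback_method="parent")
print(unseen, "->", resolved)  # (1, 'B', 'X') -> (1, 'B')
```

With `fallback_method="raise"`, the same lookup raises a `KeyError` instead of walking up to the parent.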

As one would expect, estimator can be a pipeline, and the pipeline will be fitted on each level of the group:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

hr = HierarchicalRegressor(
    estimator=Pipeline([
        ('scaler', StandardScaler()),
        ('model', LinearRegression())
        ]),
    groups=groups
).fit(X, y)

Source code in sklego/meta/hierarchical_predictor.py
class HierarchicalRegressor(HierarchicalPredictor, RegressorMixin):
    """A hierarchical regressor that predicts values using hierarchical grouping.

    This class extends [`HierarchicalPredictor`][sklego.meta.hierarchical_predictor.HierarchicalPredictor] and adds
    functionality specific to regression tasks.

    Its spec is the same as `HierarchicalPredictor`, with additional checks to ensure that the supplied estimator is a
    regressor.

    !!! info "New in version 0.8.0"

    Examples
    --------
    ```py
    import pandas as pd

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression

    from sklego.meta import HierarchicalRegressor

    X, y = make_regression(n_samples=1000, n_features=10, n_informative=3, random_state=42)
    X = pd.DataFrame(X, columns=[f"x_{i}" for i in range(X.shape[1])]).assign(
        g_1 = ['A'] * 500 + ['B'] * 500,
        g_2 = ['X'] * 250 + ['Y'] * 250 + ['Z'] * 250 + ['W'] * 250
    )
    groups = ["g_1", "g_2"]

    hr = HierarchicalRegressor(
        estimator=LinearRegression(),
        groups=groups
    ).fit(X, y)

    hr.estimators_
    ```

    ```terminal
    {
        (1,): LinearRegression(),  # global estimator
        (1, 'A'): LinearRegression(),  # estimator for `g_1 = 'A'`
        (1, 'B'): LinearRegression(),  # estimator for `g_1 = 'B'`
        (1, 'A', 'X'): LinearRegression(),  # estimator for `(g_1, g_2) = ('A', 'X')`
        (1, 'A', 'Y'): LinearRegression(),  # estimator for `(g_1, g_2) = ('A', 'Y')`
        (1, 'B', 'W'): LinearRegression(),  # estimator for `(g_1, g_2) = ('B', 'W')`
        (1, 'B', 'Z'): LinearRegression(),  # estimator for `(g_1, g_2) = ('B', 'Z')`
    }
    ```

    As we can see, the estimators are fitted for each level of the group columns. The key `(1,)` is the global
    estimator, which is fitted on the entire dataset.

    If we try to predict a sample in which `(g_1, g_2) = ('B', 'X')`, the prediction falls back to the estimator
    `(1, 'B')` when `fallback_method="parent"`, or raises a `KeyError` when `fallback_method="raise"`.

    As one would expect, `estimator` can be a pipeline, and the pipeline will be fitted on each level of the group:
    ```py
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    hr = HierarchicalRegressor(
        estimator=Pipeline([
            ('scaler', StandardScaler()),
            ('model', LinearRegression())
            ]),
        groups=groups
    ).fit(X, y)
    ```
    """

    def fit(self, X, y):
        """Fit one regressor for each hierarchical group of training data `X` and `y`.

        Will also learn the groups that exist within the training dataset.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : HierarchicalRegressor
            The fitted regressor.

        Raises
        ------
        ValueError
            If the supplied estimator is not a regressor.
        """
        if not is_regressor(self.estimator):
            raise ValueError("The supplied estimator should be a regressor")

        super().fit(X, y)
        return self

    def predict(self, X):
        """Predict regression values for new data `X`.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data to predict.

        Returns
        -------
        array-like of shape (n_samples,)
            Predicted regression values.
        """
        return self._predict_estimators(X, "predict")

fit(X, y)

Fit one regressor for each hierarchical group of training data X and y.

Will also learn the groups that exist within the training dataset.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Training data.

required
y array-like of shape (n_samples,)

Target values.

required

Returns:

Name Type Description
self HierarchicalRegressor

The fitted regressor.

Raises:

Type Description
ValueError

If the supplied estimator is not a regressor.

Source code in sklego/meta/hierarchical_predictor.py
def fit(self, X, y):
    """Fit one regressor for each hierarchical group of training data `X` and `y`.

    Will also learn the groups that exist within the training dataset.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.

    Returns
    -------
    self : HierarchicalRegressor
        The fitted regressor.

    Raises
    ------
    ValueError
        If the supplied estimator is not a regressor.
    """
    if not is_regressor(self.estimator):
        raise ValueError("The supplied estimator should be a regressor")

    super().fit(X, y)
    return self

predict(X)

Predict regression values for new data X.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

The data to predict.

required

Returns:

Type Description
array-like of shape (n_samples,)

Predicted regression values.

Source code in sklego/meta/hierarchical_predictor.py
def predict(self, X):
    """Predict regression values for new data `X`.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The data to predict.

    Returns
    -------
    array-like of shape (n_samples,)
        Predicted regression values.
    """
    return self._predict_estimators(X, "predict")