from hulearn.outlier import *

FunctionOutlierDetector

This class lets you pass a function that detects the outliers you're interested in. The function must return an array containing only the values -1 and 1, where -1 denotes an outlier.

Parameters

Name Type Description Default
func the function that returns an array of -1/1 values (-1 denotes an outlier) required
**kwargs extra keyword arguments that will be passed to the function; these can be grid-searched {}
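As a rough illustration of the contract described above, here is a minimal sketch of a function that could be passed as func. The name flag_far_from_median and its max_dist keyword are hypothetical and chosen only for this example; the only requirement taken from the documentation is that the function returns an array of -1/1 values.

```python
import numpy as np


def flag_far_from_median(X, max_dist=3.0):
    """Hypothetical rule: a row is an outlier (-1) if any of its values
    lies more than `max_dist` away from that column's median."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Treat a 1-D input as a single column.
        X = X.reshape(-1, 1)
    # Largest absolute distance to the column median, per row.
    dist = np.abs(X - np.median(X, axis=0)).max(axis=1)
    return np.where(dist > max_dist, -1, 1)


X = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0], [9.0, 2.5]])
labels = flag_far_from_median(X, max_dist=3.0)
print(labels)  # [ 1  1  1 -1]
```

With hulearn installed, such a function could presumably be wrapped as FunctionOutlierDetector(func=flag_far_from_median, max_dist=3.0), in which case max_dist is one of the extra keyword arguments forwarded to the function.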

The functions that are passed need to be pickle-able. That means no lambda functions!
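The pickling restriction can be checked with the standard library alone. The sketch below uses a hypothetical helper, is_picklable, and a hypothetical named function, always_inlier; neither is part of hulearn.

```python
import pickle


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def always_inlier(X):
    """Hypothetical named function: labels every row as an inlier (1)."""
    return [1 for _ in X]


# A function defined with `def` at module level is pickled by reference to its name.
print(is_picklable(always_inlier))

# A lambda has no importable name, so pickling it fails.
print(is_picklable(lambda X: [1 for _ in X]))
```

When run as a script, the named function is expected to pickle and the lambda is not, which is why a lambda is not a suitable value for func.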

fit(self, X, y=None)

Show source code in outlier/functionoutlier.py
    def fit(self, X, y=None):
        """
        Fit the classifier. No-Op.
        """
        # Run it to confirm no error happened.
        self.fitted_ = True
        _ = self.func(X, **self.kwargs)
        return self

Fit the classifier. This is a no-op, apart from calling the function once on X to confirm that it runs without error.

partial_fit(self, X, y=None)

Show source code in outlier/functionoutlier.py
    def partial_fit(self, X, y=None):
        """
        Fit the classifier partially. No-Op.
        """
        # Run it to confirm no error happened.
        _ = self.func(X, **self.kwargs)
        self.fitted_ = True
        self.ncol_ = 0 if len(X.shape) == 1 else X.shape[1]
        return self

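Fit the classifier partially. This is a no-op, apart from calling the function once on X to confirm that it runs without error and recording the number of columns of X in ncol_.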

predict(self, X)

Show source code in outlier/functionoutlier.py
    def predict(self, X):
        """
        Make predictions using the passed function.
        """
        check_is_fitted(self, ["fitted_"])
        return self.func(X, **self.kwargs)

Make predictions using the passed function.

InteractiveOutlierDetector

This tool lets you take a drawn model and use it as an outlier detector. A data point that falls inside fewer drawn polygons than the threshold (by default, inside none of them) is flagged as an outlier.
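To make the idea concrete, here is a simplified, pure-Python sketch of counting how many polygons contain a point and thresholding that count. It uses a textbook ray-casting test and is not hulearn's implementation; the helper names point_in_polygon and label_points, and the example shapes, are made up for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle `inside` each time a ray going right from (x, y)
    crosses an edge of the polygon (a list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_points(points, polygons, threshold=1):
    """Return -1 for points inside fewer than `threshold` polygons, else 1."""
    labels = []
    for x, y in points:
        hits = sum(point_in_polygon(x, y, poly) for poly in polygons)
        labels.append(-1 if hits < threshold else 1)
    return labels


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
triangle = [(2, 2), (6, 2), (2, 6)]
points = [(1, 1), (3, 3), (10, 10)]

print(label_points(points, [square, triangle], threshold=1))  # [1, 1, -1]
print(label_points(points, [square, triangle], threshold=2))  # [-1, 1, -1]
```

Raising the threshold makes the detector stricter: the point (1, 1) lies in only one polygon, so it is an inlier at threshold 1 and an outlier at threshold 2.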

Parameters

Name Type Description Default
json_desc python dictionary that contains drawn data required
threshold the minimum number of polygons a point needs to be in to not be considered an outlier 1

Usage:

from sklego.datasets import load_penguins
from hulearn.experimental.interactive import InteractiveCharts

df = load_penguins(as_frame=True)
charts = InteractiveCharts(df, labels="species")

# Next notebook cell
charts.add_chart(x="bill_length_mm", y="bill_depth_mm")
# Next notebook cell
charts.add_chart(x="flipper_length_mm", y="body_mass_g")

# After drawing a model, export the data
json_data = charts.data()

# You can now use your drawn intuition as a model!
from hulearn.outlier import InteractiveOutlierDetector
clf = InteractiveOutlierDetector(json_data)
X, y = df.drop(columns=['species']), df['species']

# This doesn't do anything. But scikit-learn demands it.
clf.fit(X, y)

# This makes predictions, based on your drawn model.
# It can also be used in `GridSearchCV` for benchmarking!
clf.predict(X)

fit(self, X, y=None)

Show source code in outlier/interactiveoutlier.py
    def fit(self, X, y=None):
        """
        Fit the classifier. Bit of a formality, it's not doing anything specifically.
        """
        self.classes_ = list(self.json_desc[0]["polygons"].keys())
        return self

Fit the classifier. This is largely a formality: it only records the class names found in the drawn polygons as classes_.

from_json(path, threshold=1) (classmethod)

Show source code in outlier/interactiveoutlier.py
    @classmethod
    def from_json(cls, path, threshold=1):
        """
        Load the classifier from json stored on disk.

        Arguments:
            path: path of the json file
            threshold: the minimum number of polygons a point needs to be in to not be considered an outlier

        Usage:

        ```python
        from hulearn.outlier import InteractiveOutlierDetector

        InteractiveOutlierDetector.from_json("path/to/file.json")
        ```
        """
        json_desc = json.loads(pathlib.Path(path).read_text())
        return InteractiveOutlierDetector(json_desc=json_desc, threshold=threshold)

Load the classifier from json stored on disk.

Parameters

Name Type Description Default
path path of the json file required
threshold the minimum number of polygons a point needs to be in to not be considered an outlier 1

Usage:

from hulearn.outlier import InteractiveOutlierDetector

InteractiveOutlierDetector.from_json("path/to/file.json")
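The loading step itself is plain standard-library code, so it can be tried without hulearn. The sketch below writes a stand-in file and reads it back with the same call that from_json uses. The contents of json_desc here are hypothetical: the only structure assumed is a list whose first element has a "polygons" dictionary keyed by class name, which is what the fit source accesses.

```python
import json
import pathlib
import tempfile

# Hypothetical stand-in for exported chart data; real exports contain the
# drawn polygon coordinates, which are left empty here.
json_desc = [
    {"polygons": {"Adelie": {}, "Gentoo": {}}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "drawn.json"
    path.write_text(json.dumps(json_desc))

    # Same loading expression as in from_json.
    loaded = json.loads(pathlib.Path(path).read_text())

# Same expression that fit uses to derive the class names.
classes = list(loaded[0]["polygons"].keys())
print(classes)  # ['Adelie', 'Gentoo']
```

A file written this way from real chart data could then be passed by path to InteractiveOutlierDetector.from_json, optionally with a different threshold.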

predict(self, X)

Show source code in outlier/interactiveoutlier.py
    def predict(self, X):
        """
        Predicts whether each row is an outlier (-1) or not (1).

        Usage:

        ```python
        from hulearn.outlier import InteractiveOutlierDetector
        # Assuming a variable `clf_data` that contains the drawn polygons.
        clf = InteractiveOutlierDetector(clf_data)
        X, y = load_data(...)

        # This doesn't do anything. But scikit-learn demands it.
        clf.fit(X, y)

        # This makes predictions, based on your drawn model.
        clf.predict(X)
        ```
        """
        count_arr = self.score(X)
        return np.where(count_arr.sum(axis=1) < self.threshold, -1, 1)

Predicts whether each row is an outlier (-1) or not (1), based on how many drawn polygons contain it.

Usage:

from hulearn.outlier import InteractiveOutlierDetector
# Assuming a variable `clf_data` that contains the drawn polygons.
clf = InteractiveOutlierDetector(clf_data)
X, y = load_data(...)

# This doesn't do anything. But scikit-learn demands it.
clf.fit(X, y)

# This makes predictions, based on your drawn model.
clf.predict(X)
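The thresholding step in predict can be reproduced in isolation with NumPy. The sketch below assumes that score returns one row per sample and one column of polygon counts per class, which is what the sum(axis=1) call in the source suggests; the count_arr values and the helper name counts_to_labels are made up for illustration.

```python
import numpy as np

# Hypothetical polygon counts: rows are samples, columns are classes.
count_arr = np.array([
    [2, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
])


def counts_to_labels(count_arr, threshold=1):
    """Same expression as in predict: rows whose total polygon count is
    below `threshold` become -1 (outlier), all others become 1."""
    return np.where(count_arr.sum(axis=1) < threshold, -1, 1)


print(counts_to_labels(count_arr, threshold=1))  # [ 1  1 -1  1]
print(counts_to_labels(count_arr, threshold=2))  # [ 1 -1 -1  1]
```

Because the counts are summed across classes before the comparison, a point only needs to fall inside enough polygons in total; which class those polygons belong to does not matter for outlier detection.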