# Mixture Models

## sklego.mixture.bayesian_gmm_classifier.BayesianGMMClassifier

Bases: `BaseEstimator`, `ClassifierMixin`
The `BayesianGMMClassifier` trains a Bayesian Gaussian Mixture Model for each class in `y` on a dataset `X`. Once a density is trained for each class we can evaluate the likelihood scores to see which class is more likely.
Note
All the parameters are an exact copy of those of sklearn.mixture.BayesianGaussianMixture.
Attributes:

Name | Type | Description
---|---|---
`gmms_` | `dict[int, BayesianGaussianMixture]` | A dictionary of Bayesian Gaussian Mixture Models, one for each class.
`classes_` | `np.ndarray` of shape (n_classes,) | The classes seen during `fit`.
Source code in sklego/mixture/bayesian_gmm_classifier.py
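For orientation, here is a minimal usage sketch. The iris dataset and the `n_components`/`random_state` values are illustrative assumptions; any keyword argument accepted by `sklearn.mixture.BayesianGaussianMixture` should pass through unchanged.

```python
from sklearn.datasets import load_iris
from sklego.mixture import BayesianGMMClassifier

X, y = load_iris(return_X_y=True)

# One BayesianGaussianMixture is fitted per class; its parameters are passed through.
clf = BayesianGMMClassifier(n_components=2, random_state=42)
clf.fit(X, y)

print(clf.classes_)        # the class labels seen during fit
print(clf.predict(X[:5]))  # predicted class labels for the first five rows
```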
### fit(X, y)

Fit the `BayesianGMMClassifier` model using `X`, `y` as training data.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The training data. | required
`y` | array-like of shape (n_samples,) | The target values. | required
Returns:

Name | Type | Description
---|---|---
`self` | `BayesianGMMClassifier` | The fitted estimator.
Source code in sklego/mixture/bayesian_gmm_classifier.py
### predict(X)

Predict labels for `X` using the fitted estimator.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The data to predict. | required
Returns:

Type | Description
---|---
array-like of shape (n_samples,) | The predicted class labels.
Source code in sklego/mixture/bayesian_gmm_classifier.py
### predict_proba(X)

Predict probabilities for `X` using the fitted estimator.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The data to predict. | required
Returns:

Type | Description
---|---
array-like of shape (n_samples, n_classes) | The predicted probabilities.
Source code in sklego/mixture/bayesian_gmm_classifier.py
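A short sketch of what the returned array looks like, under the same illustrative iris setup as above: one column per entry in `classes_`, and each row should sum to (approximately) one.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklego.mixture import BayesianGMMClassifier

X, y = load_iris(return_X_y=True)
clf = BayesianGMMClassifier(n_components=2, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:3])
print(proba.shape)        # (3, 3): one column per entry in clf.classes_
print(proba.sum(axis=1))  # each row sums to (approximately) 1
```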
## sklego.mixture.bayesian_gmm_detector.BayesianGMMOutlierDetector

Bases: `OutlierMixin`, `BaseEstimator`
The `BayesianGMMOutlierDetector` trains a Bayesian Gaussian Mixture model on a dataset `X`. Once a density is trained we can evaluate the likelihood scores to see if a sample is deemed likely. By providing a `threshold` this model can then label points whose likelihood score is too low as outliers.
Note

The parameters other than `threshold` and `method` are an exact copy of the parameters in `sklearn.mixture.BayesianGaussianMixture`.
Parameters:

Name | Type | Description | Default
---|---|---|---
`threshold` | float | The limit at which the model thinks an outlier appears, must be between (0, 1). | 0.99
`method` | Literal["quantile", "stddev"] | The method to use to apply the threshold. If you select "quantile" then the threshold value represents the quantile below which a point starts being called an outlier. If you select "stddev" then the threshold value represents the number of standard deviations on the likelihood scores below which a point is called an outlier. | "quantile"
Attributes:

Name | Type | Description
---|---|---
`gmm_` | `BayesianGaussianMixture` | The trained Bayesian Gaussian Mixture Model.
`likelihood_threshold_` | float | The threshold value used to determine if something is an outlier.
Source code in sklego/mixture/bayesian_gmm_detector.py
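A minimal sketch, assuming synthetic Gaussian training data; the `n_components` value is an illustrative assumption and, as noted above, is forwarded to `BayesianGaussianMixture`.

```python
import numpy as np
from sklego.mixture import BayesianGMMOutlierDetector

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))               # well-behaved training data
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])  # a likely inlier and a likely outlier

# With method="quantile", threshold=0.95 flags roughly the 5% least likely points.
detector = BayesianGMMOutlierDetector(threshold=0.95, method="quantile", n_components=1)
detector.fit(X)

print(detector.predict(X_new))         # 1 for inliers, -1 for outliers
print(detector.likelihood_threshold_)  # the learned cut-off on the likelihood scores
```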
### fit(X, y=None)

Fit the `BayesianGMMOutlierDetector` model using `X`, `y` as training data.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The training data. | required
`y` | array-like of shape (n_samples,) | Ignored, present for compatibility. | None
Returns:

Name | Type | Description
---|---|---
`self` | `BayesianGMMOutlierDetector` | The fitted estimator.
Raises:

Type | Description
---|---
`ValueError` | If `method` is not one of "quantile" or "stddev", or if `threshold` is outside the range allowed for the chosen method.
Source code in sklego/mixture/bayesian_gmm_detector.py
### predict(X)
Predict if a point is an outlier or not using the fitted estimator.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The data to predict. | required
Returns:

Type | Description
---|---
array-like of shape (n_samples,) | The predicted labels: 1 for inliers, -1 for outliers.
Source code in sklego/mixture/bayesian_gmm_detector.py
### score_samples(X)
Compute the log likelihood for each sample and return the negative value.
Source code in sklego/mixture/bayesian_gmm_detector.py
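A small sketch of the sign convention described above: because the log likelihood is negated, points that are unlikely under the fitted mixture get larger scores. The synthetic data below is an illustrative assumption.

```python
import numpy as np
from sklego.mixture import BayesianGMMOutlierDetector

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
detector = BayesianGMMOutlierDetector(n_components=1).fit(X)

# A point far from the training data should get a larger (more "unlikely") score.
print(detector.score_samples(np.array([[0.0, 0.0], [8.0, 8.0]])))
```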
## sklego.mixture.gmm_classifier.GMMClassifier

Bases: `BaseEstimator`, `ClassifierMixin`
The `GMMClassifier` trains a Gaussian Mixture Model for each class in `y` on a dataset `X`. Once a density is trained for each class we can evaluate the likelihood scores to see which class is more likely.
Note
All the parameters are an exact copy of those of sklearn.mixture.GaussianMixture.
Attributes:

Name | Type | Description
---|---|---
`gmms_` | `dict[int, GaussianMixture]` | A dictionary of Gaussian Mixture Models, one for each class.
`classes_` | `np.ndarray` of shape (n_classes,) | The classes seen during `fit`.
Source code in sklego/mixture/gmm_classifier.py
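A minimal sketch; since `GMMClassifier` follows the scikit-learn estimator API, it can be dropped into utilities such as `cross_val_score`. The iris dataset and parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklego.mixture import GMMClassifier

X, y = load_iris(return_X_y=True)

# One GaussianMixture per class; extra keyword arguments are passed through.
clf = GMMClassifier(n_components=2, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # mean accuracy over 5 folds
```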
### fit(X, y)

Fit the `GMMClassifier` model using `X`, `y` as training data.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The training data. | required
`y` | array-like of shape (n_samples,) | The target values. | required
Returns:

Name | Type | Description
---|---|---
`self` | `GMMClassifier` | The fitted estimator.
Source code in sklego/mixture/gmm_classifier.py
### predict(X)

Predict labels for `X` using the fitted estimator.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The data to predict. | required
Returns:

Type | Description
---|---
array-like of shape (n_samples,) | The predicted class labels.
Source code in sklego/mixture/gmm_classifier.py
### predict_proba(X)

Predict probabilities for `X` using the fitted estimator.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The data to predict. | required
Returns:

Type | Description
---|---
array-like of shape (n_samples, n_classes) | The predicted probabilities.
Source code in sklego/mixture/gmm_classifier.py
## sklego.mixture.gmm_outlier_detector.GMMOutlierDetector

Bases: `OutlierMixin`, `BaseEstimator`
The `GMMOutlierDetector` trains a Gaussian Mixture model on a dataset `X`. Once a density is trained we can evaluate the likelihood scores to see if a sample is deemed likely. By providing a `threshold` this model can then label points whose likelihood score is too low as outliers.
Note

The parameters other than `threshold` and `method` are an exact copy of the parameters in `sklearn.mixture.GaussianMixture`.
Parameters:

Name | Type | Description | Default
---|---|---|---
`threshold` | float | The limit at which the model thinks an outlier appears, must be between (0, 1). | 0.99
`method` | Literal["quantile", "stddev"] | The method to use to apply the threshold. If you select "quantile" then the threshold value represents the quantile below which a point starts being called an outlier. If you select "stddev" then the threshold value represents the number of standard deviations on the likelihood scores below which a point is called an outlier. | "quantile"
Attributes:

Name | Type | Description
---|---|---
`gmm_` | `GaussianMixture` | The trained Gaussian Mixture model.
`likelihood_threshold_` | float | The threshold value used to determine if something is an outlier.
Source code in sklego/mixture/gmm_outlier_detector.py
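A minimal sketch contrasting the two `method` options on the same synthetic data (the data and parameter values are illustrative assumptions):

```python
import numpy as np
from sklego.mixture import GMMOutlierDetector

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
X_new = np.array([[0.0, 0.0], [7.0, 7.0]])

# method="quantile": threshold is a quantile of the training likelihood scores.
quantile_detector = GMMOutlierDetector(threshold=0.99, method="quantile", n_components=1).fit(X)

# method="stddev": threshold is a number of standard deviations on those scores.
stddev_detector = GMMOutlierDetector(threshold=2.0, method="stddev", n_components=1).fit(X)

print(quantile_detector.predict(X_new))  # 1 for inliers, -1 for outliers
print(stddev_detector.predict(X_new))
```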
### fit(X, y=None)

Fit the `GMMOutlierDetector` model using `X`, `y` as training data.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The training data. | required
`y` | array-like of shape (n_samples,) | Ignored, present for compatibility. | None
Returns:

Name | Type | Description
---|---|---
`self` | `GMMOutlierDetector` | The fitted estimator.
Raises:

Type | Description
---|---
`ValueError` | If `method` is not one of "quantile" or "stddev", or if `threshold` is outside the range allowed for the chosen method.
Source code in sklego/mixture/gmm_outlier_detector.py
### predict(X)
Predict if a point is an outlier or not using the fitted model.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | array-like of shape (n_samples, n_features) | The data to predict. | required
Returns:

Type | Description
---|---
array-like of shape (n_samples,) | The predicted labels: 1 for inliers, -1 for outliers.
Source code in sklego/mixture/gmm_outlier_detector.py
### score_samples(X)
Compute the log likelihood for each sample and return the negative value.