Mixture Models
sklego.mixture.bayesian_gmm_classifier.BayesianGMMClassifier
Bases: ClassifierMixin, BaseEstimator
The BayesianGMMClassifier trains a Bayesian Gaussian Mixture Model for each class in y on a dataset X. Once a density is trained for each class, we can evaluate the likelihood of a sample under each density to see which class is more likely.
Note
All the parameters are an exact copy of those of sklearn.mixture.BayesianGaussianMixture.
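The per-class density idea described above can be sketched directly with scikit-learn. This is a minimal illustration on synthetic data, not the library's implementation; the variable names (`models`, `log_lik`) and the data are assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated classes in 2D (synthetic data, for illustration only)
X = np.vstack([rng.normal(-4.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

classes = np.unique(y)
# Fit one density model per class, using only that class's rows
models = {
    c: BayesianGaussianMixture(n_components=2, random_state=0).fit(X[y == c])
    for c in classes
}

# score_samples returns the per-sample log-likelihood under each class density
log_lik = np.column_stack([models[c].score_samples(X) for c in classes])

# Pick the class whose density assigns the highest log-likelihood
pred = classes[np.argmax(log_lik, axis=1)]
accuracy = float(np.mean(pred == y))
print(f"training accuracy: {accuracy:.2f}")
```

Note that n_components here is the number of mixture components inside each class density, not the number of classes.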
Attributes:
Name | Type | Description |
---|---|---|
gmms_ | dict[int, BayesianGaussianMixture] | A dictionary of Bayesian Gaussian Mixture Models, one for each class. |
classes_ | np.ndarray of shape (n_classes,) | The classes seen during fit. |
Examples:
import numpy as np
from sklego.mixture import BayesianGMMClassifier
# Generate dataset
np.random.seed(1)
group0 = np.random.normal(0, 3, (1000, 2))
group1 = np.random.normal(2.5, 2, (500, 2))
data = np.vstack([group0, group1])
y = np.hstack([np.zeros((group0.shape[0],), dtype=int), np.ones((group1.shape[0],), dtype=int)])
# Create and fit the BayesianGMMClassifier model
bgmm = BayesianGMMClassifier(n_components=2, random_state=1)
bgmm.fit(data, y)
# Predict the class of every training point (n_components=2 is the number of mixture components per class)
labels = bgmm.predict(data)
# Classify a new point into one of the two classes
p = np.array([[1.5, 0.5]])
p_prob = bgmm.predict_proba(p) # probability that p belongs to each class
print(f'Probability point p belongs to group0 is {p_prob[0,0]:.2f}')
### Probability point p belongs to group0 is 0.38
print(f'Probability point p belongs to group1 is {p_prob[0,1]:.2f}')
### Probability point p belongs to group1 is 0.62
print(f'It is more probable that point p belongs to group{np.argmax(p_prob)}')
### It is more probable that point p belongs to group1
Source code in sklego/mixture/bayesian_gmm_classifier.py
fit(X, y)
Fit the BayesianGMMClassifier model using X, y as training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The training data. | required |
y | array-like of shape (n_samples,) | The target values. | required |
Returns:
Name | Type | Description |
---|---|---|
self | BayesianGMMClassifier | The fitted estimator. |
predict(X)
Predict labels for X using the fitted estimator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to predict. | required |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples,) | The predicted class labels. |
predict_proba(X)
Predict probabilities for X using the fitted estimator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to predict. | required |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples, n_classes) | The predicted probabilities. |
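One common way to turn per-class log-likelihoods into an (n_samples, n_classes) probability array is a row-wise softmax. The sketch below assumes that normalization and equal weighting of classes; whether the estimator does exactly this is an assumption, and `log_likelihoods_to_proba` is a hypothetical helper, not part of the library.

```python
import numpy as np


def log_likelihoods_to_proba(log_lik):
    """Row-wise softmax over per-class log-likelihoods (hypothetical helper)."""
    log_lik = np.asarray(log_lik, dtype=float)
    # Subtract the row maximum first so np.exp cannot underflow to all zeros
    shifted = log_lik - log_lik.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


# Rows are samples, columns are classes; the values are made-up log-likelihoods
log_lik = np.array([[-3.0, -2.0], [-1000.0, -1001.0]])
proba = log_likelihoods_to_proba(log_lik)
print(proba)
```

Subtracting the row maximum matters in practice: log-likelihoods of points far from every class can be very negative, and exponentiating them directly would give 0/0.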
sklego.mixture.bayesian_gmm_detector.BayesianGMMOutlierDetector
Bases: OutlierMixin, BaseEstimator
The BayesianGMMOutlierDetector trains a Bayesian Gaussian Mixture model on a dataset X. Once a density is trained, we can evaluate the likelihood score of a point to see whether it is deemed likely. Given a threshold, the model labels a point as an outlier if its likelihood score is too low.
Note
The parameters other than threshold and method are an exact copy of the parameters in sklearn.mixture.BayesianGaussianMixture.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
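threshold | float | The limit at which the model thinks an outlier appears, must be in the interval (0, 1). | 0.99 |
method | Literal["quantile", "stddev"] | The method to use to apply the threshold. If you select "quantile", the threshold is the quantile at which a point starts to be called an outlier. If you select "stddev", the threshold is the number of standard deviations before a point is called an outlier. | "quantile" |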
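A quantile-based cutoff on likelihood scores can be sketched with scikit-learn alone. The rule used here (cutoff at the `1 - threshold` quantile of the training log-likelihoods) is an assumption made for illustration, and `cutoff` and `label` are hypothetical names rather than library API.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, (500, 2))

threshold = 0.95  # treated here as the fraction of training points kept as inliers
gmm = BayesianGaussianMixture(n_components=1, random_state=0).fit(X_train)

# Log-likelihood of each training point under the fitted density
train_scores = gmm.score_samples(X_train)
# Assumed rule: the cutoff is the (1 - threshold) quantile of the training scores
cutoff = np.quantile(train_scores, 1 - threshold)


def label(X_new):
    """Return 1 for inliers and -1 for outliers."""
    return np.where(gmm.score_samples(X_new) >= cutoff, 1, -1)


labels = label(np.array([[0.0, 0.0], [8.0, 8.0]]))
print(labels)
```

Under this rule a higher threshold keeps more of the training data as inliers, so fewer new points are flagged.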
Attributes:
Name | Type | Description |
---|---|---|
gmm_ | BayesianGaussianMixture | The trained Bayesian Gaussian Mixture Model. |
likelihood_threshold_ | float | The threshold value used to determine if something is an outlier. |
Examples:
import numpy as np
from sklego.mixture import BayesianGMMOutlierDetector
# Generate dataset, it consists of two clusters
np.random.seed(1)
group0 = np.random.normal(0, 3, (10, 2))
group1 = np.random.normal(2.5, 2, (5, 2))
data = np.vstack([group0, group1])
y = np.hstack([np.zeros((group0.shape[0],), dtype=int), np.ones((group1.shape[0],), dtype=int)])
# Create and fit the BayesianGMMOutlierDetector model with threshold=0.9
bgmm = BayesianGMMOutlierDetector(threshold=0.9, n_components=2, random_state=1)
bgmm.fit(data, y)
# Classify a new point as outlier or not
p = np.array([[4.5, 0.5]])
p_pred = bgmm.predict(p) # 1 for an inlier, -1 for an outlier
print('The point is an outlier if the score is -1, inlier if the score is 1')
### The point is an outlier if the score is -1, inlier if the score is 1
print(f'The score for this point is {p_pred}.')
### The score for this point is [1].
Source code in sklego/mixture/bayesian_gmm_detector.py
fit(X, y=None)
Fit the BayesianGMMOutlierDetector model using X, y as training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The training data. | required |
y | array-like of shape (n_samples,) | Ignored, present for compatibility. | None |
Returns:
Name | Type | Description |
---|---|---|
self | BayesianGMMOutlierDetector | The fitted estimator. |
Raises:
Type | Description |
---|---|
ValueError | |
predict(X)
Predict if a point is an outlier or not using the fitted estimator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to predict. | required |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples,) | The predicted data. 1 for inliers, -1 for outliers. |
score_samples(X)
Compute the log likelihood for each sample and return the negative value.
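The sign flip means that larger scores indicate more anomalous points. A minimal sketch using scikit-learn's own `score_samples`, which returns the log-likelihood; the data and the name `anomaly_score` are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (300, 2))
gmm = BayesianGaussianMixture(n_components=1, random_state=0).fit(X)

points = np.array([[0.0, 0.0], [6.0, 6.0]])
log_lik = gmm.score_samples(points)  # higher means more typical of the training data
anomaly_score = -log_lik             # higher means more anomalous
print(anomaly_score)
```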
sklego.mixture.gmm_classifier.GMMClassifier
Bases: ClassifierMixin, BaseEstimator
The GMMClassifier trains a Gaussian Mixture Model for each class in y on a dataset X. Once a density is trained for each class, we can evaluate the likelihood of a sample under each density to see which class is more likely.
Note
All the parameters are an exact copy of those of sklearn.mixture.GaussianMixture.
Attributes:
Name | Type | Description |
---|---|---|
gmms_ | dict[int, GaussianMixture] | A dictionary of Gaussian Mixture Models, one for each class. |
classes_ | np.ndarray of shape (n_classes,) | The classes seen during fit. |
Examples:
import numpy as np
from sklego.mixture import GMMClassifier
# Generate dataset
np.random.seed(1)
group0 = np.random.normal(0, 3, (1000, 2))
group1 = np.random.normal(2.5, 2, (500, 2))
data = np.vstack([group0, group1])
y = np.hstack([np.zeros((group0.shape[0],), dtype=int), np.ones((group1.shape[0],), dtype=int)])
# Create and fit the GMMClassifier model
gmm = GMMClassifier(n_components=2, random_state=1)
gmm.fit(data, y)
# Predict the class of every training point (n_components=2 is the number of mixture components per class)
labels = gmm.predict(data)
# Classify a new point into one of the two classes
p = np.array([[1.5, 0.5]])
p_prob = gmm.predict_proba(p) # probability that p belongs to each class
print(f'Probability point p belongs to group0 is {p_prob[0,0]:.2f}')
### Probability point p belongs to group0 is 0.41
print(f'Probability point p belongs to group1 is {p_prob[0,1]:.2f}')
### Probability point p belongs to group1 is 0.59
print(f'It is more probable that point p belongs to group{np.argmax(p_prob)}')
### It is more probable that point p belongs to group1
Source code in sklego/mixture/gmm_classifier.py
fit(X, y)
Fit the GMMClassifier model using X, y as training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The training data. | required |
y | array-like of shape (n_samples,) | The target values. | required |
Returns:
Name | Type | Description |
---|---|---|
self | GMMClassifier | The fitted estimator. |
predict(X)
Predict labels for X using the fitted estimator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to predict. | required |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples,) | The predicted class labels. |
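Predicted labels and predicted probabilities are usually tied together through classes_: column i of the probability array corresponds to classes_[i]. The sketch below follows that scikit-learn convention with made-up numbers; treating it as this estimator's exact behavior is an assumption.

```python
import numpy as np

classes_ = np.array([3, 7])  # hypothetical labels seen during fit, in sorted order
# Made-up probabilities, shaped (n_samples, n_classes)
proba = np.array([[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]])

# Column i of proba corresponds to classes_[i]; argmax returns the first maximum on ties
predicted = classes_[np.argmax(proba, axis=1)]
print(predicted)
```

Indexing through classes_ matters when the labels are not 0..n_classes-1, since the raw argmax is a column position rather than a label.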
predict_proba(X)
Predict probabilities for X using the fitted estimator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to predict. | required |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples, n_classes) | The predicted probabilities. |
sklego.mixture.gmm_outlier_detector.GMMOutlierDetector
Bases: OutlierMixin, BaseEstimator
The GMMOutlierDetector trains a Gaussian Mixture model on a dataset X. Once a density is trained, we can evaluate the likelihood score of a point to see whether it is deemed likely. Given a threshold, the model labels a point as an outlier if its likelihood score is too low.
Note
The parameters other than threshold and method are an exact copy of the parameters in sklearn.mixture.GaussianMixture.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
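threshold | float | The limit at which the model thinks an outlier appears, must be in the interval (0, 1). | 0.99 |
method | Literal["quantile", "stddev"] | The method to use to apply the threshold. If you select "quantile", the threshold is the quantile at which a point starts to be called an outlier. If you select "stddev", the threshold is the number of standard deviations before a point is called an outlier. | "quantile" |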
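A standard-deviation cutoff can be sketched in the same spirit. The rule below (mean training log-likelihood minus `n_std` standard deviations) is a deliberately simplified assumption and may differ from what the library computes; `n_std` and `cutoff` are hypothetical names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, (500, 2))

n_std = 3.0  # plays the role of the threshold when counting standard deviations
gmm = GaussianMixture(n_components=1, random_state=0).fit(X_train)
train_scores = gmm.score_samples(X_train)

# Simplified rule (an assumption): cut off n_std standard deviations below the mean score
cutoff = train_scores.mean() - n_std * train_scores.std()

new_points = np.array([[0.0, 0.0], [8.0, 8.0]])
labels = np.where(gmm.score_samples(new_points) >= cutoff, 1, -1)
print(labels)
```

Unlike a quantile cutoff, this rule does not fix the share of training points flagged; that share depends on the shape of the score distribution.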
Attributes:
Name | Type | Description |
---|---|---|
gmm_ | GaussianMixture | The trained Gaussian Mixture model. |
likelihood_threshold_ | float | The threshold value used to determine if something is an outlier. |
Examples:
import numpy as np
from sklego.mixture import GMMOutlierDetector
# Generate dataset, it consists of two clusters
np.random.seed(1)
group0 = np.random.normal(0, 3, (10, 2))
group1 = np.random.normal(2.5, 2, (5, 2))
data = np.vstack([group0, group1])
y = np.hstack([np.zeros((group0.shape[0],), dtype=int), np.ones((group1.shape[0],), dtype=int)])
# Create and fit the GMMOutlierDetector model
gmm = GMMOutlierDetector(threshold=0.9, n_components=2, random_state=1)
gmm.fit(data, y)
# Classify a new point as outlier or not
p = np.array([[4.5, 0.5]])
p_pred = gmm.predict(p) # 1 for an inlier, -1 for an outlier
print('The point is an outlier if the score is -1, inlier if the score is 1')
### The point is an outlier if the score is -1, inlier if the score is 1
print(f'The score for this point is {p_pred}.')
### The score for this point is [-1].
Source code in sklego/mixture/gmm_outlier_detector.py
fit(X, y=None)
Fit the GMMOutlierDetector model using X, y as training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The training data. | required |
y | array-like of shape (n_samples,) | Ignored, present for compatibility. | None |
Returns:
Name | Type | Description |
---|---|---|
self | GMMOutlierDetector | The fitted estimator. |
Raises:
Type | Description |
---|---|
ValueError | |
predict(X)
Predict if a point is an outlier or not using the fitted model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to predict. | required |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples,) | The predicted data. 1 for inliers, -1 for outliers. |
score_samples(X)
Compute the log likelihood for each sample and return the negative value.