Common
Module with common classes and functions used across the package.
sklego.common.TrainOnlyTransformerMixin
Bases: TransformerMixin
Mixin class for transformers that can handle training and test data differently.
This mixin allows using a separate function for transforming training and test data.
Warning
Transformers using this class as a mixin should:
- Call super().fit in their fit method.
- Implement the transform_train() method.
They may also implement the transform_test() method. If it is not implemented, transform_test() simply returns the untransformed data.
Attributes:
Name | Type | Description |
---|---|---|
X_hash_ | hash | The hash of the training data, used to determine whether to use transform_train() or transform_test(). |
n_features_in_ | int | The number of features seen during fit. |
dim_ | int | Deprecated, use n_features_in_ instead. |
Examples:
import numpy as np
from sklearn.base import BaseEstimator
from sklego.common import TrainOnlyTransformerMixin

class TrainOnlyTransformer(TrainOnlyTransformerMixin, BaseEstimator):
    def fit(self, X, y):
        # The mixin's fit stores the hash of X and returns the fitted transformer.
        return super().fit(X, y)

    def transform_train(self, X, y=None):
        '''Add random noise to the training data.'''
        return X + np.random.normal(0, 1, size=X.shape)

X_train, X_test = np.random.randn(100, 4), np.random.randn(100, 4)
y_train, y_test = np.random.randn(100), np.random.randn(100)

trf = TrainOnlyTransformer()
trf.fit(X_train, y_train)

# Training data gets noise added; any other data passes through unchanged.
assert np.all(trf.transform(X_train) != X_train)
assert np.all(trf.transform(X_test) == X_test)
Source code in sklego/common.py
fit(X, y=None)
Fit the mixin by calculating the hash of X and storing it in self.X_hash_.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The training data. | required |
y | array-like of shape (n_samples,) or None | The target values. | None |
Returns:
Name | Type | Description |
---|---|---|
self | TrainOnlyTransformerMixin | The fitted transformer. |
transform(X, y=None)
Dispatch to transform_train() or transform_test() based on the data passed.
This method will check whether the hash of X matches the hash of the training data. If it does, it will dispatch to transform_train(); otherwise it will dispatch to transform_test().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The data to transform. | required |
y | array-like of shape (n_samples,) or None | The target values. | None |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples, n_features) | The transformed data. |
Raises:
Type | Description |
---|---|
ValueError | If the input dimension does not match the training dimension. |
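The hash-based dispatch described above can be sketched in plain Python. This is an illustrative stand-in, not the sklego implementation: the class names HashDispatchSketch and AddOne are hypothetical, the data is a plain list of rows, and hashing repr(X) with hashlib.sha256 is an assumption, since the hashing scheme is not described here.

```python
import hashlib


class HashDispatchSketch:
    """Hypothetical stand-in for the mixin's dispatch logic."""

    @staticmethod
    def _hash(X):
        # Assumption: any deterministic digest of the data is good enough here.
        return hashlib.sha256(repr(X).encode("utf-8")).hexdigest()

    def fit(self, X, y=None):
        # Remember what the training data looked like.
        self.X_hash_ = self._hash(X)
        self.n_features_in_ = len(X[0])
        return self

    def transform_train(self, X, y=None):
        raise NotImplementedError("subclasses implement transform_train()")

    def transform_test(self, X, y=None):
        # Default: return the data untransformed.
        return X

    def transform(self, X, y=None):
        if len(X[0]) != self.n_features_in_:
            raise ValueError("input dimension does not match training dimension")
        # Same hash as at fit time means this is the training data.
        if self._hash(X) == self.X_hash_:
            return self.transform_train(X, y)
        return self.transform_test(X, y)


class AddOne(HashDispatchSketch):
    def transform_train(self, X, y=None):
        return [[value + 1 for value in row] for row in X]


X_train = [[1.0, 2.0], [3.0, 4.0]]
X_test = [[5.0, 6.0]]

trf = AddOne().fit(X_train)
train_out = trf.transform(X_train)  # dispatched to transform_train()
test_out = trf.transform(X_test)    # dispatched to transform_test()
```

One consequence of this design is that data is recognised as training data only when it is identical to what was passed to fit; a modified copy of the training set is treated as test data.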
transform_test(X, y=None)
Transform the test data.
This method can be implemented in subclasses to specify how test data should be transformed. If not implemented, it will return the untransformed data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The test data. | required |
y | array-like of shape (n_samples,) or None | The target values. | None |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples, n_features) | The transformed test data or untransformed data if not implemented. |
transform_train(X, y=None)
Transform the training data.
This method should be implemented in subclasses to specify how training data should be transformed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The training data. | required |
y | array-like of shape (n_samples,) or None | The target values. | None |
Returns:
Type | Description |
---|---|
array-like of shape (n_samples, n_features) | The transformed training data. |
sklego.common.as_list(val)
Ensure the input value is converted into a list.
This helper function takes an input value and ensures that it is always returned as a list.
- If the input is a single value, it will be wrapped in a list.
- If the input is an iterable, it will be converted into a list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
val | object | The input value that needs to be converted into a list. | required |
Returns:
Type | Description |
---|---|
list | The input value as a list. |
Examples:
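A minimal sketch of the two rules described above, written as a stand-in rather than the library's own code. The name as_list_sketch is hypothetical, and treating strings and bytes as single values (not as iterables of characters) is an assumption.

```python
from collections.abc import Iterable


def as_list_sketch(val):
    """Return val as a list (illustrative stand-in)."""
    # Assumption: a string is one value, not a sequence of characters.
    if isinstance(val, (str, bytes)):
        return [val]
    # Any other iterable is converted element by element.
    if isinstance(val, Iterable):
        return list(val)
    # Everything else is a single value and gets wrapped.
    return [val]


single = as_list_sketch("test")
from_tuple = as_list_sketch(("test1", "test2"))
from_int = as_list_sketch(3)
```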
sklego.common.flatten(nested_iterable)
Recursively flatten an arbitrarily nested iterable into an iterator of values.
This helper function takes an arbitrarily nested iterable and returns an iterator of flattened values. It recursively processes the input to extract individual elements and yield them in a flat structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nested_iterable | Iterable | An arbitrarily nested iterable to be flattened. | required |
Yields:
Type | Description |
---|---|
Generator | A generator of flattened values from the nested iterable. |
Examples:
list(flatten([["test1", "test2"], ["a", "b", ["c", "d"]]))
# ['test1', 'test2', 'a', 'b', 'c', 'd']
list(flatten(["test1", ["test2"]]))
# ['test1', 'test2']
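The recursion described above can be sketched with a generator and yield from. This is an illustrative stand-in under stated assumptions, not the library's source: the name flatten_sketch is hypothetical, and strings and bytes are assumed to be leaf values, which matches the examples above.

```python
from collections.abc import Iterable


def flatten_sketch(nested_iterable):
    """Yield leaf values depth-first (illustrative stand-in)."""
    for element in nested_iterable:
        # Assumption: str and bytes are leaves, not containers to descend into.
        if isinstance(element, Iterable) and not isinstance(element, (str, bytes)):
            # Recurse and re-yield everything the nested level produces.
            yield from flatten_sketch(element)
        else:
            yield element


flat = list(flatten_sketch([["test1", "test2"], ["a", "b", ["c", "d"]]]))
```

Because the function is a generator, nothing is computed until the result is iterated, which is why the examples wrap the call in list().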
sklego.common.expanding_list(list_to_extent, return_type=list)
Create an expanding list of lists or tuples by making combinations of elements.
This function takes an input list and creates an expanding list, where each element is a list or tuple containing a
subset of elements from the input list. The resulting list can be composed of lists or tuples, depending on the
specified return_type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
list_to_extent | object | The input to be extended. | required |
return_type | type | The type of elements in the resulting list (list or tuple). | list |
Returns:
Type | Description |
---|---|
list | An expanding list of lists or tuples, depending on return_type. |
Examples:
expanding_list("test")
# [['test']]
expanding_list(["test1", "test2", "test3"])
# [['test1'], ['test1', 'test2'], ['test1', 'test2', 'test3']]
expanding_list(["test1", "test2", "test3"], tuple)
# [('test1',), ('test1', 'test2'), ('test1', 'test2', 'test3')]
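The outputs above are the growing prefixes of the input. A minimal sketch that reproduces them follows; it is a stand-in written for illustration, not the library's code. The name expanding_list_sketch is hypothetical, and wrapping a bare string as a single item is an assumption taken from the first example.

```python
def expanding_list_sketch(items, return_type=list):
    """Return every prefix of items, shortest first (illustrative stand-in)."""
    # Assumption: a bare string counts as one item, not a sequence of characters.
    if isinstance(items, str):
        items = [items]
    items = list(items)
    # Prefix n holds the first n + 1 items, converted to return_type.
    return [return_type(items[: n + 1]) for n in range(len(items))]


single = expanding_list_sketch("test")
as_lists = expanding_list_sketch(["test1", "test2", "test3"])
as_tuples = expanding_list_sketch(["test1", "test2", "test3"], tuple)
```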
sklego.common.sliding_window(sequence, window_size, step_size)
Generate sliding windows over a sequence.
This function generates sliding windows of a specified size over a given sequence, where each window is a list of elements from the sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequence | Iterable | The input sequence (e.g., a list). | required |
window_size | int | The size of each sliding window. | required |
step_size | int | The number of positions to advance between consecutive windows. | required |
Returns:
Type | Description |
---|---|
Generator | A generator object that yields sliding windows. |
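The interplay of window_size and step_size can be sketched with slicing. This is an illustrative stand-in, not the library's implementation: the name sliding_window_sketch is hypothetical, the input is assumed to support len() and slicing, and yielding shorter windows near the end of the sequence is an assumption, since the description above does not say how the tail is handled.

```python
def sliding_window_sketch(sequence, window_size, step_size):
    """Yield slices of sequence starting every step_size positions (stand-in)."""
    for start in range(0, len(sequence), step_size):
        # Assumption: windows near the end may be shorter than window_size.
        yield sequence[start : start + window_size]


overlapping = list(sliding_window_sketch([1, 2, 4, 5], 2, 1))
disjoint = list(sliding_window_sketch([1, 2, 4, 5], 2, 2))
```

With step_size equal to window_size the windows do not overlap; with a smaller step_size consecutive windows share elements.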
Examples: