Grabbers

ColumnGrabber

Bases: EmbetterBase

Component that can grab a pandas column as a list.

This can be useful when dealing with text encoders as these sometimes cannot deal with pandas columns.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| colname | str | the column name to grab from a dataframe | required |

Usage

In essence, the ColumnGrabber just selects a single column.

import pandas as pd
from embetter.grab import ColumnGrabber

# Let's say we start with a csv file with filepaths
data = {"filepaths":  ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)

# You can use the component in stand-alone fashion
ColumnGrabber("filepaths").fit_transform(df)

But the most common way to use the ColumnGrabber is as part of a pipeline.

import pandas as pd
from sklearn.pipeline import make_pipeline

from embetter.grab import ColumnGrabber
from embetter.vision import ImageLoader, ColorHistogramEncoder

# Let's say we start with a csv file with filepaths
data = {"filepaths":  ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)

# You can use the component in stand-alone fashion
ColumnGrabber("filepaths").fit_transform(df)

# But let's build a pipeline that grabs the column, turns it
# into an image and embeds it.
pipe = make_pipeline(
    ColumnGrabber("filepaths"),
    ImageLoader(),
    ColorHistogramEncoder()
)

pipe.fit_transform(df)
Source code in embetter/grab.py
class ColumnGrabber(EmbetterBase):
    """
    Component that can grab a pandas column as a list.

    ![](https://raw.githubusercontent.com/koaning/embetter/main/docs/images/columngrabber.png)

    This can be useful when dealing with text encoders as these
    sometimes cannot deal with pandas columns.

    Arguments:
        colname: the column name to grab from a dataframe

    **Usage**

    In essence, the `ColumnGrabber` just selects a single column.

    ```python
    import pandas as pd
    from embetter.grab import ColumnGrabber

    # Let's say we start with a csv file with filepaths
    data = {"filepaths":  ["tests/data/thiscatdoesnotexist.jpeg"]}
    df = pd.DataFrame(data)

    # You can use the component in stand-alone fashion
    ColumnGrabber("filepaths").fit_transform(df)
    ```

    But the most common way to use the `ColumnGrabber` is as part of a pipeline.

    ```python
    import pandas as pd
    from sklearn.pipeline import make_pipeline

    from embetter.grab import ColumnGrabber
    from embetter.vision import ImageLoader, ColorHistogramEncoder

    # Let's say we start with a csv file with filepaths
    data = {"filepaths":  ["tests/data/thiscatdoesnotexist.jpeg"]}
    df = pd.DataFrame(data)

    # You can use the component in stand-alone fashion
    ColumnGrabber("filepaths").fit_transform(df)

    # But let's build a pipeline that grabs the column, turns it
    # into an image and embeds it.
    pipe = make_pipeline(
        ColumnGrabber("filepaths"),
        ImageLoader(),
        ColorHistogramEncoder()
    )

    pipe.fit_transform(df)
    ```
    """

    def __init__(self, colname: str) -> None:
        self.colname = colname

    def transform(self, X, y=None):
        """
        Takes a column from pandas and returns it as a list.
        """
        return [x for x in X[self.colname]]

transform(X, y=None)

Takes a column from pandas and returns it as a list.

Source code in embetter/grab.py
def transform(self, X, y=None):
    """
    Takes a column from pandas and returns it as a list.
    """
    return [x for x in X[self.colname]]
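
To make the return type concrete, here is a minimal sketch of the same selection logic. `ColumnGrabberSketch` is a hypothetical stand-in that copies the `transform` body from the listing above so the snippet runs without embetter installed; the dataframe contents are invented for illustration.

```python
import pandas as pd


class ColumnGrabberSketch:
    """Hypothetical stand-in that copies the transform logic listed above."""

    def __init__(self, colname: str) -> None:
        self.colname = colname

    def transform(self, X, y=None):
        # Iterating over a pandas Series yields its values in row order.
        return [x for x in X[self.colname]]


# Invented two-column dataframe, used only for illustration.
df = pd.DataFrame({"text": ["first doc", "second doc"], "label": [0, 1]})

column = df["text"]                                  # a pandas Series
grabbed = ColumnGrabberSketch("text").transform(df)  # a plain Python list

print(type(column).__name__)   # Series
print(type(grabbed).__name__)  # list
print(grabbed)                 # ['first doc', 'second doc']
```

The index of the dataframe is dropped along the way, so downstream components only see the values.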

KeyGrabber

Effectively the same thing as the ColumnGrabber, except this is meant to work on generators of dictionaries instead of dataframes.
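
No usage example is given for the KeyGrabber, so the following is a rough sketch rather than documented usage. `KeyGrabberSketch` is a hypothetical stand-in that copies the `transform` logic from the source listing for this class, and `stream_records` with its records is invented for illustration. The listing only defines `transform`, so the sketch calls it directly.

```python
class KeyGrabberSketch:
    """Hypothetical stand-in that copies the KeyGrabber transform logic."""

    def __init__(self, colname: str) -> None:
        self.colname = colname

    def transform(self, X, y=None):
        # A single dictionary returns the bare value for the key.
        if isinstance(X, dict):
            return X[self.colname]
        # Any other iterable is assumed to yield dictionaries.
        return [x[self.colname] for x in X]


def stream_records():
    # Invented records standing in for, say, lines of a JSONL file.
    yield {"text": "hello", "label": 1}
    yield {"text": "world", "label": 0}


grabber = KeyGrabberSketch("text")

texts = grabber.transform(stream_records())
single = grabber.transform({"text": "hi", "label": 1})

print(texts)   # ['hello', 'world']
print(single)  # hi
```

Note that a single dictionary returns the bare value, while any other iterable returns a list, and a generator is consumed in the process.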

Source code in embetter/grab.py
class KeyGrabber:
    """
    Effectively the same thing as the ColumnGrabber, except this is
    meant to work on generators of dictionaries instead of dataframes.
    """

    def __init__(self, colname: str) -> None:
        self.colname = colname

    def transform(self, X, y=None):
        """
        Takes a key and returns its value from a single dictionary,
        or a list of its values from an iterable of dictionaries.
        """
        if isinstance(X, dict):
            return X[self.colname]
        return [x[self.colname] for x in X]

transform(X, y=None)

Takes a key and returns its value from a single dictionary, or a list of its values from an iterable of dictionaries.

Source code in embetter/grab.py
def transform(self, X, y=None):
    """
    Takes a key and returns its value from a single dictionary,
    or a list of its values from an iterable of dictionaries.
    """
    if isinstance(X, dict):
        return X[self.colname]
    return [x[self.colname] for x in X]