ColumnGrabber

Component that can grab a pandas column as a list.

This is useful with text encoders, which sometimes cannot accept a pandas column directly.

Parameters

Name Type Description Default
colname str the column name to grab from a dataframe required

Usage

In essence, the ColumnGrabber just selects a single column.

import pandas as pd
from embetter.grab import ColumnGrabber

# Let's say we start with a csv file with filepaths
data = {"filepaths":  ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)

# You can use the component in stand-alone fashion
ColumnGrabber("filepaths").fit_transform(df)

But the most common way to use the ColumnGrabber is as part of a pipeline.

import pandas as pd
from sklearn.pipeline import make_pipeline

from embetter.grab import ColumnGrabber
from embetter.vision import ImageLoader, ColorHistogramEncoder

# Let's say we start with a csv file with filepaths
data = {"filepaths":  ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)

# You can use the component in stand-alone fashion
ColumnGrabber("filepaths").fit_transform(df)

# But let's build a pipeline that grabs the column, turns it
# into an image and embeds it.
pipe = make_pipeline(
    ColumnGrabber("filepaths"),
    ImageLoader(),
    ColorHistogramEncoder()
)

pipe.fit_transform(df)

transform(self, X, y=None)

Show source code in embetter/grab.py
    def transform(self, X, y=None):
        """
        Takes a column from pandas and returns it as a list.
        """
        return [x for x in X[self.colname]]

Takes a column from pandas and returns it as a list.
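
To make the behaviour of transform concrete, here is a minimal sketch that needs neither pandas nor embetter. MiniColumnGrabber is a hypothetical stand-in written for this example, not the library's class; it only copies the list-comprehension from the source shown above. A plain dict of lists stands in for the dataframe, which is an assumption made to keep the example dependency-free; the same indexing expression is what gets applied to a real dataframe.

```python
class MiniColumnGrabber:
    """Hypothetical stand-in that mirrors the transform shown above."""

    def __init__(self, colname):
        self.colname = colname

    def fit(self, X, y=None):
        # Nothing is learned in this sketch, so fit just returns self.
        return self

    def transform(self, X, y=None):
        # Same expression as the source above: index by column name,
        # then collect the values into a plain Python list.
        return [x for x in X[self.colname]]


# A dict of lists stands in for a dataframe (assumption for this sketch).
table = {"filepaths": ["a.jpeg", "b.jpeg"], "label": ["cat", "dog"]}

grabbed = MiniColumnGrabber("filepaths").fit(table).transform(table)
print(grabbed)
```

The result is an ordinary list holding only the values of the requested column, which is the shape a downstream step that cannot handle pandas objects would expect.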