ColumnGrabber
Component that can grab a pandas column as a list.
This can be useful when dealing with text encoders as these sometimes cannot deal with pandas columns.
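To make the distinction concrete, here is a small pandas-only sketch. The column name `text`, the sample strings, and the index labels are made up for illustration. Selecting a column gives a `pandas.Series`, which carries its own index, whereas a plain Python list is what many encoders expect.

```python
import pandas as pd

# Hypothetical example data; the column name "text" is an assumption.
df = pd.DataFrame({"text": ["hello world", "goodbye world"]}, index=[10, 20])

column = df["text"]        # a pandas Series with index labels 10 and 20
as_list = column.tolist()  # a plain Python list holding the same values

print(type(column).__name__)
print(type(as_list).__name__)

# The Series is addressed by label, the list by position.
print(column.loc[10])
print(as_list[0])
```

Code that relies on positional indexing or on receiving a built-in sequence can trip over the Series, which is the kind of mismatch this component is meant to avoid.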
Parameters
Name | Type | Description | Default |
---|---|---|---|
colname | str | the column name to grab from a dataframe | required |
Usage
In essence, the ColumnGrabber
really just selects a single column.
import pandas as pd
from embetter.grab import ColumnGrabber
# Let's say we start with a csv file with filepaths
data = {"filepaths": ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)
# You can use the component in stand-alone fashion
ColumnGrabber("filepaths").fit_transform(df)
But the most common way to use the ColumnGrabber
is as part of a pipeline.
import pandas as pd
from sklearn.pipeline import make_pipeline
from embetter.grab import ColumnGrabber
from embetter.vision import ImageLoader, ColorHistogramEncoder
# Let's say we start with a csv file with filepaths
data = {"filepaths": ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)
# You can use the component in stand-alone fashion
ColumnGrabber("filepaths").fit_transform(df)
# But let's build a pipeline that grabs the column, turns it
# into an image and embeds it.
pipe = make_pipeline(
    ColumnGrabber("filepaths"),
    ImageLoader(),
    ColorHistogramEncoder()
)
pipe.fit_transform(df)
transform(self, X, y=None)
Takes a column from pandas and returns it as a list.
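The behaviour described here can be approximated with a few lines of plain Python. The class below is a hypothetical stand-in named `ColumnGrabberSketch`, written only to illustrate the idea; it is not the code in embetter/grab.py, and the assumption that fitting learns nothing is mine.

```python
import pandas as pd


class ColumnGrabberSketch:
    """Hypothetical stand-in for illustration, not the embetter implementation."""

    def __init__(self, colname):
        self.colname = colname

    def fit(self, X, y=None):
        # Assumed here: there is nothing to learn, so fit returns self unchanged.
        return self

    def transform(self, X, y=None):
        # Select the named column and return its values as a plain list.
        return list(X[self.colname])

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X, y)


df = pd.DataFrame({"filepaths": ["tests/data/thiscatdoesnotexist.jpeg"]})
result = ColumnGrabberSketch("filepaths").fit_transform(df)
print(result)  # ['tests/data/thiscatdoesnotexist.jpeg']
```

Because the output is an ordinary list, the next step in a pipeline can iterate over it without knowing anything about pandas.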