TimmEncoder

Use a pretrained vision model to generate embeddings. Models and weights are provided via the lovely timm library.

You can find a list of available models here.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | name of the model to use | 'mobilenetv3_large_100' |
| encode_predictions | bool | output the predictions instead of the pooled embedding from the layer before | False |

Usage:

import pandas as pd
from sklearn.pipeline import make_pipeline

from embetter.grab import ColumnGrabber
from embetter.vision import ImageLoader, TimmEncoder

# Let's say we start with a csv file with filepaths
data = {"filepaths": ["tests/data/thiscatdoesnotexist.jpeg"]}
df = pd.DataFrame(data)

# Let's build a pipeline that grabs the column, turns it
# into an image and embeds it.
pipe = make_pipeline(
    ColumnGrabber("filepaths"),
    ImageLoader(),
    TimmEncoder(name="mobilenetv3_large_100")
)

# This pipeline can now encode each image in the dataframe
pipe.fit_transform(df)

transform(self, X, y=None)

Show source code in vision/_torchvis.py
    def transform(self, X, y=None):
        """
        Transforms grabbed images into numeric representations.
        """
        batch = [self.transform_img(x).unsqueeze(0) for x in X]
        return np.array([self.model(x).squeeze(0).detach().numpy() for x in batch])

Transforms grabbed images into numeric representations.
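The shape handling in this method can be tried out without downloading a model. The sketch below assumes torch and numpy are installed and swaps in a small, hypothetical stand-in network for the pretrained timm model; the random tensors play the role of preprocessed images.

```python
import numpy as np
import torch
from torch import nn

# Hypothetical stand-in for the pretrained timm model: global average
# pooling followed by a linear layer that yields 8 features per image.
model = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(3, 8),
)
model.eval()

# Three fake "preprocessed images", each a (channels, height, width) tensor.
images = [torch.rand(3, 32, 32) for _ in range(3)]

# Same steps as transform: add a batch axis, run the model, drop the axis.
batch = [img.unsqueeze(0) for img in images]
embeddings = np.array([model(x).squeeze(0).detach().numpy() for x in batch])

print(embeddings.shape)  # (3, 8)
```

Each image is pushed through the model on its own, and the per-image vectors are stacked into a 2D array with one row per image, which is the layout scikit-learn estimators expect.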