CohereEncoder

embetter.external.CohereEncoder

Encoder that can numerically encode sentences.

Note that this is an external embedding provider. If their API breaks, so will this component.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| client | | cohere client with key | required |
| model | | name of model, can be "small" or "large" | 'large' |

Usage:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

from cohere import Client
from embetter.grab import ColumnGrabber
from embetter.external import CohereEncoder

client = Client("APIKEY")
# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})

# This pipeline grabs the `text` column from a dataframe,
# which then gets fed into Cohere's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    CohereEncoder(client=client, model="large")
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])

# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)

# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)

transform(self, X, y=None)

Source code in external/_cohere.py
    def transform(self, X, y=None):
        """Transforms the text into a numeric representation."""
        result = []
        for b in _batch(X, 10):
            response = self.client.embed(b)
            result.extend(response.embeddings)
        return np.array(result)

Transforms the text into a numeric representation.