OpenAIEncoder
embetter.external.OpenAIEncoder
Encoder that can numerically encode sentences.
Note that this is an external embedding provider. If their API breaks, so will this component. We also assume that you have already imported openai and run the following setup:
import openai
openai.organization = OPENAI_ORG
openai.api_key = OPENAI_KEY
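The setup above leaves open where OPENAI_ORG and OPENAI_KEY come from. One option is to read them from environment variables instead of hardcoding them. The following is a minimal sketch under that assumption: the helper `load_openai_credentials` and the environment variable names `OPENAI_ORG` and `OPENAI_KEY` are hypothetical and not part of embetter or openai.

```python
import os


def load_openai_credentials(env=os.environ):
    """Return (organization, api_key) read from a mapping of environment variables.

    The variable names OPENAI_ORG and OPENAI_KEY are assumptions made for
    this sketch; use whatever names your deployment defines.
    """
    required = ("OPENAI_ORG", "OPENAI_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message rather than at the first API call.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["OPENAI_ORG"], env["OPENAI_KEY"]


# Intended use, before constructing the encoder:
#   import openai
#   openai.organization, openai.api_key = load_openai_credentials()
```

Failing early on a missing variable keeps the error close to its cause instead of surfacing later inside a pipeline call.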
Parameters
Name | Type | Description | Default |
---|---|---|---|
model | | Name of the OpenAI embedding model to use. | 'text-embedding-ada-002' |
batch_size | | Batch size to send to OpenAI. | 25 |
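To illustrate what a batch size means here, the sketch below splits a list of texts into chunks and embeds one chunk per request. It is not embetter's actual implementation: `iter_batches`, `encode_in_batches`, and `fake_embed_batch` are hypothetical names, and `fake_embed_batch` is a local stand-in for the real API call so that the example runs offline.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 25) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def encode_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 25,
) -> List[List[float]]:
    """Embed `texts` one batch at a time and return one vector per text, in order."""
    vectors: List[List[float]] = []
    for batch in iter_batches(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: Sequence[str]) -> List[List[float]]:
    """Stand-in for a remote embedding call: [character count, word count] per text."""
    return [[float(len(text)), float(len(text.split()))] for text in batch]


vectors = encode_in_batches(
    ["positive sentiment", "super negative"], fake_embed_batch, batch_size=1
)
```

A smaller batch size means more requests with smaller payloads, while a larger one means fewer requests with larger payloads.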
Usage:
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from embetter.grab import ColumnGrabber
from embetter.external import OpenAIEncoder
import openai
# You must run this first!
openai.organization = OPENAI_ORG
openai.api_key = OPENAI_KEY
# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})
# This pipeline grabs the `text` column from a dataframe
# which then gets fed into OpenAI's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    OpenAIEncoder()
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])
# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)
# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)
transform(self, X, y=None)
Transforms the text into a numeric representation.