External
OpenAIEncoder
Bases: EmbetterBase
Encoder that can numerically encode sentences.
Note that this is an external embedding provider. If their API breaks, so will this component. We also assume that you have already imported openai upfront and completed the setup described below.
This encoder requires the OPENAI_ORG and OPENAI_KEY environment variables to be set.
If you have them defined in your .env file, you can use python-dotenv to load them.
You also need to install the openai library beforehand.
python -m pip install openai
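Because the encoder reads its credentials from the environment, it can help to fail early when a variable is missing. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper written for this illustration and is not part of embetter. If the values live in a .env file, call `load_dotenv()` from python-dotenv before running the check.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    # Fall back to the real process environment when no mapping is passed in.
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# OPENAI_ORG and OPENAI_KEY are the variables this encoder expects.
required = ["OPENAI_ORG", "OPENAI_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running a check like this before building the pipeline surfaces a configuration problem directly, instead of as an authentication error from the remote API.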
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Name of the OpenAI embedding model to use. | 'text-embedding-ada-002' |
batch_size | int | Batch size to send to OpenAI. | 25 |
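To illustrate what batch_size controls, the sketch below splits a list of texts into chunks of at most batch_size items, which is presumably how requests are grouped before they are sent to OpenAI. `iter_batches` is a hypothetical helper written for this illustration; the encoder's internal implementation may differ.

```python
def iter_batches(texts, batch_size=25):
    """Yield consecutive slices of `texts` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# 60 texts with the default batch size of 25 give two full batches and a remainder.
texts = [f"sentence {i}" for i in range(60)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=25)]
print(batch_lengths)  # [25, 25, 10]
```

A smaller batch size means more requests with smaller payloads; a larger one means fewer requests, each carrying more text.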
Usage:
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from embetter.grab import ColumnGrabber
from embetter.external import OpenAIEncoder
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})
# This pipeline grabs the `text` column from a dataframe,
# which then gets fed into OpenAI's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    OpenAIEncoder()
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])
# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)
# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)
Source code in embetter/external/_openai.py
CohereEncoder
Bases: EmbetterBase
Encoder that can numerically encode sentences.
Note that this is an external embedding provider. If their API breaks, so will this component.
This encoder requires the COHERE_KEY environment variable to be set.
If you have it defined in your .env file, you can use python-dotenv to load it.
You also need to install the cohere library beforehand.
python -m pip install cohere
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Name of the model; can be "small" or "large". | 'large' |
Usage:
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from embetter.grab import ColumnGrabber
from embetter.external import CohereEncoder
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})
# This pipeline grabs the `text` column from a dataframe,
# which then gets fed into Cohere's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    CohereEncoder(model="large")
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])
# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)
# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)
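Since these encoders depend on an external API, a transient failure in the provider will surface as an exception in the pipeline. One way to soften that is to wrap the call in a retry loop with exponential backoff. The sketch below uses only the standard library; `call_with_retries` and its parameters are hypothetical names introduced for this illustration and are not part of embetter.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `func()` and retry with exponential backoff if it raises."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Assumption: every exception is treated as retryable here. In
            # practice, narrow this to the client's network or rate-limit errors.
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

For example, `call_with_retries(lambda: text_emb_pipeline.transform(dataf))` would attempt the embedding step up to three times, waiting one second and then two seconds between attempts.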
Source code in embetter/external/_cohere.py