External
OpenAIEncoder
Bases: EmbetterBase
Encoder that can numerically encode sentences.
Note that this is an external embedding provider. If their API breaks, so will this component. We also assume that you have already imported openai and set up your credentials as described below.
This encoder will require the OPENAI_API_KEY (optionally OPENAI_ORG_ID and OPENAI_PROJECT_ID) environment variable to be set.
If you have it defined in your .env file, you can use python-dotenv to load it.
You also need to install the openai library beforehand.
python -m pip install openai
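It can help to check the environment before building a pipeline around the encoder. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper introduced for illustration and is not part of embetter or openai.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# OPENAI_API_KEY is mandatory; OPENAI_ORG_ID and OPENAI_PROJECT_ID are optional.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Set these before creating the encoder:", ", ".join(missing))
```

If you rely on python-dotenv, call `load_dotenv()` before running a check like this so that values from the .env file are already present in the environment.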
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | | name of model, can be "small" or "large" | 'text-embedding-ada-002' |
batch_size | | Batch size to send to OpenAI. | 25 |
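To illustrate what a batch size means in practice, here is a minimal sketch of splitting a list of texts into chunks. It assumes that batch_size is the number of texts sent per request; `batched` is a hypothetical helper written for this example, and the actual implementation in embetter may differ.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # Keep pulling chunks until the iterator is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch


texts = [f"sentence {i}" for i in range(60)]
# With batch_size=25 (the default listed above), 60 texts are split into 3 batches.
batch_lengths = [len(batch) for batch in batched(texts, 25)]
print(batch_lengths)  # [25, 25, 10]
```

Under this assumption, a larger batch size means fewer requests for the same input, at the cost of larger payloads per request.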
Usage:
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from embetter.grab import ColumnGrabber
from embetter.external import OpenAIEncoder
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})

# This pipeline grabs the `text` column from a dataframe
# which then gets fed into OpenAI's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    OpenAIEncoder()
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])

# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)

# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)
Source code in embetter/external/_openai.py
AzureOpenAIEncoder
Bases: OpenAIEncoder
Encoder that can numerically encode sentences.
Note that this is an external embedding provider. If their API breaks, so will this component.
To use this encoder you must provide credentials. Please provide one of the api_key, azure_ad_token, or azure_ad_token_provider arguments, or set the AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN environment variable.
You must provide one of the base_url or azure_endpoint arguments, or the AZURE_OPENAI_ENDPOINT environment variable.
Furthermore you must provide either the api_version argument or the OPENAI_API_VERSION environment variable.
If you have your environment variables defined in your .env file, you can use python-dotenv to load them.
You also need to install the openai library beforehand.
python -m pip install openai
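The three requirements above (credentials, endpoint, API version) can each be met by either an argument or an environment variable. The sketch below checks them with the standard library only; `azure_config_problems` is a hypothetical helper for illustration and is not part of embetter or openai, and the values passed in the example are placeholders.

```python
import os


def azure_config_problems(kwargs, env=None):
    """List which of the three documented requirements are not satisfied."""
    env = os.environ if env is None else env

    def provided(arg_names, env_names):
        # A requirement is met by any non-empty argument or environment variable.
        return any(kwargs.get(a) for a in arg_names) or any(
            env.get(e) for e in env_names
        )

    problems = []
    if not provided(
        ("api_key", "azure_ad_token", "azure_ad_token_provider"),
        ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_AD_TOKEN"),
    ):
        problems.append("credentials")
    if not provided(("base_url", "azure_endpoint"), ("AZURE_OPENAI_ENDPOINT",)):
        problems.append("endpoint")
    if not provided(("api_version",), ("OPENAI_API_VERSION",)):
        problems.append("api_version")
    return problems


# Placeholder values, for illustration only.
print(azure_config_problems(
    {"azure_endpoint": "https://example.invalid", "api_version": "some-version"},
    env={},
))  # ['credentials']
```

Passing an explicit `env` dictionary, as in the example, keeps the check independent of whatever happens to be set in the current shell.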
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | | name of model. | required |
batch_size | | Batch size to send to AzureOpenAI. | required |
Usage:
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from embetter.grab import ColumnGrabber
from embetter.external import AzureOpenAIEncoder
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})

# This pipeline grabs the `text` column from a dataframe
# which then gets fed into Azure OpenAI's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    AzureOpenAIEncoder()
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])

# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)

# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)
Source code in embetter/external/_openai.py
CohereEncoder
Bases: EmbetterBase
Encoder that can numerically encode sentences.
Note that this is an external embedding provider. If their API breaks, so will this component.
This encoder will require the COHERE_KEY environment variable to be set.
If you have it defined in your .env file, you can use python-dotenv to load it.
You also need to install the cohere library beforehand.
python -m pip install cohere
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | | name of model, can be "small" or "large" | 'large' |
batch_size | | Batch size to send to Cohere. | 10 |
Usage:
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from embetter.grab import ColumnGrabber
from embetter.external import CohereEncoder
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})

# This pipeline grabs the `text` column from a dataframe
# which then gets fed into Cohere's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    CohereEncoder(model="large")
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])

# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)

# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)
Source code in embetter/external/_cohere.py