Model
DifferenceClassifier
Classifier for similarity using encoders under the hood.
It's similar to the scikit-learn models that you're used to, but it accepts two inputs X1 and X2 and tries to predict if they are similar. Effectively it's just a classifier on top of diff(X1 - X2).
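To make the idea concrete, here is a minimal sketch of a classifier that sits on top of the difference between two encoded inputs. It is an illustration under stated assumptions, not embetter's implementation: the class name SketchDifferenceClassifier and the helper _dense are hypothetical, the absolute difference and the logistic regression fallback are assumptions, and scikit-learn's HashingVectorizer stands in for a real sentence encoder so the example runs without downloading a model.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression


def _dense(matrix):
    # Some encoders return scipy sparse matrices; convert so subtraction is element-wise.
    return matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)


class SketchDifferenceClassifier:
    """Hypothetical illustration of the idea, not embetter's implementation."""

    def __init__(self, enc, clf_head=None):
        self.enc = enc
        # Assumption: use logistic regression when no head is supplied.
        self.clf_head = LogisticRegression() if clf_head is None else clf_head

    def _features(self, X1, X2):
        # Assumption: absolute difference, so the order of X1 and X2 does not matter.
        return np.abs(_dense(self.enc.transform(X1)) - _dense(self.enc.transform(X2)))

    def fit(self, X1, X2, y):
        # The head only ever sees the difference features, never the raw inputs.
        self.clf_head.fit(self._features(X1, X2), y)
        return self

    def predict(self, X1, X2):
        return self.clf_head.predict(self._features(X1, X2))


# Stateless character n-gram encoder standing in for a sentence encoder.
enc = HashingVectorizer(analyzer="char_wb", ngram_range=(2, 3), n_features=256)

texts1 = ["hello", "firehydrant", "greetings"]
texts2 = ["no", "yes", "greeting"]
similar = [0, 0, 1]

sketch = SketchDifferenceClassifier(enc=enc).fit(texts1, texts2, similar)
preds = sketch.predict(texts1, texts2)
```

Because the head is trained on differences between pairs rather than on the pairs themselves, any scikit-learn classifier can be swapped in as the head without changing the encoder.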
Parameters:
Name | Type | Description | Default |
---|---|---|---|
enc | TransformerMixin | scikit-learn compatible encoder of the input data | required |
clf_head | ClassifierMixin | the classifier to apply at the end | None |
Usage:
from embetter.model import DifferenceClassifier
from embetter.text import SentenceEncoder
mod = DifferenceClassifier(enc=SentenceEncoder())
# Suppose this is input data
texts1 = ["hello", "firehydrant", "greetings"]
texts2 = ["no", "yes", "greeting"]
# You will need to have some definition of "similar"
similar = [0, 0, 1]
# Train a model to detect similarity
mod.fit(X1=texts1, X2=texts2, y=similar)
mod.predict(X1=texts1, X2=texts2)
# The classifier head is a scikit-learn model, which you could save
# separately if you like. The model can be accessed via:
mod.clf_head
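As a rough sketch of saving the classifier head on its own, the snippet below persists a fitted scikit-learn classifier with joblib and loads it back. The choice of joblib, the file name clf_head.joblib, and the random toy features are all assumptions made for illustration; a plain LogisticRegression stands in for mod.clf_head so the example runs without an encoder.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for mod.clf_head: a fitted scikit-learn classifier on toy features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 4))
labels = np.array([0, 1] * 10)
clf_head = LogisticRegression().fit(feats, labels)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clf_head.joblib"  # hypothetical file name
    joblib.dump(clf_head, path)           # write the fitted head to disk
    restored = joblib.load(path)          # read it back as a new object

# The restored head should reproduce the original predictions.
same_predictions = np.array_equal(clf_head.predict(feats), restored.predict(feats))
```

A head restored this way still expects difference features as input, so the same encoder has to be used to produce them.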
Source code in embetter/model/_diff.py