whatlies.language.HFTransformersLanguage
This language class loads Hugging Face Transformer models and uses them to obtain representations of input string(s) as an Embedding or EmbeddingSet.
Important
To use this language class, either TensorFlow or PyTorch must be installed.
This language model does not contain a vocabulary, so it cannot be used to retrieve similar tokens. Use an EmbeddingSet instead.
This language backend might require you to manually install extra dependencies unless you installed via either:
pip install whatlies[transformers]
pip install whatlies[all]
Parameters
Name | Type | Description | Default |
---|---|---|---|
model_name_or_path | str | The name or identifier of a model from the Hugging Face model repository, or the path to a local directory that contains pre-trained transformer model files. | required |
**kwargs | Any | Additional key-value argument(s) that are passed to the transformers.pipeline function. | {} |
Usage:
> from whatlies.language import HFTransformersLanguage
> lang = HFTransformersLanguage('bert-base-cased')
> lang['today is a nice day']
> lang = HFTransformersLanguage('gpt2')
> lang[['day and night', 'it is as clear as day', 'today the sky is clear']]
__getitem__(self, query)
Retrieve a single embedding or a set of embeddings.
Parameters
Name | Type | Description | Default |
---|---|---|---|
query | Union[str, List[str]] | A single string or a list of strings | required |
Returns
Type | Description |
---|---|
`Union[Embedding, EmbeddingSet]` | An instance of Embedding (when query is a string) or EmbeddingSet (when query is a list of strings). The embedding vector is computed as the sum of the hidden-state representations of the tokens (excluding special tokens, e.g. [CLS]). |
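The summation described in the Returns table can be illustrated with a small, library-free sketch. This is not the whatlies implementation: the helper name `sum_token_vectors`, the two-dimensional toy vectors and the special-token mask are all assumptions made for illustration, and a real model would produce much larger hidden states.

```python
from typing import List, Sequence


def sum_token_vectors(
    hidden_states: Sequence[Sequence[float]],
    special_mask: Sequence[bool],
) -> List[float]:
    """Sum per-token vectors, skipping positions flagged as special tokens.

    Hypothetical helper for illustration only; not part of whatlies.
    """
    if len(hidden_states) != len(special_mask):
        raise ValueError("hidden_states and special_mask must have equal length")
    # Keep only the vectors that belong to ordinary (non-special) tokens.
    kept = [vec for vec, is_special in zip(hidden_states, special_mask) if not is_special]
    if not kept:
        raise ValueError("no non-special tokens to sum")
    # Add the kept vectors together dimension by dimension.
    return [sum(column) for column in zip(*kept)]


# Toy data (assumed values): four tokens, each with a 2-dimensional hidden state.
tokens = ["[CLS]", "nice", "day", "[SEP]"]
hidden_states = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [7.0, 7.0]]
special_mask = [tok in {"[CLS]", "[SEP]"} for tok in tokens]

sentence_vector = sum_token_vectors(hidden_states, special_mask)
print(sentence_vector)  # [4.0, 6.0]
```

Only the vectors for "nice" and "day" contribute to the result, so the values attached to [CLS] and [SEP] have no effect on the final vector.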
Usage
> from whatlies.language import HFTransformersLanguage
> lang = HFTransformersLanguage('bert-base-cased')
> lang['today is a nice day']
> lang = HFTransformersLanguage('gpt2')
> lang[['day and night', 'it is as clear as day', 'today the sky is clear']]