whatlies.language.HFTransformersLanguage

This language class can be used to load Hugging Face Transformer models and use them to obtain representations of input strings as an Embedding or an EmbeddingSet.

Important

To use this language class, either TensorFlow or PyTorch must be installed.

This language model does not contain a vocabulary, so it cannot be used to retrieve similar tokens. Use an EmbeddingSet instead.

This language backend might require you to manually install extra dependencies unless you installed whatlies via either of:

pip install whatlies[transformers]
pip install whatlies[all]

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `model_name_or_path` | `str` | The name or identifier of a model in the Hugging Face model repository, or the path to a local directory containing pre-trained transformer model files. | required |
| `**kwargs` | `Any` | Additional keyword arguments passed to the `transformers.pipeline` function. | `{}` |

Usage:

> from whatlies.language import HFTransformersLanguage
> lang = HFTransformersLanguage('bert-base-cased')
> lang['today is a nice day']
> lang = HFTransformersLanguage('gpt2')
> lang[['day and night', 'it is as clear as day', 'today the sky is clear']]
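Because this class has no vocabulary, "similar" can only mean similar among candidates you embed yourself, for instance the EmbeddingSet returned by `lang[[...]]` in the usage above. The sketch below illustrates that idea with plain NumPy: it ranks a user-supplied set of vectors by cosine similarity to a query. The vectors are made up, and `cosine` and `most_similar` are hypothetical helpers, not part of the whatlies API.

```python
from typing import Dict, List, Tuple

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_similar(
    query: np.ndarray, candidates: Dict[str, np.ndarray], n: int = 2
) -> List[Tuple[str, float]]:
    """Rank a caller-supplied set of named vectors by similarity to `query`.

    Hypothetical helper: there is no vocabulary to search, so the
    candidate set has to be provided explicitly.
    """
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up 3-dimensional vectors standing in for real sentence embeddings.
candidates = {
    "day and night": np.array([1.0, 0.0, 1.0]),
    "it is as clear as day": np.array([0.9, 0.1, 0.8]),
    "today the sky is clear": np.array([0.0, 1.0, 0.0]),
}
query = np.array([1.0, 0.0, 0.9])

top = most_similar(query, candidates)
print(top)
```

The candidate set plays the role a vocabulary would play in a word-embedding backend: nothing outside it can ever be returned.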

__getitem__(self, query)

Source code in language/_hftransformers_lang.py:
````python
    def __getitem__(self, query: Union[str, List[str]]):
        """
        Retrieve a single embedding or a set of embeddings.

        Arguments:
            query: A single string or a list of strings

        Returns:
            An instance of [Embedding][whatlies.embedding.Embedding] (when `query` is a string)
            or [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] (when `query` is a list of strings).
            The embedding vector is computed as the sum of hidden-state representations of tokens
            (excluding special tokens, e.g. [CLS]).

        **Usage**

        ```python
        > from whatlies.language import HFTransformersLanguage
        > lang = HFTransformersLanguage('bert-base-cased')
        > lang['today is a nice day']
        > lang = HFTransformersLanguage('gpt2')
        > lang[['day and night', 'it is as clear as day', 'today the sky is clear']]
        ```
        """
        if isinstance(query, str):
            return self._get_embedding(query)
        return EmbeddingSet(*[self._get_embedding(q) for q in query])
````

Retrieve a single embedding or a set of embeddings.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `Union[str, List[str]]` | A single string or a list of strings. | required |

Returns

| Type | Description |
|---|---|
| `Embedding` or `EmbeddingSet` | An instance of Embedding (when `query` is a string) or EmbeddingSet (when `query` is a list of strings). The embedding vector is computed as the sum of the hidden-state representations of the tokens, excluding special tokens such as [CLS]. |
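The pooling described in the Returns entry (summing token vectors while skipping special tokens) can be sketched in NumPy as below. This is an illustration under stated assumptions, not whatlies' own implementation: the hidden states are made-up numbers and `sum_pool` is a hypothetical helper. With Hugging Face tokenizers, a mask like the one used here can be requested by calling the tokenizer with `return_special_tokens_mask=True`.

```python
from typing import Sequence

import numpy as np


def sum_pool(hidden_states: np.ndarray, special_tokens_mask: Sequence[int]) -> np.ndarray:
    """Sum per-token vectors, skipping positions flagged as special tokens.

    hidden_states: shape (num_tokens, hidden_size), one row per token.
    special_tokens_mask: 1 for special tokens such as [CLS]/[SEP], 0 otherwise.
    (Hypothetical helper for illustration.)
    """
    keep = np.asarray(special_tokens_mask) == 0
    return hidden_states[keep].sum(axis=0)


# Made-up hidden states for "[CLS] today is nice [SEP]" with hidden_size = 3.
hidden = np.array([
    [9.0, 9.0, 9.0],  # [CLS]  (ignored)
    [1.0, 0.0, 2.0],  # today
    [0.5, 1.0, 0.0],  # is
    [0.0, 2.0, 1.0],  # nice
    [9.0, 9.0, 9.0],  # [SEP]  (ignored)
])
mask = [1, 0, 0, 0, 1]

vector = sum_pool(hidden, mask)  # element-wise sum of the three word rows
print(vector)
```

Because the vectors are summed rather than averaged, their norm tends to grow with the number of tokens. For cosine similarity this makes no difference, but it does for Euclidean distance.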
