whatlies.language.Sense2VecLanguage

This object is used to lazily fetch Embeddings or EmbeddingSets from a sense2vec language backend. It is meant for retrieval, not plotting.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `sense2vec_path` | | path to downloaded vectors | required |

Usage:

> lang = Sense2VecLanguage(sense2vec_path="/path/to/reddit_vectors-1.1.0")
> lang['bank|NOUN']
> lang['bank|VERB']

Important

The Reddit vectors are not shipped with this library. You can find the download link here.

Warning

This tool is temporarily not supported because sense2vec isn't supported by spaCy v3 just yet.

__getitem__(self, query)

Show source code in language/_sense2vec_lang.py
    def __getitem__(self, query):
        """
        Retrieve a single embedding or a set of embeddings.

        Arguments:
            query: single string or list of strings

        **Usage**
        ```python
        > lang = Sense2VecLanguage(sense2vec_path="/path/to/reddit_vectors-1.1.0")
        > lang['duck|NOUN']
        > lang[['duck|NOUN', 'duck|VERB']]
        ```
        """
        if isinstance(query, str):
            vec = self.s2v[query]
            return Embedding(query, vec)
        return EmbeddingSet(*[self[tok] for tok in query])

Retrieve a single embedding or a set of embeddings.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | | single string or list of strings | required |

Usage

> lang = Sense2VecLanguage(sense2vec_path="/path/to/reddit_vectors-1.1.0")
> lang['duck|NOUN']
> lang[['duck|NOUN', 'duck|VERB']]
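
The lookup above branches on whether the query is a single string or a collection of strings. A minimal sketch of that dispatch, assuming nothing beyond the standard library: `ToyEmbedding` and `ToyLanguage` are hypothetical stand-ins (not part of whatlies), a plain dict plays the role of the sense2vec store, and a dict of results stands in for an EmbeddingSet.

```python
from typing import Dict, List, NamedTuple, Sequence, Union


class ToyEmbedding(NamedTuple):
    """Hypothetical stand-in for whatlies' Embedding: a name plus a vector."""

    name: str
    vector: List[float]


class ToyLanguage:
    """Hypothetical stand-in that mirrors the str-vs-collection dispatch."""

    def __init__(self, vectors: Dict[str, List[float]]):
        # Plays the role of `self.s2v`: maps "token|SENSE" keys to vectors.
        self.vectors = vectors

    def __getitem__(self, query: Union[str, Sequence[str]]):
        if isinstance(query, str):
            # A single key returns a single embedding.
            return ToyEmbedding(query, self.vectors[query])
        # Anything else is treated as an iterable of keys, looked up one by one.
        # A dict keyed by name stands in for an EmbeddingSet here.
        return {tok: self[tok] for tok in query}


toy = ToyLanguage({"duck|NOUN": [1.0, 0.0], "duck|VERB": [0.0, 1.0]})
single = toy["duck|NOUN"]
several = toy[["duck|NOUN", "duck|VERB"]]
```

Note that the query for several items is one list of keys, so the subscript contains a single pair of inner brackets.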

embset_similar(self, query, n=10)

Show source code in language/_sense2vec_lang.py
    def embset_similar(self, query, n=10):
        """
        Retrieve an [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] containing the embeddings most similar to the passed query.

        Arguments:
            query: query to use
            n: the number of items you'd like to see returned

        Returns:
            An [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] containing the similar embeddings.
        """
        return EmbeddingSet(
            *[self[tok] for tok, sim in self.s2v.most_similar(query, n=n)],
            name=f"Embset[s2v similar_{n}:{query}]",
        )

Retrieve an EmbeddingSet containing the embeddings most similar to the passed query.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | | query to use | required |
| `n` | | the number of items you'd like to see returned | 10 |

Returns

| Type | Description |
|---|---|
| | An EmbeddingSet containing the similar embeddings. |
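
To illustrate what "the `n` most similar" means, here is a rough sketch that ranks keys by cosine similarity over a toy dictionary of vectors. The helper `toy_most_similar` is a hypothetical stand-in for the backend call, the vectors are made up, and excluding the query key from its own results is an assumption of this sketch rather than documented behaviour.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def toy_most_similar(
    vectors: Dict[str, List[float]], query: str, n: int = 10
) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a most-similar lookup: (key, score) pairs."""
    scored = [
        (key, cosine(vectors[query], vec))
        for key, vec in vectors.items()
        if key != query  # assumption: the query itself is not returned
    ]
    # Highest similarity first, then keep the top n.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up two-dimensional vectors, purely for illustration.
vectors = {
    "duck|NOUN": [1.0, 0.0],
    "goose|NOUN": [0.9, 0.1],
    "dodge|VERB": [0.1, 0.9],
    "bank|NOUN": [-1.0, 0.0],
}

pairs = toy_most_similar(vectors, "duck|NOUN", n=2)
# embset_similar discards the scores and keeps only the matching items.
similar_keys = [key for key, _ in pairs]
```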

score_similar(self, query, n=10)

Show source code in language/_sense2vec_lang.py
    def score_similar(self, query, n=10):
        """
        Retrieve the embeddings most similar to the passed query, together with their similarity scores.

        Arguments:
            query: query to use
            n: the number of items you'd like to see returned

        Returns:
            A list of ([Embedding][whatlies.embedding.Embedding], score) tuples.
        """
        return [(self[tok], sim) for tok, sim in self.s2v.most_similar(query, n=n)]

Retrieve the embeddings most similar to the passed query, together with their similarity scores.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | | query to use | required |
| `n` | | the number of items you'd like to see returned | 10 |

Returns

| Type | Description |
|---|---|
| | A list of (Embedding, score) tuples. |
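
Because the result is a plain list of (embedding, score) tuples, it can be unpacked and filtered with ordinary Python. The sketch below assumes a hypothetical `ToyEmbedding` stand-in and made-up scores; only the tuple shape is taken from the description above.

```python
from typing import List, NamedTuple, Tuple


class ToyEmbedding(NamedTuple):
    """Hypothetical stand-in for whatlies' Embedding."""

    name: str
    vector: List[float]


# Made-up values shaped like a score_similar result: (embedding, score) tuples.
scored: List[Tuple[ToyEmbedding, float]] = [
    (ToyEmbedding("goose|NOUN", [0.9, 0.1]), 0.99),
    (ToyEmbedding("swan|NOUN", [0.8, 0.3]), 0.94),
    (ToyEmbedding("dodge|VERB", [0.1, 0.9]), 0.11),
]


def names_above(
    results: List[Tuple[ToyEmbedding, float]], threshold: float
) -> List[str]:
    """Keep the names of embeddings whose score is at least `threshold`."""
    return [emb.name for emb, score in results if score >= threshold]


close_names = names_above(scored, 0.9)
```

Keeping the scores is what distinguishes this method from `embset_similar`, which returns only the embeddings.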