

`whatlies.language.SpacyLanguage`¶

This object lazily fetches Embeddings or EmbeddingSets from a spaCy language backend. It is meant for retrieval, not plotting.

Parameters

Name	Type	Description	Default
`nlp`	`Union[str, spacy.language.Language]`	name of the spaCy model to load, or an already loaded spaCy language object; a model referenced by name must be downloaded beforehand	required

Important

This language backend might require you to manually install extra dependencies, unless you installed whatlies with either of the following:

pip install whatlies[spacy]
pip install whatlies[all]

Usage:

> lang = SpacyLanguage("en_core_web_md")
> lang['python']
> lang[['python', 'snake', 'dog']]

`__getitem__(self, query)`¶

Source code in language/_spacy_lang.py:

    def __getitem__(
        self, query: Union[str, List[str]]
    ) -> Union[Embedding, EmbeddingSet]:
        """
        Retrieve a single embedding or a set of embeddings. Depending on the spaCy model,
        the strings can contain multiple tokens of text, and they can also use the Bert DSL.
        See the Language Options documentation: https://koaning.github.io/whatlies/tutorial/languages/#bert-style.

        Arguments:
            query: single string or list of strings

        **Usage**
        ```python
        > lang = SpacyLanguage("en_core_web_md")
        > lang['python']
        > lang[['python', 'snake']]
        > lang[['nobody expects', 'the spanish inquisition']]
        > lang = SpacyLanguage("en_trf_robertabase_lg")
        > lang['programming in [python]']
        ```
        """
        if isinstance(query, str):
            return self._get_embedding(query)
        return EmbeddingSet(*[self._get_embedding(q) for q in query])

Retrieve a single embedding or a set of embeddings. A string query returns a single Embedding; a list of strings returns an EmbeddingSet. Depending on the spaCy model, the strings can contain multiple tokens of text, and they can also use the Bert DSL. See the Language Options documentation: https://koaning.github.io/whatlies/tutorial/languages/#bert-style.

Parameters

Name	Type	Description	Default
`query`	`Union[str, List[str]]`	single string or list of strings	required

Usage

> lang = SpacyLanguage("en_core_web_md")
> lang['python']
> lang[['python', 'snake']]
> lang[['nobody expects', 'the spanish inquisition']]
> lang = SpacyLanguage("en_trf_robertabase_lg")
> lang['programming in [python]']
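The dispatch between a single string and a list of strings can be sketched without spaCy. The snippet below is only an illustration under stated assumptions: `ToyLanguage` and `TOY_VECTORS` are hypothetical stand-ins, and a plain `dict` plays the role of an EmbeddingSet. It is not the whatlies implementation.

```python
from typing import Dict, List, Union

# Hypothetical toy data standing in for a real spaCy model's vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "python": [1.0, 0.0],
    "snake": [0.9, 0.1],
    "dog": [0.0, 1.0],
}


class ToyLanguage:
    """Hypothetical stand-in that mirrors the str-vs-list dispatch only."""

    def __init__(self, vectors: Dict[str, List[float]]):
        self.vectors = vectors

    def _get_embedding(self, query: str) -> List[float]:
        return self.vectors[query]

    def __getitem__(self, query: Union[str, List[str]]):
        # A single string yields one vector ...
        if isinstance(query, str):
            return self._get_embedding(query)
        # ... while a list yields a name -> vector mapping
        # (a plain dict here, where whatlies returns an EmbeddingSet).
        return {q: self._get_embedding(q) for q in query}


lang = ToyLanguage(TOY_VECTORS)
single = lang["python"]
several = lang[["python", "snake"]]
```

Checking for `str` first matters because a string is itself iterable; without that check, a single word would be looped over character by character.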

`embset_proximity(self, emb, max_proximity=0.1, prob_limit=-15, lower=True, metric='cosine')`¶

Source code in language/_spacy_lang.py:

    def embset_proximity(
        self,
        emb: Union[str, Embedding],
        max_proximity: float = 0.1,
        prob_limit=-15,
        lower=True,
        metric="cosine",
    ):
        """
        Retrieve an [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] of embeddings that are within a given proximity of the query.

        Arguments:
            emb: query to use
            max_proximity: the maximum distance from the query at which an embedding is still included
            prob_limit: likelihood limit that sets the subset of words to search
            metric: metric to use to calculate distance, must be scipy or sklearn compatible
            lower: only fetch lower case tokens

        Returns:
            An [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] containing the similar embeddings.
        """
        if isinstance(emb, str):
            emb = self[emb]

        queries = self._prepare_queries(prob_limit, lower)
        distances = self._calculate_distances(emb, queries, metric)
        return EmbeddingSet(
            {w: self[w] for w, d in zip(queries, distances) if d <= max_proximity}
        )

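Retrieve an EmbeddingSet of embeddings that are within a given proximity of the query. An embedding is included when its distance to the query, under the chosen `metric`, is at most `max_proximity`.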

Parameters

Name	Type	Description	Default
`emb`	`Union[str, whatlies.embedding.Embedding]`	query to use	required
`max_proximity`	`float`	the maximum distance from the query at which an embedding is still included	`0.1`
`prob_limit`		likelihood limit that sets the subset of words to search	`-15`
`metric`		metric to use to calculate distance, must be scipy or sklearn compatible	`'cosine'`
`lower`		only fetch lower case tokens	`True`

Returns

Type	Description
`EmbeddingSet`	An EmbeddingSet containing the similar embeddings.
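The proximity filter amounts to keeping every candidate whose distance to the query does not exceed a threshold. The following is a minimal sketch in pure Python, assuming cosine distance and toy two-dimensional vectors. The names `cosine_distance` and `within_proximity` are hypothetical helpers and not part of whatlies.

```python
import math
from typing import Dict, List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def within_proximity(
    query: List[float],
    candidates: Dict[str, List[float]],
    max_proximity: float = 0.1,
) -> Dict[str, List[float]]:
    """Keep the candidates whose distance to the query is at most max_proximity."""
    return {
        name: vec
        for name, vec in candidates.items()
        if cosine_distance(query, vec) <= max_proximity
    }


# Hypothetical toy vectors, not taken from any real model.
toy = {"python": [1.0, 0.0], "snake": [0.9, 0.1], "dog": [0.0, 1.0]}
close = within_proximity(toy["python"], toy, max_proximity=0.1)
```

In this sketch the query word itself is part of the result, because its distance to itself is zero.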

`embset_similar(self, emb, n=10, prob_limit=-15, lower=True, metric='cosine')`¶

Source code in language/_spacy_lang.py:

    def embset_similar(
        self,
        emb: Union[str, Embedding],
        n: int = 10,
        prob_limit=-15,
        lower=True,
        metric="cosine",
    ):
        """
        Retrieve an [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] containing the embeddings most similar to the passed query.

        Arguments:
            emb: query to use
            n: the number of items you'd like to see returned
            prob_limit: likelihood limit that sets the subset of words to search
            metric: metric to use to calculate distance, must be scipy or sklearn compatible
            lower: only fetch lower case tokens

        Returns:
            An [EmbeddingSet][whatlies.embeddingset.EmbeddingSet] containing the similar embeddings.
        """
        embs = [w[0] for w in self.score_similar(emb, n, prob_limit, lower, metric)]
        return EmbeddingSet({w.name: w for w in embs})

Retrieve an EmbeddingSet containing the embeddings most similar to the passed query.

Parameters

Name	Type	Description	Default
`emb`	`Union[str, whatlies.embedding.Embedding]`	query to use	required
`n`	`int`	the number of items you'd like to see returned	`10`
`prob_limit`		likelihood limit that sets the subset of words to search	`-15`
`metric`		metric to use to calculate distance, must be scipy or sklearn compatible	`'cosine'`
`lower`		only fetch lower case tokens	`True`

Returns

Type	Description
`EmbeddingSet`	An EmbeddingSet containing the similar embeddings.

`score_similar(self, emb, n=10, prob_limit=-15, lower=True, metric='cosine')`¶

Source code in language/_spacy_lang.py:

    def score_similar(
        self,
        emb: Union[str, Embedding],
        n: int = 10,
        prob_limit=-15,
        lower=True,
        metric="cosine",
    ):
        """
        Retrieve a list of (Embedding, score) tuples that are the most similar to the passed query.

        Arguments:
            emb: query to use
            n: the number of items you'd like to see returned
            prob_limit: likelihood limit that sets the subset of words to search, to ignore set to `None`
            metric: metric to use to calculate distance, must be scipy or sklearn compatible
            lower: only fetch lower case tokens

        Returns:
            A list of ([Embedding][whatlies.embedding.Embedding], score) tuples.
        """
        if isinstance(emb, str):
            emb = self[emb]

        queries = self._prepare_queries(prob_limit, lower)
        distances = self._calculate_distances(emb, queries, metric)
        by_similarity = sorted(zip(queries, distances), key=lambda z: z[1])

        if len(queries) < n:
            warnings.warn(
                f"We could only find {len(queries)} feasible words. Consider changing `prob_limit` or `lower`",
                UserWarning,
            )

        return [(self[q.text], float(d)) for q, d in by_similarity[:n]]

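Retrieve a list of (Embedding, score) tuples that are the most similar to the passed query. The score is the distance under the chosen `metric`, so the list is ordered from the smallest distance upward and lower scores mean more similar.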

Parameters

Name	Type	Description	Default
`emb`	`Union[str, whatlies.embedding.Embedding]`	query to use	required
`n`	`int`	the number of items you'd like to see returned	`10`
`prob_limit`		likelihood limit that sets the subset of words to search, to ignore set to `None`	`-15`
`metric`		metric to use to calculate distance, must be scipy or sklearn compatible	`'cosine'`
`lower`		only fetch lower case tokens	`True`

Returns

Type	Description
`List[Tuple[Embedding, float]]`	A list of (Embedding, score) tuples.
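The ranking step can be sketched as: compute a distance for every candidate, sort ascending, warn when fewer than `n` candidates exist, and return the first `n`. The snippet below is a hedged illustration in pure Python with toy vectors. `rank_similar` and `cosine_distance` are hypothetical names, and it returns (name, distance) pairs rather than whatlies Embedding objects.

```python
import math
import warnings
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_similar(
    query: List[float],
    candidates: Dict[str, List[float]],
    n: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to n (name, distance) pairs, smallest distance first."""
    scored = sorted(
        ((name, cosine_distance(query, vec)) for name, vec in candidates.items()),
        key=lambda pair: pair[1],
    )
    if len(scored) < n:
        # Mirrors the idea of warning when the search space is too small.
        warnings.warn(
            f"Only {len(scored)} candidates available, fewer than n={n}.",
            UserWarning,
        )
    return scored[:n]


# Hypothetical toy vectors, not taken from any real model.
toy = {"python": [1.0, 0.0], "snake": [0.9, 0.1], "dog": [0.0, 1.0]}
top_two = rank_similar(toy["python"], toy, n=2)
```

Because the scores are distances, the query word itself comes first with a score of zero when it is among the candidates.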
