whatlies.language.LaBSELanguage

Retrieve a Language-agnostic BERT Sentence Embedding (LaBSE) model from Hugging Face.

The model is reported to support 109 languages. You can see the language list in the appendix of the original paper, found here.

Important

This object will automatically download a large file if it is not cached yet.

This language model does not contain a vocabulary, so it cannot be used to retrieve similar tokens. Use an EmbeddingSet instead.

This language backend might require you to manually install extra dependencies, unless you installed whatlies via either of the following:

pip install whatlies[transformers]
pip install whatlies[all]
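If you are unsure whether the extra dependencies are present, a quick check like the one below may help. It is only a sketch: the helper name `has_module` is hypothetical, and it assumes the extra installs a package importable as `transformers`.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no module with this name is found.
    return importlib.util.find_spec(name) is not None


# Assumption: the extra makes a module named "transformers" importable.
if not has_module("transformers"):
    print("transformers not found; try: pip install whatlies[transformers]")
```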

Usage:

from whatlies.language import LaBSELanguage
lang = LaBSELanguage()

texts = ['ik vind honden leuk', 'i really like dogs', 'me gusta los perros!',
         'let us talk about money', 'laten we over geld praten', 'hablemos de dinero',
         'los stroopwafels son impresionantes', 'stroopwafels zijn heerlijk',
         'give me more stroopwafels']

lang[texts].plot_similarity()
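
The plot compares every text with every other text. As a rough illustration of the kind of computation behind such a plot, the sketch below builds a pairwise cosine similarity matrix in plain Python. It assumes cosine similarity as the metric and uses made-up three-dimensional vectors rather than real LaBSE embeddings; the function names are hypothetical and not part of whatlies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for sentence embeddings (illustrative values only).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]

matrix = similarity_matrix(toy_vectors)
for row in matrix:
    print([round(value, 3) for value in row])
```

In the toy data the first two vectors point in nearly the same direction, so they score close to 1, while the third is orthogonal to the first and scores 0. With real sentence embeddings, translations of the same sentence would be expected to score higher with each other than with unrelated sentences.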