Interactive Visualisation
Sets of Embeddings¶
The Embedding
object merely has support for matplotlib, but the
EmbeddingSet
has support for interactive tools. It is also more
convenient. You can create an
Direct Creation¶
You can create these objects directly.
import spacy
from whatlies.embedding import Embedding
from whatlies.embeddingset import EmbeddingSet
nlp = spacy.load("en_core_web_md")
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
"cousin", "neice", "king", "queen", "dude", "guy", "gal", "fire",
"dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
"person", "family", "brother", "sister"]
emb = EmbeddingSet({t.text: Embedding(t.text, t.vector) for t in nlp.pipe(words)})
This can be especially useful if you're creating your own embeddings.
Via Languages¶
But odds are that you just want to grab a language model from elsewhere. We've added backends to our library and this can be a convenient method of getting sets of embeddings (typically more performant too).
from whatlies.language import SpacyLanguage
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
"cousin", "neice", "king", "queen", "dude", "guy", "gal", "fire",
"dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
"person", "family", "brother", "sister"]
lang = SpacyLanguage("en_core_web_md")
emb = lang[words]
Plotting¶
Either way, with an EmbeddingSet
you can create meaningful interactive charts.
emb.plot_interactive('man', 'woman')
We can also retreive embeddings from the embeddingset.
emb['king']
Remember the operations we did before? We can also do that on these sets!
new_emb = emb | (emb['king'] - emb['queen'])
new_emb.plot_interactive('man', 'woman')
Combining Charts¶
Often you'd like to compare the effect of a mapping. Since we make our interactive charts with altair we get a nice api to stack charts next to eachother.
orig_chart = emb.plot_interactive('man', 'woman')
new_chart = new_emb.plot_interactive('man', 'woman')
orig_chart | new_chart
You may have noticed that these charts appear in the documentation, fully interactively. This is another nice feature of Altair, the charts can be serialized in a json format and hosted on the web.
More Transformation¶
But there are more transformations that we might visualise. Let's demonstrate two here.
from whatlies.transformers import Pca, Umap
orig_chart = emb.plot_interactive('man', 'woman')
pca_emb = emb.transform(Pca(2))
umap_emb = emb.transform(Umap(2))
The transform method is able to take a transformation, let's say pca(2)
and this will change
the embeddings in the set. It might also create new embeddings. In case of pca(2)
it will
also add two embeddings which represent the principal components. This is nice because
that means that we can plot along those axes.
plot_pca = pca_emb.plot_interactive()
plot_umap = umap_emb.plot_interactive()
plot_pca | plot_umap
Adding Color to the Charts¶
Sometimes it might be helpful to add color to the charts. In these situations we first need to add a property to the embeddings in the embeddingset. This property can then be picked up by a chart in order to make a subset stand out from the rest of the group.
from whatlies.language import SpacyLanguage
from whatlies.transformers import Pca
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
"cousin", "neice", "king", "queen", "dude", "guy", "gal", "fire",
"dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
"person", "family", "brother", "sister"]
colors = ["red", "blue", "green", "yellow"]
lang = SpacyLanguage("en_core_web_md")
# Notice the `assign` method, this is where we assign the `is_color` property
# to each embedding in the embeddingset based on the "name".
embset = (lang[words]
.transform(Pca(2))
.assign(is_color=lambda e: e.name in colors))
embset.plot_interactive(color="is_color")
Using an Interactive Brush¶
We can also choose to use plot_hover
instead of plot_interactive
. The hover chart cannot
zoom in/out but it does allow you to draw a box to make a subselection. This can be very useful
when you're trying to get an overview of a cluster of embeddings.
from whatlies.language import SpacyLanguage
from whatlies.transformers import Pca
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
"cousin", "neice", "king", "queen", "dude", "guy", "gal", "fire",
"dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
"person", "family", "brother", "sister"]
colors = ["red", "blue", "green", "yellow"]
lang = SpacyLanguage("en_core_web_md")
embset = (lang[words]
.transform(Pca(2))
.assign(is_color=lambda e: e.name in colors))
embset.plot_brush(n_show=15, color="is_color")
Large Matrix Visualisations¶
If you're up for it, you can draw large matrices of charts too.
from whatlies.language import SpacyLanguage
from whatlies.transformers import Pca
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
"cousin", "neice", "king", "queen", "dude", "guy", "gal", "fire",
"dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
"person", "family", "brother", "sister"]
lang = SpacyLanguage("en_core_web_md")
lang[words].transform(Pca(2)).plot_interactive_matrix(0, 1, 2)
Zoom in on that chart. Don't forget to click and drag. Can we interpret the components?