whatlies.embedding.Embedding

This object represents a word embedding. It contains a vector and a name.

Parameters

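| Name | Type | Description | Default |
|------|------|-------------|---------|
| name | | the name of this embedding; it includes any operations applied to it | required |
| vector | | the numerical representation of the embedding | required |
| orig | | the original name of the embedding; operations leave it unchanged | None |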

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo | bar
foo - bar + bar
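
The `name` of a derived embedding records the operations that produced it. The sketch below mimics only that bookkeeping with a small stand-in class. `MiniEmbedding` is hypothetical and not part of whatlies; its `__add__` and `__sub__` follow the naming pattern visible in the source listings on this page, and it uses plain Python lists so it runs without the library installed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MiniEmbedding:
    """Hypothetical stand-in that mirrors only the naming logic of Embedding."""

    name: str
    vector: List[float]

    def __add__(self, other: "MiniEmbedding") -> "MiniEmbedding":
        # Same naming pattern as Embedding.__add__: "(left + right)".
        summed = [a + b for a, b in zip(self.vector, other.vector)]
        return MiniEmbedding(f"({self.name} + {other.name})", summed)

    def __sub__(self, other: "MiniEmbedding") -> "MiniEmbedding":
        # Same naming pattern as Embedding.__sub__: "(left - right)".
        diff = [a - b for a, b in zip(self.vector, other.vector)]
        return MiniEmbedding(f"({self.name} - {other.name})", diff)


foo = MiniEmbedding("foo", [0.1, 0.3])
bar = MiniEmbedding("bar", [0.7, 0.2])

# Python evaluates this left to right, as (foo - bar) + bar.
result = foo - bar + bar
print(result.name)  # ((foo - bar) + bar)
```

The vector of `result` equals that of `foo` up to floating-point rounding, while the name keeps the full history of operations.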

ndim: (property, readonly)

Returns the dimension of the embedding vector.

norm: (property, readonly)

Returns the norm of the embedding's vector.

__add__(self, other)

Show source code in whatlies/embedding.py
    def __add__(self, other) -> "Embedding":
        """
        Add two embeddings together.

        Usage:

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])
        bar = Embedding("bar", [0.7, 0.2])

        foo + bar
        ```
        """
        copied = deepcopy(self)
        copied.name = f"({self.name} + {other.name})"
        copied.vector = self.vector + other.vector
        return copied

Add two embeddings together.

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo + bar

__gt__(self, other)

Show source code in whatlies/embedding.py
    def __gt__(self, other):
        """
        Measures the size of one embedding relative to another one, as the normalized scalar projection `(self . other) / (other . other)`.

        The `>` is meant to indicate the "unto" operation.

        Usage:

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])
        bar = Embedding("bar", [0.7, 0.2])

        foo > bar
        ```
        """
        return (self.vector.dot(other.vector)) / (other.vector.dot(other.vector))

Measures the size of one embedding relative to another one, as the normalized scalar projection (self . other) / (other . other).

The > is meant to indicate the "unto" operation.

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo > bar
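
As a worked example of the formula in the `return` statement above, the following sketch computes the same ratio by hand. It uses plain Python lists and a small `dot` helper (both are illustrative assumptions, not whatlies code) so that it runs without the library.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


foo = [0.1, 0.3]
bar = [0.7, 0.2]

# Same arithmetic as __gt__: (foo . bar) / (bar . bar)
ratio = dot(foo, bar) / dot(bar, bar)
print(round(ratio, 4))  # 0.2453

# Comparing a vector with itself always gives 1.0.
self_ratio = dot(bar, bar) / dot(bar, bar)
print(self_ratio)  # 1.0
```

Note that, unlike the usual meaning of `>`, the result is a number rather than a boolean.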

__neg__(self)

Show source code in whatlies/embedding.py
    def __neg__(self):
        """
        Negate an embedding.

        Usage:

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])

        assert ((-foo).vector == -foo.vector).all()
        ```
        """
        copied = deepcopy(self)
        copied.name = f"(-{self.name})"
        copied.vector = -self.vector
        return copied

Negate an embedding.

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])

assert ((-foo).vector == -foo.vector).all()

__or__(self, other)

Show source code in whatlies/embedding.py
    def __or__(self, other):
        """
        Makes one embedding orthogonal to the other one.

        Usage:

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])
        bar = Embedding("bar", [0.7, 0.2])

        foo | bar
        ```
        """
        copied = deepcopy(self)
        copied.name = f"({self.name} | {other.name})"
        copied.vector = self.vector - (self >> other).vector
        return copied

Makes one embedding orthogonal to the other one.

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo | bar
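
To see why the result is orthogonal, the sketch below repeats the arithmetic from the source listing (subtract the projection onto `other`) and checks that the remainder has no component along `bar`. It is a plain-Python illustration with a hypothetical `dot` helper, not the library's implementation.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


foo = [0.1, 0.3]
bar = [0.7, 0.2]

# Projection of foo onto bar (the vector that `foo >> bar` holds).
scale = dot(foo, bar) / dot(bar, bar)
projection = [scale * b for b in bar]

# `foo | bar` subtracts that projection from foo.
orthogonal = [f - p for f, p in zip(foo, projection)]

# The remainder is orthogonal to bar, up to floating-point rounding.
print(abs(dot(orthogonal, bar)) < 1e-12)  # True
```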

__rshift__(self, other)

Show source code in whatlies/embedding.py
    def __rshift__(self, other):
        """
        Maps an embedding onto another one.

        Usage:

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])
        bar = Embedding("bar", [0.7, 0.2])

        foo >> bar
        ```
        """
        copied = deepcopy(self)
        new_vec = (
            (self.vector.dot(other.vector))
            / (other.vector.dot(other.vector))
            * other.vector
        )
        copied.name = f"({self.name} >> {other.name})"
        copied.vector = new_vec
        return copied

Maps an embedding onto another one, i.e. computes the vector projection of this embedding onto the other.

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo >> bar
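
The projection computed in the source listing can be reproduced by hand. The sketch below is a plain-Python illustration (the `dot` and `project` helpers are hypothetical names, not whatlies functions) of the same formula, `(self . other) / (other . other) * other`.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project(u, v):
    """Vector projection of u onto v: (u . v) / (v . v) * v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * x for x in v]


foo = [0.1, 0.3]
bar = [0.7, 0.2]

projection = project(foo, bar)
print([round(x, 4) for x in projection])  # [0.1717, 0.0491]

# Projecting a second time changes nothing (up to rounding error),
# because the result already lies along bar.
again = project(projection, bar)
print(all(abs(a - b) < 1e-12 for a, b in zip(projection, again)))  # True
```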

__sub__(self, other)

Show source code in whatlies/embedding.py
    def __sub__(self, other):
        """
        Subtract two embeddings.

        Usage:

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])
        bar = Embedding("bar", [0.7, 0.2])

        foo - bar
        ```
        """
        copied = deepcopy(self)
        copied.name = f"({self.name} - {other.name})"
        copied.vector = self.vector - other.vector
        return copied

Subtract two embeddings.

Usage:

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo - bar

copy(self)

Show source code in whatlies/embedding.py
    def copy(self):
        """
        Returns a deepcopy of the embedding.
        """
        return deepcopy(self)

Returns a deepcopy of the embedding.

distance(self, other, metric='cosine')

Show source code in whatlies/embedding.py
    def distance(self, other, metric: str = "cosine"):
        """
        Calculates the vector distance between two embeddings.

        Arguments:
            other: the other embedding you're comparing against
            metric: the distance metric to use, the list of valid options can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)

        **Usage**

        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [1.0, 0.0])
        bar = Embedding("bar", [0.0, 0.5])

        foo.distance(bar)
        foo.distance(bar, metric="euclidean")
        foo.distance(bar, metric="cosine")
        ```
        """
        return pairwise_distances([self.vector], [other.vector], metric=metric)[0][0]

Calculates the vector distance between two embeddings.

Parameters

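| Name | Type | Description | Default |
|------|------|-------------|---------|
| other | | the other embedding you're comparing against | required |
| metric | str | the distance metric to use, the list of valid options can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) | 'cosine' |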

Usage

from whatlies.embedding import Embedding

foo = Embedding("foo", [1.0, 0.0])
bar = Embedding("bar", [0.0, 0.5])

foo.distance(bar)
foo.distance(bar, metric="euclidean")
foo.distance(bar, metric="cosine")
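
For the two metrics used in the example, the values can be checked by hand. The sketch below uses the standard definitions of cosine distance (one minus cosine similarity) and Euclidean distance in plain Python; the helper names are illustrative assumptions, and the library itself delegates to scikit-learn's `pairwise_distances`, as the source listing shows.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


foo = [1.0, 0.0]
bar = [0.0, 0.5]

# The vectors are perpendicular, so the cosine similarity is 0.
print(cosine_distance(foo, bar))  # 1.0
print(round(euclidean_distance(foo, bar), 4))  # 1.118
```

Cosine distance ignores vector length, which is why halving `bar` does not change the first result, while the Euclidean distance depends on it.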

plot(self, kind='arrow', x_axis=0, y_axis=1, axis_metric=None, x_label=None, y_label=None, title=None, color=None, show_ops=False, annot=True, axis_option=None)

Show source code in whatlies/embedding.py
    def plot(
        self,
        kind: str = "arrow",
        x_axis: Union[int, "Embedding"] = 0,
        y_axis: Union[int, "Embedding"] = 1,
        axis_metric: Optional[Union[str, Callable, Sequence]] = None,
        x_label: Optional[str] = None,
        y_label: Optional[str] = None,
        title: Optional[str] = None,
        color: str = None,
        show_ops: bool = False,
        annot: bool = True,
        axis_option: Optional[str] = None,
    ):
        """
        Handles the logic to perform a 2d plot in matplotlib.

        Arguments:
            kind: what kind of plot to make, can be `scatter`, `arrow` or `text`
            x_axis: the x-axis to be used, must be given when dim > 2; if an integer, the corresponding
                dimension of embedding is used.
            y_axis: the y-axis to be used, must be given when dim > 2; if an integer, the corresponding
                dimension of embedding is used.
            axis_metric: the metric used to project an embedding on the axes; only used when the corresponding
                axis (i.e. `x_axis` or `y_axis`) is an `Embedding` instance. It could be a string
                (`'cosine_similarity'`, `'cosine_distance'` or `'euclidean'`), or a callable that takes two vectors as input
                and returns a scalar value as output. To set different metrics for x- and y-axis, a list or a tuple of
                two elements could be given. By default (`None`), normalized scalar projection (i.e. `>` operator) is used.
            x_label: an optional label used for x-axis; if not given, it is set based on `x_axis` value.
            y_label: an optional label used for y-axis; if not given, it is set based on `y_axis` value.
            title: an optional title for the plot.
            color: the color of the dots
            show_ops: setting to also show the applied operations, only works for `text`
            annot: should the points be annotated
            axis_option: a string which is passed as `option` argument to `matplotlib.pyplot.axis` in order to control
                axis properties (e.g. using `'equal'` makes circles appear circular in the plot). This might be useful
                for preserving geometric relationships (e.g. orthogonality) in the generated plot. See `matplotlib.pyplot.axis`
                [documentation](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.axis.html#matplotlib-pyplot-axis)
                for possible values and their description.

        **Usage**
        ```python
        from whatlies.embedding import Embedding

        foo = Embedding("foo", [0.1, 0.3])
        bar = Embedding("bar", [0.7, 0.2])

        foo.plot(kind="arrow", annot=True)
        bar.plot(kind="arrow", annot=True)
        ```
        """
        if isinstance(axis_metric, (list, tuple)):
            x_axis_metric = axis_metric[0]
            y_axis_metric = axis_metric[1]
        else:
            x_axis_metric = axis_metric
            y_axis_metric = axis_metric
        x_val, x_lab = self._get_plot_axis_value_and_label(
            x_axis, x_axis_metric, dir="x"
        )
        y_val, y_lab = self._get_plot_axis_value_and_label(
            y_axis, y_axis_metric, dir="y"
        )
        x_label = x_lab if x_label is None else x_label
        y_label = y_lab if y_label is None else y_label
        emb_plot = Embedding(name=self.name, vector=[x_val, y_val], orig=self.orig)
        handle_2d_plot(
            emb_plot,
            kind=kind,
            color=color,
            xlabel=x_label,
            ylabel=y_label,
            title=title,
            show_operations=show_ops,
            annot=annot,
            axis_option=axis_option,
        )
        return self

Handles the logic to perform a 2d plot in matplotlib.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| kind | str | what kind of plot to make, can be `scatter`, `arrow` or `text` | 'arrow' |
| x_axis | Union[int, Embedding] | the x-axis to be used, must be given when dim > 2; if an integer, the corresponding dimension of embedding is used. | 0 |
| y_axis | Union[int, Embedding] | the y-axis to be used, must be given when dim > 2; if an integer, the corresponding dimension of embedding is used. | 1 |
| axis_metric | Optional[Union[str, Callable, Sequence]] | the metric used to project an embedding on the axes; only used when the corresponding axis (i.e. `x_axis` or `y_axis`) is an `Embedding` instance. It could be a string (`'cosine_similarity'`, `'cosine_distance'` or `'euclidean'`), or a callable that takes two vectors as input and returns a scalar value as output. To set different metrics for x- and y-axis, a list or a tuple of two elements could be given. By default (`None`), normalized scalar projection (i.e. `>` operator) is used. | None |
| x_label | Optional[str] | an optional label used for x-axis; if not given, it is set based on `x_axis` value. | None |
| y_label | Optional[str] | an optional label used for y-axis; if not given, it is set based on `y_axis` value. | None |
| title | Optional[str] | an optional title for the plot. | None |
| color | str | the color of the dots | None |
| show_ops | bool | setting to also show the applied operations, only works for `text` | False |
| annot | bool | should the points be annotated | True |
| axis_option | Optional[str] | a string which is passed as `option` argument to `matplotlib.pyplot.axis` in order to control axis properties (e.g. using `'equal'` makes circles appear circular in the plot). This might be useful for preserving geometric relationships (e.g. orthogonality) in the generated plot. See `matplotlib.pyplot.axis` [documentation](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.axis.html#matplotlib-pyplot-axis) for possible values and their description. | None |

Usage

from whatlies.embedding import Embedding

foo = Embedding("foo", [0.1, 0.3])
bar = Embedding("bar", [0.7, 0.2])

foo.plot(kind="arrow", annot=True)
bar.plot(kind="arrow", annot=True)
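
The way `axis_metric` is resolved into one metric per axis can be sketched in isolation. The helper below, `split_axis_metric`, is a hypothetical name introduced for illustration; it mirrors the `isinstance` branch at the top of the `plot` source listing and does not call matplotlib or whatlies.

```python
from typing import Callable, Optional, Sequence, Tuple, Union

Metric = Optional[Union[str, Callable]]


def split_axis_metric(
    axis_metric: Union[str, Callable, Sequence, None]
) -> Tuple[Metric, Metric]:
    """Hypothetical helper: return (x_axis_metric, y_axis_metric)."""
    if isinstance(axis_metric, (list, tuple)):
        # A list or tuple supplies one metric per axis: x first, then y.
        return axis_metric[0], axis_metric[1]
    # A single value (including None) is used for both axes.
    return axis_metric, axis_metric


print(split_axis_metric(None))  # (None, None)
print(split_axis_metric("euclidean"))  # ('euclidean', 'euclidean')
print(split_axis_metric(["cosine_similarity", "euclidean"]))  # ('cosine_similarity', 'euclidean')
```

Because a plain string is not a list or tuple, it is applied to both axes rather than being indexed character by character.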