koaning.io: TIL: Active, but Visual, Learning

I read another interesting paper the other day.

The paper explores visual aids that help users label data, as an alternative to active learning techniques. That’s something I’ve been exploring in my bulk labelling work. The authors have some interesting ideas too.

When you pass a dataset through dimensionality reduction, like t-SNE, you end up with a scatter chart. Hopefully there’ll be a few clusters, which might help you label. This paper investigates the use of visual aids on top of these scatter charts and there are some interesting conclusions.

It seems that especially when you’re just starting out, there’s genuine merit to the visual technique. This is partially because active learning techniques tend to suffer from a cold start, but also because the visual tools allow you to label more than one point at a time.
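To make the "more than one point at a time" part concrete, here is a minimal sketch of bulk labelling by box selection on a 2D embedding. It is an illustration under stated assumptions, not the paper's interface: the function name `label_in_box`, the toy coordinates, and the label `"a"` are all made up for this example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def label_in_box(
    points: List[Point],
    labels: List[Optional[str]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    new_label: str,
) -> int:
    """Give new_label to every unlabelled point inside the box.

    Mutates `labels` in place and returns how many points were labelled.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    count = 0
    for i, (x, y) in enumerate(points):
        inside = x_lo <= x <= x_hi and y_lo <= y <= y_hi
        if labels[i] is None and inside:
            labels[i] = new_label
            count += 1
    return count


# Hypothetical 2D embedding with two loose clusters.
points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
labels: List[Optional[str]] = [None] * len(points)

# One selection labels the whole lower-left cluster in a single action.
n_labelled = label_in_box(points, labels, (0.0, 1.0), (0.0, 1.0), "a")
```

A single drag over a cluster labels three points here, where an active learning loop would typically ask about one example per step.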

What was especially interesting to me is that drawing a convex hull over the labelled points seems to help out. The convex hulls allow the interface to visualize the boundaries of the classes.
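As a rough sketch of how such an overlay could be computed, the snippet below builds one convex hull per class from the points labelled so far, using Andrew's monotone chain algorithm in plain Python. The helper names (`convex_hull`, `hulls_per_class`) and the toy data are assumptions for illustration; the paper does not say how its hulls are computed.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would create a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


def hulls_per_class(
    points: List[Point], labels: List[Optional[str]]
) -> Dict[str, List[Point]]:
    """Group labelled points by class and compute one hull per class."""
    grouped: Dict[str, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab is not None:  # unlabelled points do not shape any hull
            grouped[lab].append(p)
    return {lab: convex_hull(ps) for lab, ps in grouped.items()}


# Hypothetical embedding: class "a" is a square with one interior point,
# class "b" is a triangle, and the last point is still unlabelled.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (5, 5), (6, 5), (5, 6), (9, 9)]
labels = ["a", "a", "a", "a", "a", "b", "b", "b", None]
hulls = hulls_per_class(points, labels)
```

Each hull is a list of vertices that a plotting library can draw as a polygon on top of the scatter chart. Hulls that overlap hint at regions where the classes are not cleanly separated.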

It’s interesting stuff that certainly makes me curious, but there’s a caveat that the paper correctly mentions in its abstract.

“Our main findings are that visual-interactive labeling can outperform active learning, given the condition that dimension reduction separates well the class distributions. Moreover, using dimension reduction in combination with additional visual encodings that expose the internal state of the learning model turns out to improve the performance of visual-interactive labeling.”

The bit about “given the condition that dimension reduction separates well the class distributions” is key.
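One crude way to sanity-check that condition on a handful of labelled points is to ask how often a point's nearest neighbour in the 2D embedding shares its label. The sketch below is a heuristic chosen for this illustration, not a measure taken from the paper; the function name `nn_label_agreement` and both toy datasets are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nn_label_agreement(points: List[Point], labels: List[str]) -> float:
    """Fraction of points whose nearest other point has the same label."""
    hits = 0
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = min(others, key=lambda j: math.dist(p, points[j]))
        if labels[nearest] == labels[i]:
            hits += 1
    return hits / len(points)


# Hypothetical embedding where the two classes form distinct clusters.
separated = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
separated_labels = ["a", "a", "a", "b", "b", "b"]

# Hypothetical embedding where the classes are interleaved along a line.
mixed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
mixed_labels = ["a", "b", "a", "b", "a", "b"]

score_separated = nn_label_agreement(separated, separated_labels)
score_mixed = nn_label_agreement(mixed, mixed_labels)
```

A score near 1 suggests the scatter chart keeps classes apart, which is the regime where the paper reports visual labelling doing well. A low score suggests that selecting regions in the chart would mix classes together.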