TIL: VADER - rule-based sentiment

Seems like a sensible baseline.

Vincent Warmerdam koaning.io
2021-12-05

Sentiment detection is an unsolved problem, largely because language is very much a cultural thing. I can’t say that I have a lot of trust in pre-trained sentiment models but … I recently learned about a sensible baseline for sentiment detection that seems worth keeping in the back of your mind if you ever dabble in this space. It’s certainly not perfect, but it is rule-based, which makes it somewhat predictable.

The algorithm is called VADER, which stands for “Valence Aware Dictionary and sEntiment Reasoner”. The paper is freely available online and there’s also a GitHub repository. It’s also been incorporated into nltk.
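Before looking at the internals, here’s a minimal sketch of running it through nltk (assuming you have nltk installed and can download the bundled lexicon once):

import nltk

nltk.download("vader_lexicon")  # one-time download of the bundled lexicon

from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
print(analyzer.polarity_scores("I love this adorable movie!"))
# -> a dict with 'neg', 'neu', 'pos' ratios and a normalized
#    'compound' score between -1 and 1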

At the core, it’s little more than a word-matching algorithm that relies on a lexicon.txt file. Here are some examples of what this file contains:

adorability 2.2 0.74833 [2, 2, 2, 2, 1, 2, 3, 2, 4, 2]
adorable    2.2 0.6 [3, 2, 2, 3, 2, 2, 1, 3, 2, 2]
adorableness    2.5 0.67082 [2, 3, 3, 2, 3, 2, 1, 3, 3, 3]
love    3.2 0.4 [3, 3, 3, 3, 3, 3, 3, 4, 4, 3]
loved   2.9 0.7 [3, 3, 4, 2, 2, 4, 3, 2, 3, 3]
harsh   -1.9    0.7 [-1, -1, -2, -2, -1, -3, -3, -2, -2, -2]
harsher -2.2    0.6 [-2, -3, -2, -3, -2, -2, -1, -3, -2, -2]
harshest    -2.9    0.83066 [-4, -2, -2, -2, -2, -3, -3, -4, -4, -3]
hate    -2.7    1.00499 [-4, -3, -4, -4, -2, -2, -2, -2, -1, -3]
hated   -3.2    0.6 [-3, -3, -4, -3, -2, -3, -3, -4, -4, -3]
hateful -2.2    1.249   [-3, 1, -3, -3, -1, -2, -2, -3, -3, -3]

It’s just a list of words: each row holds a token, its mean sentiment rating on a scale from -4 to 4, the standard deviation, and the ten raw human ratings the mean was computed from. These ratings were collected by having volunteers rate English tweets. On top of the lexicon, the algorithm applies syntactical heuristics that boost sentences using exclamations or capital letters.
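Because the format is this simple, you can poke at the lexicon yourself. Here’s a minimal sketch that assumes a local, tab-separated copy of lexicon.txt like the one shown above:

# Minimal sketch: load the VADER lexicon into a {word: mean_score} dict.
# Assumes a local, tab-separated copy of lexicon.txt as shown above.
lexicon = {}
with open("lexicon.txt", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        word, mean, std, raw_ratings = line.strip().split("\t")
        lexicon[word] = float(mean)

print(lexicon["love"])   # 3.2
print(lexicon["hated"])  # -3.2

The source code also lists dictionaries that encode some extra logic in the heuristics: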

# amount a booster word adds to/subtracts from a matched word's valence
B_INCR = 0.293
B_DECR = -0.293

# words treated as negations of a following lexicon word
NEGATE = ["aint", "arent", "cannot", "cant", "couldnt", ... ]

# booster/dampener 'intensifiers' or 'degree adverbs'
# http://en.wiktionary.org/wiki/Category:English_degree_adverbs
BOOSTER_DICT = {"absolutely": B_INCR, "amazingly": B_INCR, "awfully": B_INCR, ...}

# check for special case idioms and phrases containing lexicon words
SPECIAL_CASES = {"the shit": 3, "the bomb": 3, "bad ass": 1.5, ... }
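You can see these heuristics at work by comparing a plain sentence against boosted and negated variants. A small sketch, again via nltk and assuming the lexicon from the earlier snippet is downloaded:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# The booster "absolutely", the capitals and the exclamation marks
# should all push the compound score further from neutral.
print(analyzer.polarity_scores("The movie was good."))
print(analyzer.polarity_scores("The movie was absolutely GOOD!!!"))

# "not" is in the NEGATE list, so the matched word's polarity flips.
print(analyzer.polarity_scores("The movie was not good."))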

It’s by no means a state-of-the-art idea. You might even argue it’s relatively hacky. But the paper suggests that on many tasks this approach works better than training a standard scikit-learn model on the task. That’s pretty interesting.

Figure 1: Results comparing the heuristic to scikit-learn models.

Use-Cases

Sentiment models are tricky. They can’t be trusted to understand cultural differences in language, so I prefer deploying them when there’s a human in the loop. That said, I like to think there are still some valid use-cases for this model.

  1. If nothing else, it could be seen as a benchmark that any other model needs to outperform. If a deep-learning model cannot beat this benchmark, you may not want it.
  2. Since this model is based on a lexicon it feels like it may be a “high-bias” model. That suggests that it may also be useful as a model to find bad labels. I’d love to give it a spin on some sentiment datasets using doubtlab.
  3. I can imagine that such a model can help me label. This model should be lightweight to run, so it should be able to scan large text files for examples with low sentiment scores. This may be a useful method to discover failing chatbot dialogues, which is of interest to my employer, Rasa.
  4. It may be used as a featurizer/labelling function in a machine-learning pipeline. Even if it’s not a perfect signal, it can be a signal to get started with when you’re exploring a dataset. Even if the end goal isn’t to detect sentiment, it can be a useful feature to have around; a sketch of this follows below.
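To illustrate that last point, here’s a sketch that plugs the VADER scores into a scikit-learn pipeline as extra features next to a plain bag of words. The data here is hypothetical and purely demonstrates the shape of the API:

import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer

analyzer = SentimentIntensityAnalyzer()

def vader_features(texts):
    # Map each text to its four VADER scores: neg, neu, pos, compound.
    return np.array([
        [s["neg"], s["neu"], s["pos"], s["compound"]]
        for s in (analyzer.polarity_scores(t) for t in texts)
    ])

pipe = make_pipeline(
    make_union(
        CountVectorizer(),
        FunctionTransformer(vader_features),
    ),
    LogisticRegression(),
)

# Hypothetical toy data, just to show the fit/predict flow.
X = ["I love this!", "This is harsh.", "Totally adorable.", "I hated it."]
y = [1, 0, 1, 0]
pipe.fit(X, y)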