Seems like a sensible baseline.
Sentiment detection is an unsolved problem, largely because language is very much a cultural thing. I can’t say that I have a lot of trust in pre-trained sentiment models, but I recently learned about a sensible baseline for sentiment detection that’s worth keeping in the back of your mind if you ever dabble in this space. It’s certainly not perfect, but it is rule-based, which makes it somewhat predictable.
The algorithm is called VADER, which stands for “Valence Aware Dictionary and sEntiment Reasoner”. You can find a copy of the paper here and there’s also a GitHub repository available. It has been incorporated into nltk as well.
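The nltk port is probably the easiest way to take it for a spin. A minimal sketch, assuming nltk is installed and you’ve downloaded the vader_lexicon resource:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # fetches the bundled lexicon

analyzer = SentimentIntensityAnalyzer()
# returns a dict with 'neg', 'neu' and 'pos' proportions plus a
# normalized 'compound' score between -1 and 1
print(analyzer.polarity_scores("VADER is smart, handsome, and funny!"))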
At the core, it’s little more than a word-matching algorithm that relies on a lexicon.txt file. Here are some examples of what this file contains:
adorability 2.2 0.74833 [2, 2, 2, 2, 1, 2, 3, 2, 4, 2]
adorable 2.2 0.6 [3, 2, 2, 3, 2, 2, 1, 3, 2, 2]
adorableness 2.5 0.67082 [2, 3, 3, 2, 3, 2, 1, 3, 3, 3]
love 3.2 0.4 [3, 3, 3, 3, 3, 3, 3, 4, 4, 3]
loved 2.9 0.7 [3, 3, 4, 2, 2, 4, 3, 2, 3, 3]
harsh -1.9 0.7 [-1, -1, -2, -2, -1, -3, -3, -2, -2, -2]
harsher -2.2 0.6 [-2, -3, -2, -3, -2, -2, -1, -3, -2, -2]
harshest -2.9 0.83066 [-4, -2, -2, -2, -2, -3, -3, -4, -4, -3]
hate -2.7 1.00499 [-4, -3, -4, -4, -2, -2, -2, -2, -1, -3]
hated -3.2 0.6 [-3, -3, -4, -3, -2, -3, -3, -4, -4, -3]
hateful -2.2 1.249 [-3, 1, -3, -3, -1, -2, -2, -3, -3, -3]
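Each row seems to hold the token, the mean human rating, the standard deviation and the list of raw ratings. A minimal parsing sketch, assuming the fields are tab-separated (parse_lexicon_line is a hypothetical helper, not part of the library):

import json

def parse_lexicon_line(line):
    # token, mean rating, standard deviation, raw ratings list
    token, mean, std, raw = line.strip().split("\t")
    return token, float(mean), float(std), json.loads(raw)

print(parse_lexicon_line("love\t3.2\t0.4\t[3, 3, 3, 3, 3, 3, 3, 4, 4, 3]"))
# ('love', 3.2, 0.4, [3, 3, 3, 3, 3, 3, 3, 4, 4, 3])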
So it really is just a list of words with scores attached. These scores were calculated by having volunteers rate English tweets, and on top of the lexicon the algorithm applies syntactic heuristics that boost the scores of sentences that use exclamations or capital letters. The source code also lists dictionaries that show some of the extra logic in these heuristics.
B_INCR = 0.293
B_DECR = -0.293

NEGATE = ["aint", "arent", "cannot", "cant", "couldnt", ... ]

# booster/dampener 'intensifiers' or 'degree adverbs'
# http://en.wiktionary.org/wiki/Category:English_degree_adverbs
BOOSTER_DICT = {"absolutely": B_INCR, "amazingly": B_INCR, "awfully": B_INCR, ...}

# check for special case idioms and phrases containing lexicon words
SPECIAL_CASES = {"the shit": 3, "the bomb": 3, "bad ass": 1.5, ... }
It’s by no means a state-of-the-art idea. You might even argue it’s relatively hacky. But the paper suggests that on many tasks this approach works better than training a standard scikit-learn model. That’s pretty interesting.
Sentiment models are tricky. They can’t be trusted to understand cultural differences in language, so I prefer deploying them with a human in the loop. That said, I like to think there are still some valid use cases for this model.