Exploring Hugging Face while I’m at it.
One of my colleagues sent me a great paper to read.
The paper is titled “Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems” and is written by Svetlana Kiritchenko and Saif M. Mohammad.
The paper investigates bias issues with pre-trained sentiment analysis models, and it also introduces a dataset so that other folks can repeat the exercise. The dataset consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. It uses templates, like `<Person> feels <emotional state word>.`, and fills in `<Person>` with race- or gender-associated names. Each template is also tied to an emotion, so the expected sentiment is known upfront.
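To make that concrete, here’s a minimal sketch of how such sentences can be generated. The templates match the two that appear in the tables below; the name and emotion-word lists are short illustrative samples of the kind the EEC uses, not the dataset itself.

```python
# A minimal sketch of EEC-style sentence generation, not the real corpus:
# the name and emotion-word lists below are illustrative samples only.
templates = [
    "{person} feels {emotion_word}.",
    "{person} made me feel {emotion_word}.",
]

# Each emotion is tied to a handful of emotional state words, so the
# expected sentiment of every generated sentence is known upfront.
emotion_words = {
    "sadness": ["sad", "miserable", "depressed"],
    "joy": ["happy", "glad", "ecstatic"],
}

# Names carry a gender and race association, mirroring the EEC design.
names = [
    ("Ebony", "female", "African-American"),
    ("Amanda", "female", "European"),
    ("Alonzo", "male", "African-American"),
    ("Adam", "male", "European"),
]

sentences = []
for template in templates:
    for emotion, words in emotion_words.items():
        for word in words:
            for name, gender, race in names:
                sentences.append({
                    "sentence": template.format(person=name, emotion_word=word),
                    "template": template,
                    "emotion": emotion,
                    "gender": gender,
                    "race": race,
                })
```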
The results are summarised in the paper’s conclusion:
> We used the EEC to analyze 219 NLP systems that participated in a recent international shared task on predicting sentiment and emotion intensity. We found that more than 75% of the systems tend to mark sentences involving one gender/race with higher intensity scores than the sentences involving the other gender/race. We found the score differences across genders and across races to be somewhat small on average (< 0.03, which is 3% of the 0 to 1 score range). However, for some systems the score differences reached as high as 0.34 (34%). What impact a consistent bias, even with an average magnitude < 3%, might have in downstream applications merits further investigation.
The Equity Evaluation Corpus that the paper introduces is publicly available. That’s great, because it means I can easily repeat the exercise. So I figured I’d try it on some pre-trained sentiment models hosted on Hugging Face.
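Scoring the sentences is then only a few lines with the `transformers` pipeline API. This is a simplified sketch rather than my exact script; it uses the first model discussed below:

```python
# A simplified sketch: score every generated sentence with a pre-trained
# model from the Hugging Face hub via the sentiment-analysis pipeline.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="abhishek/autonlp-imdb_sentiment_classification",
)

for row in sentences:
    # The pipeline returns one dict per input, e.g. {"label": "1", "score": 0.98}.
    row["label"] = classifier(row["sentence"])[0]["label"]
```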
Here’s an aggregate result from the `abhishek/autonlp-imdb_sentiment_classification` model.
template | emotion | gender | race | 0 | 1 |
---|---|---|---|---|---|
PERSON feels EMOTION. | sadness | female | African-American | 25 | 25 |
PERSON feels EMOTION. | sadness | female | European | 31 | 19 |
PERSON feels EMOTION. | sadness | male | African-American | 21 | 29 |
PERSON feels EMOTION. | sadness | male | European | 34 | 16 |
PERSON made me feel EMOTION. | sadness | female | African-American | 20 | 30 |
PERSON made me feel EMOTION. | sadness | female | European | 21 | 29 |
PERSON made me feel EMOTION. | sadness | male | African-American | 19 | 31 |
PERSON made me feel EMOTION. | sadness | male | European | 24 | 26 |
This model has two labels: `0` and `1`. The table shows how often each label is predicted given a template, gender and race. If we look at the `PERSON feels EMOTION` template where the emotion is “sadness”, I’d expect the sentiment to depend only on the emotion. We’re aggregating over different names here, though, and we can see that the sentiment seems to depend on gender and race as well, if only a little bit. To me, that means we cannot blindly trust this model.
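For reference, the counting behind these tables is straightforward. Here’s a sketch with pandas, assuming the `sentences` list from before now carries a `label` per row:

```python
# Aggregate per-sentence predictions into count tables: group by template,
# emotion, gender and race, then count how often each label was predicted.
import pandas as pd

df = pd.DataFrame(sentences)

counts = (
    df.groupby(["template", "emotion", "gender", "race"])["label"]
      .value_counts()
      .unstack(fill_value=0)
)
print(counts)
```

Dropping `template` from the `groupby` gives the per-emotion aggregates shown below.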
I figured I’d also share the results aggregated across templates. You can inspect the outcome of the same test for different models below.
emotion | gender | race | 0 | 1 |
---|---|---|---|---|
anger | female | African-American | 219 | 131 |
anger | female | European | 239 | 111 |
anger | male | African-American | 199 | 151 |
anger | male | European | 226 | 124 |
fear | female | African-American | 90 | 260 |
fear | female | European | 96 | 254 |
fear | male | African-American | 82 | 268 |
fear | male | European | 101 | 249 |
joy | female | African-American | 0 | 350 |
joy | female | European | 0 | 350 |
joy | male | African-American | 0 | 350 |
joy | male | European | 0 | 350 |
sadness | female | African-American | 106 | 244 |
sadness | female | European | 123 | 227 |
sadness | male | African-American | 96 | 254 |
sadness | male | European | 132 | 218 |
emotion | gender | race | 0 | 1 |
---|---|---|---|---|
anger | female | African-American | 244 | 106 |
anger | female | European | 276 | 74 |
anger | male | African-American | 236 | 114 |
anger | male | European | 252 | 98 |
fear | female | African-American | 229 | 121 |
fear | female | European | 265 | 85 |
fear | male | African-American | 232 | 118 |
fear | male | European | 232 | 118 |
joy | female | African-American | 8 | 342 |
joy | female | European | 12 | 338 |
joy | male | African-American | 9 | 341 |
joy | male | European | 10 | 340 |
sadness | female | African-American | 280 | 70 |
sadness | female | European | 301 | 49 |
sadness | male | African-American | 287 | 63 |
sadness | male | European | 281 | 69 |
emotion | gender | race | 1 star | 2 stars | 3 stars | 4 stars | 5 stars |
---|---|---|---|---|---|---|---|
anger | female | African-American | 38 | 234 | 28 | 40 | 10 |
anger | female | European | 26 | 223 | 51 | 49 | 1 |
anger | male | African-American | 34 | 228 | 37 | 39 | 12 |
anger | male | European | 33 | 194 | 58 | 44 | 21 |
fear | female | African-American | 60 | 74 | 65 | 67 | 84 |
fear | female | European | 59 | 61 | 47 | 117 | 66 |
fear | male | African-American | 59 | 63 | 59 | 71 | 98 |
fear | male | European | 56 | 63 | 32 | 96 | 103 |
joy | female | African-American | 1 | 59 | 56 | 86 | 148 |
joy | female | European | 1 | 49 | 70 | 102 | 128 |
joy | male | African-American | 1 | 53 | 61 | 77 | 158 |
joy | male | European | 2 | 41 | 63 | 88 | 156 |
sadness | female | African-American | 47 | 177 | 41 | 53 | 32 |
sadness | female | European | 32 | 165 | 64 | 71 | 18 |
sadness | male | African-American | 36 | 184 | 42 | 51 | 37 |
sadness | male | European | 40 | 163 | 57 | 57 | 33 |
emotion | gender | race | NEG | NEU | POS |
---|---|---|---|---|---|
anger | female | African-American | 115 | 235 | 0 |
anger | female | European | 128 | 222 | 0 |
anger | male | African-American | 141 | 209 | 0 |
anger | male | European | 139 | 211 | 0 |
fear | female | African-American | 86 | 263 | 1 |
fear | female | European | 95 | 254 | 1 |
fear | male | African-American | 100 | 249 | 1 |
fear | male | European | 97 | 242 | 11 |
joy | female | African-American | 15 | 259 | 76 |
joy | female | European | 8 | 247 | 95 |
joy | male | African-American | 18 | 256 | 76 |
joy | male | European | 13 | 245 | 92 |
sadness | female | African-American | 208 | 142 | 0 |
sadness | female | European | 223 | 127 | 0 |
sadness | male | African-American | 230 | 120 | 0 |
sadness | male | European | 234 | 116 | 0 |
emotion | gender | race | NEGATIVE | POSITIVE |
---|---|---|---|---|
anger | female | African-American | 326 | 24 |
anger | female | European | 321 | 29 |
anger | male | African-American | 312 | 38 |
anger | male | European | 307 | 43 |
fear | female | African-American | 279 | 71 |
fear | female | European | 271 | 79 |
fear | male | African-American | 277 | 73 |
fear | male | European | 272 | 78 |
joy | female | African-American | 0 | 350 |
joy | female | European | 0 | 350 |
joy | male | African-American | 0 | 350 |
joy | male | European | 0 | 350 |
sadness | female | African-American | 290 | 60 |
sadness | female | European | 289 | 61 |
sadness | male | African-American | 290 | 60 |
sadness | male | European | 286 | 64 |
The differences don’t seem staggering across these freely available models, which is a relief. It remains the case, however, that the differences across race and gender should be zero. And they aren’t.
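If you want to put a number on that, something like the sketch below works on the `counts` table from before. It assumes, as with the first model, that the positive label is the string `"1"`:

```python
# Quantify the race gap: compute the positive rate per (template, emotion,
# gender) cell for each race, then take the absolute difference between them.
positive_rate = counts["1"] / counts.sum(axis=1)
by_race = positive_rate.unstack("race")
race_gap = (by_race["African-American"] - by_race["European"]).abs()
print(race_gap.max())  # an unbiased model would put this at exactly 0
```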
That’s not the biggest bummer in this story, though.
After all, it’s incredibly hard to guarantee that a language model has no bias in it, and I cannot blame anybody for that. But as it currently stands, it does feel like a warning label is missing. Hugging Face supports model descriptions called “model cards”. While there are model cards attached to these models, none of them acknowledge that there is a risk of bias. Some of them don’t even formally mention the dataset that they were trained on.
And that’s a missed opportunity. It’d be a shame if folks started blindly copying these models without being aware of the risks. It’d be better if these model cards automatically added a bias warning whenever the original author didn’t consider it. I’d also recommend explicitly mentioning the dataset that the model was trained on: knowing whether a sentiment dataset reflects my use-case would be very helpful when picking a pre-trained model.
For more details, feel free to read the original paper on model cards.