TIL: DALC

A Dutch Abusive Language Corpus

Author

Vincent Warmerdam

Published

Sept. 8, 2022


A few Dutch students put in the effort to build DALC, an abusive language corpus for Dutch. They describe the work in a paper and have also released the dataset on GitHub.

The repository also contains the GROF lexicon, which comes with a list of lemmas that text can be matched against.
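To give a feel for what matching against a lemma list could look like, here is a minimal sketch in Python. It makes no assumption about how the lexicon file is actually stored: the `PLACEHOLDER_LEMMAS` set and the `find_matches` helper are hypothetical names, and the two entries are mild stand-ins that were not taken from the lexicon itself.

```python
import re

# Hypothetical stand-in for a lemma list. These entries are mild
# placeholders for illustration and are not taken from the real lexicon.
PLACEHOLDER_LEMMAS = {"sukkel", "oen"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def find_matches(text, lemmas):
    """Return the tokens in `text` that appear in the lemma set, in order.

    This compares surface forms only. Inflected words would need a
    lemmatizer before the lookup, which is left out of this sketch.
    """
    return [token for token in tokenize(text) if token in lemmas]


matches = find_matches("Wat een sukkel ben jij, oen!", PLACEHOLDER_LEMMAS)
print(matches)  # ['sukkel', 'oen']
```

A plain lookup like this ignores context, so a hit is probably best treated as a signal for review and not as a verdict.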

Lists like these aren’t perfect, but they can be a great starting point to detect abusive speech online. Many (Dutch) platforms can really benefit from that.
