TIL: DALC

A Dutch Abusive Language Corpus

Vincent Warmerdam koaning.io
2022-09-09

A few Dutch students went through the effort of building an abusive language corpus for Dutch. They describe the work in a paper and have released the dataset on GitHub.

The repository also contains the GROF lexicon, which comes with a lemma list that text can be compared against.

Lists like these aren’t perfect, but they can be a great starting point to detect abusive speech online. Many (Dutch) platforms can really benefit from that.
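To make the "compare against a lemma list" idea concrete, here is a minimal sketch in Python. It makes no assumptions about how the GROF lexicon is stored in the repository: `EXAMPLE_LEMMAS` is a made-up stand-in with two mild Dutch insults, and `find_lexicon_hits` is a hypothetical helper name, not something the DALC authors ship.

```python
import re

# Hypothetical stand-in for a lemma list. The real GROF lexicon lives in the
# DALC repository; its file format is deliberately not assumed here.
EXAMPLE_LEMMAS = {"sukkel", "eikel"}


def find_lexicon_hits(text, lemmas):
    """Return the lowercased word tokens in `text` that appear in `lemmas`."""
    # Naive tokenisation: lowercase the text and grab runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token in lemmas]


hits = find_lexicon_hits("Wat een sukkel ben jij, echt een Eikel.", EXAMPLE_LEMMAS)
print(hits)
```

Exact token matching like this will miss inflected forms and creative spellings, so in practice you would probably want to lemmatize the text first and treat any hit as a signal for review rather than a verdict.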