
Förderjahr 2024 / Projekt Call #19 / ProjektID: 7207 / Projekt: HaSPI
Hate speech on the internet has been a growing problem as platforms expand. While human moderation has traditionally been the solution, it’s too slow and labor-intensive (not to mention taxing on the people who perform it!) to keep up with the increasing volume of content.
This is where automatic moderation tools become very valuable: they can pre-screen large volumes of posts and drastically reduce the workload of content moderators. In theory, at least. The reality is unfortunately a bit more complicated: the bachelor's thesis of our team member Felix Krejca showed that, while there is a lot of research on automatic moderation, few studies focus specifically on German-language content. Most existing systems also overlook critical context-specific information, even though it has been shown to improve accuracy.
Our Approach
This is where our project begins. Specifically, we want to use imitation learning on the German content moderation dataset One Million Posts Corpus by DER STANDARD to create an automatic content moderation system. The idea behind this technique can be explained like this: Imagine a robot in an environment where lots of hate speech is thrown around. Naturally, it learns to talk only in hate speech, picking up the intricacies of what is hurtful in which situations and what isn't. Eventually it sees the error of its ways and applies to be a content moderator. Its sophisticated knowledge of what passes as a harmless post and what counts as hate speech is now incredibly useful to the humans it assists, and, crucially, it never has to actually speak another sentence of hate speech again. (Similarly, we never have to use the imitation of hate speech to detect it, as we do not want to release a hate-speech-spewing bot into the wild! We will go into more detail in a future blog post.)
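To make the robot story a little more concrete: in its simplest form (behavioural cloning), imitation learning just means treating past moderator decisions as "expert demonstrations" and training a policy to reproduce them. The toy sketch below is purely illustrative and is not our actual system; the posts, labels, and the tiny bag-of-words logistic-regression model are all invented for this example.

```python
import math

# Toy "expert demonstrations": (post, moderator action) pairs.
# Action 1 = remove, 0 = keep. Invented examples, not real data.
demos = [
    ("you are all idiots and should disappear", 1),
    ("idiots like you ruin every discussion", 1),
    ("i disagree with this article", 0),
    ("great discussion thanks for the article", 0),
]

vocab = sorted({w for text, _ in demos for w in text.split()})

def featurize(text):
    """Binary bag-of-words vector over the toy vocabulary."""
    words = set(text.split())
    return [1.0 if w in words else 0.0 for w in vocab]

# Fit a logistic-regression policy to the demonstrations with plain SGD.
weights = [0.0] * len(vocab)
bias = 0.0
lr = 0.5
for _ in range(200):
    for text, action in demos:
        x = featurize(text)
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - action  # gradient of the log loss
        bias -= lr * err
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]

def policy(text):
    """Imitated policy: probability the expert moderator would remove the post."""
    x = featurize(text)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

print(policy("you idiots"))              # high: resembles removed posts
print(policy("thanks for the article"))  # low: resembles kept posts
```

The point of the sketch is the framing, not the model: the policy only ever learns to *judge* posts the way the demonstrator did, so nothing here generates hate speech.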
Another benefit of having this bot is that it can also tell us which parts of a text it found most important for its decision. The system can therefore highlight these parts of a post, making it easier for human moderators to understand and act on its suggestions.
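The highlighting idea can be sketched very simply: with a linear scoring model, each token's contribution to the "remove" score is known, so the tokens that pushed the decision hardest can be marked up for the moderator. The per-token weights below are invented toy values, stand-ins for whatever attribution scores a real model would produce.

```python
# Invented toy weights: positive values push a post towards "remove".
weights = {"idiots": 2.1, "disappear": 1.4, "discussion": -0.3, "article": -1.2}

def explain(text, top_k=2):
    """Return up to top_k tokens that pushed the score towards 'remove'."""
    contributions = [(w, weights.get(w, 0.0)) for w in text.split()]
    contributions.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, c in contributions[:top_k] if c > 0]

def highlight(text, top_k=2):
    """Wrap the most decision-relevant tokens in **markers** for the moderator."""
    flagged = set(explain(text, top_k))
    return " ".join(f"**{w}**" if w in flagged else w for w in text.split())

print(highlight("you idiots should disappear from this discussion"))
# → you **idiots** should **disappear** from this discussion
```

For non-linear models the contributions would come from an attribution method rather than raw weights, but the moderator-facing output, a post with its decisive spans marked, stays the same.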
Next Steps
This approach was also recently validated by Google DeepMind in a NeurIPS paper that applied imitation learning to general language tasks, which suggests we can likewise use the method to better capture the intentions behind posts.
Our first steps will be to get a systematic overview of the current state of the art (specifically the literature published since our proposal), to set up the infrastructure, and to implement the methods used for training. We’re excited to dive into this project and can’t wait to share our progress with you. Follow along for updates and feel free to check out our GitHub for full access to the code!