Identifying personal attacks in real time on the English Wikipedia using machine learning and sentiment analysis

Introduction

The Wikipedia community has a persistent problem with personal attacks and harassment. An automated tool that checks each new user talk edit against an intent model and flags especially high-scoring diffs for post-save review would help administrators and other experienced editors act soon after a message is posted, rather than waiting for it to be reported.
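
The flagging step described above could look roughly like the following. This is only a minimal sketch: the `ScoredEdit` type, the `flag_for_review` function, and the 0.9 threshold are all assumptions made for illustration, and the scores in the sample are made up rather than taken from any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredEdit:
    """A user talk edit paired with a model score (hypothetical structure)."""
    rev_id: int          # revision ID of the edit
    attack_score: float  # assumed to be a value between 0 and 1


def flag_for_review(edits, threshold=0.9):
    """Return the edits scoring at or above the threshold, highest first.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    flagged = [edit for edit in edits if edit.attack_score >= threshold]
    return sorted(flagged, key=lambda edit: edit.attack_score, reverse=True)


if __name__ == "__main__":
    # Made-up scores, purely to show the shape of the data.
    sample = [ScoredEdit(101, 0.12), ScoredEdit(102, 0.97), ScoredEdit(103, 0.91)]
    for edit in flag_for_review(sample):
        print(edit.rev_id, edit.attack_score)
```

Sorting the flagged edits by score means a reviewer sees the most likely attacks first; the threshold would need to be chosen by looking at real scores so that reviewers are not flooded with false positives.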

Tool proof of concept

As a step towards such a tool, I have created a proof of concept that runs the latest user talk edit through several machine learning models and sentiment analysis APIs.
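
One way to structure this kind of fan-out is to treat every model or API as a function from text to a score and run them all over the same edit. The sketch below assumes that shape; `score_with_all` and both scorers are hypothetical stand-ins, not the models or APIs the proof of concept actually calls.

```python
from typing import Callable, Dict, Optional

# A scorer takes the text of an edit and returns a number.
Scorer = Callable[[str], float]


def score_with_all(text: str, scorers: Dict[str, Scorer]) -> Dict[str, Optional[float]]:
    """Run every scorer on the same text and collect the results by name.

    A scorer that raises is recorded as None so that one failing service
    does not stop the others from being consulted.
    """
    results: Dict[str, Optional[float]] = {}
    for name, scorer in scorers.items():
        try:
            results[name] = scorer(text)
        except Exception:
            results[name] = None
    return results


def uppercase_ratio(text: str) -> float:
    """Toy stand-in scorer: the fraction of letters that are upper case."""
    letters = [char for char in text if char.isalpha()]
    if not letters:
        return 0.0
    return sum(char.isupper() for char in letters) / len(letters)


def unavailable_service(text: str) -> float:
    """Toy stand-in for a remote API that is currently failing."""
    raise ConnectionError("service unavailable")


if __name__ == "__main__":
    scorers = {"uppercase_ratio": uppercase_ratio, "remote": unavailable_service}
    print(score_with_all("HELLO you", scorers))
```

Keeping the scorers behind a common signature makes it easy to add or remove a model without touching the rest of the tool, and to compare their outputs on the same edit.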

Research

To help quantify the need for a tool like this and to support future research, I have developed a web application that scans ten recent user talk changes once an hour and scores them using the Detox API. It is currently running, and you can view the overview or some statistics.
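
An hourly scan of this kind could be sketched as follows. It uses the MediaWiki Action API's `list=recentchanges` query restricted to the user talk namespace (namespace 3) to fetch the changes. Everything else is an assumption for illustration: the function names, the User-Agent string, and in particular the `score` callable, which stands in for the Detox API call because its interface is not described here.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
USER_TALK_NAMESPACE = 3  # namespace number for "User talk:" pages


def recent_changes_params(limit=10):
    """Build the query parameters for the most recent user talk edits."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": USER_TALK_NAMESPACE,
        "rctype": "edit",
        "rcprop": "ids|title|user|timestamp",
        "rclimit": limit,
        "format": "json",
    }


def fetch_recent_user_talk_changes(limit=10):
    """Fetch recent user talk changes from the live API (network access needed)."""
    url = API_URL + "?" + urllib.parse.urlencode(recent_changes_params(limit))
    # The User-Agent value is a placeholder; a real tool should identify itself.
    request = urllib.request.Request(
        url, headers={"User-Agent": "user-talk-scan-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["query"]["recentchanges"]


def scan_once(fetch, score):
    """Score each fetched change; `score` is a hypothetical scoring callable."""
    return [(change["revid"], score(change)) for change in fetch()]


def run_hourly(fetch, score, handle, interval_seconds=3600):
    """Scan, pass the results to `handle`, then sleep; repeats until interrupted."""
    while True:
        handle(scan_once(fetch, score))
        time.sleep(interval_seconds)
```

Passing `fetch` and `score` in as arguments keeps the scan logic testable without network access. A real scorer would also need the text of each diff, which takes a further request per revision and is left to the `score` callable here.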