Devise a method to detect hate and harassment speech in social media

Project description: 
  • Data basis: Subreddits (thematic subforums of reddit.com), i.e., posts and comments from subreddits that have been banned for harassment or hate speech, plus some non-hate subreddits for comparison. Other data sources (e.g., Twitter, Wikipedia) are possible as well.
  • Goal: Find patterns that match well with (consistent) human judgement or scholarly definitions of “hate speech” or “harassment”
  • Method: 
    • Look at existing definitions and operationalizations of “hate speech” and “harassment” in the literature
    • Categorize by hand the types of hate and harassment speech that you find in the data
    • Explore existing hate speech dictionaries (e.g., hatebase.org) or sentiment dictionaries, and/or develop your own speech patterns (e.g., regular expressions, patterns in replies to comments); check how well they fit the data and try to implement automated methods
  • Team: at least 50% of the members should have at least intermediate programming skills
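
As a starting point for the Method steps above, the following minimal sketch shows one way a dictionary and a hand-written regular expression could be checked against hand-labelled comments. Everything in it is an assumption made for illustration: the lexicon entries are neutral placeholders (a real list might be drawn from a source such as hatebase.org), the targeting pattern is hypothetical, and the four labelled comments are invented rather than taken from any subreddit.

```python
import re

# Placeholder lexicon (assumption): stand-ins for terms from a real dictionary.
LEXICON = ["slur_a", "slur_b"]

# Pattern 1: any lexicon term as a whole word, case-insensitive.
LEXICON_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in LEXICON) + r")\b",
    re.IGNORECASE,
)

# Pattern 2 (hypothetical): a phrase that addresses a group as a whole.
TARGETING_PATTERN = re.compile(r"\byou\s+(?:people|all)\s+are\b", re.IGNORECASE)

PATTERNS = [LEXICON_PATTERN, TARGETING_PATTERN]


def flag(text):
    """Return True if any pattern matches the text."""
    return any(pattern.search(text) for pattern in PATTERNS)


def evaluate(labelled_comments):
    """Compare flag() with hand labels and return (precision, recall)."""
    tp = fp = fn = 0
    for text, is_hate in labelled_comments:
        predicted = flag(text)
        if predicted and is_hate:
            tp += 1
        elif predicted and not is_hate:
            fp += 1
        elif not predicted and is_hate:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented examples with hand labels (True = judged as hate or harassment).
SAMPLE = [
    ("you are such a slur_a", True),
    ("I like this subreddit", False),
    ("You people are ruining everything", True),
    ("slur_b is a word in my lexicon file", False),  # mention, not use
]

if __name__ == "__main__":
    precision, recall = evaluate(SAMPLE)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

The last sample comment illustrates why pure dictionary matching tends to over-flag: it mentions a term without using it against anyone, which is the kind of mismatch with human judgement the project is meant to examine.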
Advisor(s):