Researchers Use Vile Comments from Trump Subreddit to Train AI to Battle Hate Speech

Researchers from Intel AI and the University of California, Santa Barbara have introduced a new generative hate speech intervention model, along with two large-scale, fully labeled hate speech datasets collected from Reddit and Gab.

Social media platforms like Facebook and Twitter have imposed rigorous policies in an effort to combat hate speech and extremism. Existing AI-based policing models, however, tend to simply detect and delete objectionable posts based on keywords.

The standout feature of the research is that along with hate speech detection, the datasets can also provide tailored intervention responses written by Amazon Mechanical Turk workers. In this way an AI model can be trained to both detect hate speech and generate appropriate responses for specific types of hate speech.

“Simply detecting and blocking hate speech or suspicious users often has limited ability to prevent these users from simply turning to other social media platforms to continue to engage in hate speech as can be seen in the large move of individuals blocked from Twitter to Gab,” the researchers explain.

The datasets consist of 5,020 conversations retrieved from subreddits such as "r/The_Donald," a forum for discussion of US President Donald Trump that was "quarantined" earlier this year for incitements to violence. The research team used keywords to identify potentially hateful comments and then reconstructed the conversational context of each comment. The dataset also contains 11,825 conversations retrieved from the right-wing discussion platform Gab.

The research team recruited crowd workers via Amazon Mechanical Turk to label the comments and generate intervention responses on a case-by-case basis. The workers were asked to answer two questions:

  1. Which posts or comments in this conversation are hate speech?
  2. If there exists hate speech in the conversation, how would you respond to intervene? Write down a response that can probably hold it back (word limit: 140 characters).
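The two annotation questions map naturally onto a simple per-conversation record: which post indices were labeled as hate speech, plus the crowd-written intervention responses. A minimal sketch in Python of a validator for such records (the field names here are illustrative assumptions, not the dataset's actual schema):

```python
def validate_record(rec):
    """Check a labeled conversation against the two annotation questions.

    `rec` is a dict with hypothetical fields:
      posts         - list of post/comment texts in the conversation
      hate_indices  - indices of posts labeled as hate speech (Question 1)
      interventions - crowd-written responses, each <= 140 chars (Question 2)
    """
    n = len(rec["posts"])
    # Question 1: every labeled index must point at a real post
    if not all(0 <= i < n for i in rec["hate_indices"]):
        return False
    # Question 2: interventions are only required when hate speech is
    # present, and each response must respect the 140-character limit
    if rec["hate_indices"]:
        return (len(rec["interventions"]) > 0
                and all(len(r) <= 140 for r in rec["interventions"]))
    return True

example = {
    "posts": ["first comment", "second comment", "a reply"],
    "hate_indices": [1],
    "interventions": ["Please keep this discussion civil."],
}
```

The 140-character cap mirrors the word limit given to the Mechanical Turk workers above.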

In their experiments, the researchers evaluated four methods on a binary hate speech detection task: Logistic Regression (LR), Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). They also evaluated three models on the generative hate speech intervention task: Seq2Seq, Variational Auto-Encoder (VAE), and Reinforcement Learning (RL). Full results are reported in the paper.
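To make the simplest of these baselines concrete, here is a hedged sketch of a bag-of-words logistic regression detector trained from scratch with stochastic gradient descent. The toy corpus, function names, and hyperparameters are all invented for illustration (with deliberately mild stand-in examples); the paper's actual features and settings may differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

# Toy stand-in corpus: 1 = hateful, 0 = benign
texts = [
    "those people are awful and should all leave",
    "i hate that group so much they ruin everything",
    "what a lovely day outside today",
    "thanks for sharing this interesting article",
]
labels = [1, 1, 0, 0]

vocab = sorted({w for t in texts for w in tokenize(t)})

def featurize(text):
    # Bag-of-words count vector over the training vocabulary
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

def train(X, y, lr=0.5, epochs=200):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid probability
            g = p - yi                      # gradient of the log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

w, b = train([featurize(t) for t in texts], labels)

def predict(text):
    z = b + sum(wj * xj for wj, xj in zip(w, featurize(text)))
    return 1 if z > 0 else 0
```

The CNN and RNN detectors, and the Seq2Seq, VAE, and RL intervention generators, replace these hand-built count vectors with learned word embeddings, but the binary detection objective is the same.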

Bots have a spotty history when it comes to racist or inflammatory content — several years ago the Microsoft chatbot "Tay" was goaded into spewing a series of racist and inflammatory tweets before her handlers pulled the plug. And a recent paper from the Seattle-based Allen Institute for Artificial Intelligence (AI2) showed how even relatively innocent trigger words and phrases can be used to "inflict targeted errors" on natural language processing (NLP) models, triggering the generation of racist and hostile content.

With both vulgar humans and rogue bots to contend with in the online arena, the Intel and UC Santa Barbara datasets provide a valuable tool for both detection of and intervention on hateful comments.

The paper A Benchmark Dataset for Learning to Intervene in Online Hate Speech is on arXiv. The dataset has been open-sourced on GitHub.


Journalist: Tony Peng | Editor: Michael Sarazen
