How to identify Hate Speech on Twitter?

Pseudo
Jan 11, 2021

Overview

The large fraction of hate speech and other offensive and objectionable content online poses a huge challenge to societies. Offensive language, such as insulting, hurtful, derogatory or obscene content directed from one person at another and visible to others, undermines objective discussion. This kind of language is found increasingly often on the web and can lead to the radicalization of debates. Public opinion-forming requires rational critical discourse (Habermas 1984), so objectionable content can pose a threat to democracy. At the same time, open societies need to find an adequate way to react to such content without imposing rigid censorship regimes. As a consequence, many social media platforms monitor user posts.

This leads to a pressing demand for methods to automatically identify suspicious posts.

Online communities, social media enterprises and technology companies have been investing heavily in technology and processes to identify offensive language and prevent abusive behaviour on social media.

This post is based on our talk at the Forum for Information Retrieval Evaluation (FIRE) 2020.

Problem statement

Classify a given post into hate or non-hate (HOF and NOT respectively). Within HOF, perform a fine-grained classification into three subclasses: PRFN (profane), HATE (hate speech) and OFFN (offensive).

Dataset

There are two subtasks in this shared task:

1. Subtask A: Binary classification. Classifying a post as HOF or NOT, denoting hate speech and non-hate speech respectively. For this, we used the dataset provided by HASOC this year, along with an additional dataset, OLID, proposed by OffensEval last year. The OLID data contains 4,400 HOF posts and 8,840 NOT posts.

2. Subtask B: Multi-class classification. Classifying a post into one of HATE, OFFN, PRFN and NONE; the NOT posts from subtask A are tagged NONE here. The label distribution is highly imbalanced, so a mechanism to handle the imbalance is needed, otherwise the models will be biased towards the majority class.

Why would a swear word-based approach not work?

“This dude is f*cking ridiculous, is he made of rubber?”: NONE

“RT @NataliaNoyes: might f*ck around and go missing for a few months”: PRFN

“RT @sohmer: @realDonaldTrump The Importer pays the tariffs, you f*cking moron. You’ve levied a sales tax on your own citizens.”: HATE

“@MartinDaubney What the f*ck has happened to u Martin? Pinning ur colours to a racist like Farage, I’m ashamed of you”: OFFN

All four examples contain the same swear word yet carry four different labels, so a simple swear-word lookup cannot separate them. Hence, the boundaries between the fine-grained classes are difficult to capture!

Methodology

Pre-processing
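
The post does not list the exact cleaning steps, but a typical preprocessing pass for tweets (my assumption, not the authors' exact pipeline) could look like the sketch below: lower-casing and stripping retweet markers, @mentions, URLs and the "#" symbol.

```python
# Hypothetical tweet-cleaning pass; the exact steps used in the actual
# system are not listed in the post.
import re

def preprocess(tweet: str) -> str:
    text = tweet.lower()
    text = re.sub(r"^rt\s+", "", text)          # drop leading retweet marker
    text = re.sub(r"@\w+", "", text)            # drop @mentions
    text = re.sub(r"https?://\S+", "", text)    # drop URLs
    text = text.replace("#", "")                # keep hashtag words, drop '#'
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

print(preprocess("RT @sohmer: The Importer pays the tariffs #tariffs https://t.co/x"))
```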

SUBTASK A

1. Traditional machine learning approaches (SVM based)

2. Transfer learning approaches (BERT based)

Experiment for Hindi with two forms of data:

1. Original Devanagari form

2. Transliterated form
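
The post does not name the transliteration tool, so purely as an illustration, here is how a Devanagari tweet could be romanized with the indic_transliteration package (my choice of library, not necessarily what was used for the actual runs).

```python
# Hypothetical illustration: converting a Devanagari tweet into a romanized
# (transliterated) form using the indic_transliteration package. The actual
# transliteration scheme used in the experiments is not specified in the post.
from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate

devanagari_tweet = "यह बहुत बुरा है"              # original Devanagari form
roman_tweet = transliterate(devanagari_tweet,
                            sanscript.DEVANAGARI,
                            sanscript.ITRANS)     # roughly "yaha bahuta burA hai"
print(roman_tweet)
```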

SVM based approach

Under this approach, we perform two things (on the preprocessed data):

a. Sentence encoding: with three types of embeddings: the Universal Sentence Encoder (USE), TF-IDF vectors, and the average of the word-level fastText embeddings.

b. Classification: an SVM with an RBF kernel as the classifier. We perform a grid search over the hyper-parameter C and use 5-fold cross-validation to select the best model.
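
As a rough sketch (the exact vectorizer settings and the fastText model file are my assumptions), two of these sentence encodings could be produced like this:

```python
# Sketch of two sentence-encoding options (TF-IDF and averaged fastText
# word vectors); settings and file names are illustrative only.
import numpy as np
import fasttext
from sklearn.feature_extraction.text import TfidfVectorizer

posts = ["this dude is ridiculous", "you are a moron"]  # preprocessed posts

# TF-IDF sentence vectors
tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X_tfidf = tfidf.fit_transform(posts)

# Averaged fastText word vectors (assumes a pretrained .bin model on disk)
ft = fasttext.load_model("cc.en.300.bin")
X_ft = np.array([
    np.mean([ft.get_word_vector(w) for w in post.split()], axis=0)
    for post in posts
])
```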

Experiment with two modes of loss calculation:

A. Class balanced: Give more weightage to the under-represented classes in the loss function. Weightage is inversely proportional to the number of samples in the class.

B. Normal mode: This is the standard loss calculation.
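
Putting the pieces together, here is a minimal sketch of the classifier stage, assuming scikit-learn: an RBF-kernel SVM tuned over C with 5-fold cross-validation, run once with class-balanced weights and once in normal mode. The feature matrix and the C grid below are illustrative placeholders, not the actual configuration.

```python
# RBF-kernel SVM with a grid search over C (5-fold CV), comparing the
# class-balanced weighting against the normal (unweighted) mode.
# class_weight="balanced" weights classes inversely to their frequency,
# matching the weighting scheme described above.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # stand-in for sentence embeddings
y = rng.integers(0, 2, size=100)          # 0 = NOT, 1 = HOF (toy labels)

for mode in (None, "balanced"):           # normal vs class-balanced
    grid = GridSearchCV(
        SVC(kernel="rbf", class_weight=mode),
        param_grid={"C": [0.1, 1, 10, 100]},   # illustrative grid
        scoring="f1_macro",
        cv=5,
    )
    grid.fit(X, y)
    print(mode, grid.best_params_, round(grid.best_score_, 3))
```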

BERT based approach

We experiment with the following models on the preprocessed data.

a. English: the monolingual BERT-base uncased model and the multilingual BERT model. We experimented with two datasets:

1. The English 2020 training set

2. The English 2020 training set appended with the OLID dataset.

b. Hindi: For Hindi, we experimented only with the Multilingual BERT model.
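
The post does not include training code; below is a minimal sketch of the fine-tuning setup, assuming the Hugging Face transformers library. The toy data and hyperparameters are placeholders, and for the Hindi runs one would swap in bert-base-multilingual-cased.

```python
# Minimal sketch of fine-tuning BERT for HOF/NOT classification with the
# Hugging Face transformers Trainer; data and hyperparameters are toy
# placeholders, not the actual HASOC setup.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"     # "bert-base-multilingual-cased" for Hindi
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=2)

# Toy stand-in for the HASOC (optionally + OLID) training tweets
train_ds = Dataset.from_dict({
    "text": ["have a nice day", "you absolute moron"],
    "label": [0, 1],                 # 0 = NOT, 1 = HOF
}).map(lambda b: tokenizer(b["text"], truncation=True, padding="max_length",
                           max_length=64), batched=True)

args = TrainingArguments(output_dir="hasoc-bert", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```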

SUBTASK B

The preprocessing pipeline is the same as above. The dataset was imbalanced.

Two different approaches to this:

a. Consider the problem as a 4-class classification problem (include NONE as a class as well). This way, the subtask is treated independently of the first subtask.

b. We take subtask A’s model, filter out all the posts it labels NOT and tag them NONE. The posts labelled HOF are then classified into OFFN, PRFN and HATE (a 3-class classification problem).

Again, we experiment with both the class-balanced and the regular way of calculating the loss.
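
Here is a sketch of the second (cascaded) approach, under the assumption that both models expose a scikit-learn-style predict() over sentence embeddings and return string labels.

```python
# Sketch of the cascaded approach to subtask B: reuse the subtask A model
# to tag NOT posts as NONE, and run a separate 3-class model (HATE / OFFN /
# PRFN) only on the posts predicted HOF. Interfaces are assumptions.
import numpy as np

def predict_subtask_b(X, binary_model, fine_model):
    """X: array of sentence embeddings, one row per post."""
    coarse = np.asarray(binary_model.predict(X))     # "HOF" / "NOT" per post
    labels = np.full(len(X), "NONE", dtype=object)   # NOT posts become NONE
    hof_mask = coarse == "HOF"
    if hof_mask.any():
        # The 3-class model only ever sees posts predicted HOF
        labels[hof_mask] = fine_model.predict(X[hof_mask])
    return labels.tolist()
```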

Discussion

In terms of performance, SVM model < multilingual BERT < monolingual BERT wherever applicable, with margins of 0.02 and 0.012 F1 respectively. The difference in F1 score between the monolingual and multilingual models is not large; the training time, however, is higher.

The class-balanced loss calculation improved the results in the case of imbalanced datasets:

  1. In the Hindi subtask A, an improvement of up to 0.15 F1 score is observed.
  2. In the English subtask A, not much improvement was observed, which was expected since that dataset is balanced.
  3. In subtask B, an improvement of up to 0.12 F1 score was observed, which is again significant.

Using the additional OLID dataset for the English subtask A helped improve the macro F1 score by 0.01 for BERT and by 0.03 for the TF-IDF + SVM model.

For subtask B, the 4-class classification model was more biased towards the NONE label, as it forms the majority class. The 3-class approach, where we take the outputs from the first subtask and label the NOT class as NONE, increased the number of samples labelled into the other classes.

The transliteration did not improve the results as such.

Started in 2008 to build a South Asian counterpart for TREC, CLEF and NTCIR, FIRE has since evolved continuously to meet the new challenges in multilingual information access. For more info: http://fire.irsi.res.in

