Hate Speech Detection

This project explores several NLP techniques for classifying text into "hatespeech", "offensive", or "normal" categories. It compares traditional machine learning classifiers with a large language model (FLAN-T5) prompted in zero-shot and few-shot settings.

Project Structure

- `NLP_A2.ipynb`: Jupyter notebook implementing traditional ML models (e.g., Logistic Regression and SVM) with TF-IDF and Word2Vec features.
- `NLP_A3.ipynb`: Jupyter notebook for hate speech classification with FLAN-T5 models using zero-shot and few-shot prompting.
- `NLP_ass_train.tsv`, `NLP_ass_valid.tsv`, `NLP_ass_test.tsv`: training, validation, and test splits, each containing texts and their labels.
- `GoogleNews-vectors-negative300.bin`: pre-trained Word2Vec model from Google (download separately; see Setup).

Dataset

The dataset is provided in tab-separated files (.tsv). Each file contains two columns: "text" and "label". The labels are "hatespeech", "offensive", and "normal".
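A minimal loading sketch with pandas, assuming each split has a header row naming the `text` and `label` columns described above (the header row is an assumption). `load_split` and `LABELS` are illustrative names, not taken from the notebooks. Setting `quoting=csv.QUOTE_NONE` keeps quotation marks inside posts as literal characters rather than letting pandas treat them as field quoting.

```python
import csv
import io

import pandas as pd

LABELS = ["hatespeech", "offensive", "normal"]


def load_split(path_or_buffer):
    """Read one TSV split into a DataFrame with 'text' and 'label' columns.

    Assumes (hypothetically) a header row naming the columns 'text' and 'label'.
    """
    df = pd.read_csv(
        path_or_buffer,
        sep="\t",
        quoting=csv.QUOTE_NONE,   # keep quote characters inside posts literal
        dtype=str,
        keep_default_na=False,    # keep strings such as "NA" as text, not NaN
    )
    # Drop any row whose label is not one of the three expected classes.
    unexpected = ~df["label"].isin(LABELS)
    if unexpected.any():
        print(f"Dropping {unexpected.sum()} rows with unexpected labels")
    return df.loc[~unexpected, ["text", "label"]].reset_index(drop=True)


# In-memory sample so the sketch runs without the dataset files.
sample = io.StringIO(
    "text\tlabel\n"
    "have a nice day\tnormal\n"
    'he said "whatever" again\toffensive\n'
)
train = load_split(sample)
print(train["label"].value_counts())
```

Pass a path such as `NLP_ass_train.tsv` instead of the in-memory sample to load a real split.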

Models and Methods

Traditional Models (`NLP_A2.ipynb`)

This notebook focuses on traditional machine learning approaches.

Features: TF-IDF and pre-trained Word2Vec embeddings (GoogleNews-vectors-negative300.bin) are used to represent the text data.
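One common way to turn Word2Vec embeddings into a fixed-size feature vector per text is to average the vectors of its in-vocabulary tokens. The sketch below assumes that averaging scheme and plain whitespace tokenization; the notebook may tokenize or pool differently. `average_word2vec` and `featurize` are hypothetical helpers. They accept anything supporting `token in kv` and `kv[token]`, which includes a gensim `KeyedVectors` object; a toy dictionary stands in for the 300-dimensional GoogleNews model so the example runs without the large download.

```python
import numpy as np


def average_word2vec(text, keyed_vectors, dim=300):
    """Mean of the vectors for in-vocabulary tokens; a zero vector if none are known."""
    # Whitespace tokenization is an assumption; GoogleNews vectors are case-sensitive,
    # so the text is deliberately not lowercased here.
    vectors = [keyed_vectors[tok] for tok in text.split() if tok in keyed_vectors]
    if not vectors:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vectors, axis=0).astype(np.float32)


def featurize(texts, keyed_vectors, dim=300):
    """Stack one averaged vector per text into an (n_texts, dim) matrix."""
    return np.vstack([average_word2vec(t, keyed_vectors, dim) for t in texts])


# With gensim installed, the GoogleNews vectors can be loaded like this:
#   from gensim.models import KeyedVectors
#   kv = KeyedVectors.load_word2vec_format(
#       "GoogleNews-vectors-negative300.bin", binary=True)
#   X_train = featurize(texts, kv)   # texts: a list of strings
# Toy 3-dimensional stand-in so the sketch runs without the real model:
toy = {"good": np.array([1.0, 0.0, 0.0]), "day": np.array([0.0, 1.0, 0.0])}
X = featurize(["good day", "unknown words only"], toy, dim=3)
print(X)
```

Texts with no known tokens map to an all-zero vector, which keeps the feature matrix rectangular at the cost of giving such texts no signal.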
Models: Logistic Regression and Support Vector Machines (SVM) are trained on the extracted features.
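A sketch of TF-IDF features feeding the two classifier families named above, using scikit-learn pipelines. The specific choices here (unigram and bigram features, `LinearSVC` as the SVM variant, `max_iter=1000`) are assumptions rather than the notebook's settings, and the toy sentences stand in for the training split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models():
    """Return one TF-IDF pipeline per classifier (hyperparameters are assumptions)."""
    return {
        "logreg": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LogisticRegression(max_iter=1000),
        ),
        "svm": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LinearSVC(),  # linear-kernel SVM; the notebook may use a different kernel
        ),
    }


# Toy data standing in for the text/label columns of NLP_ass_train.tsv.
train_texts = [
    "lovely weather today", "see you at the meeting",
    "you are a stupid idiot", "shut up you fool",
    "placeholder hateful text about a group", "another placeholder hateful message",
]
train_labels = ["normal", "normal", "offensive", "offensive",
                "hatespeech", "hatespeech"]

models = build_models()
for name, model in models.items():
    model.fit(train_texts, train_labels)
    print(name, model.predict(["what lovely weather", "you stupid fool"]))
```

Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is fitted on the training split only and reused unchanged on the validation and test splits.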

Large Language Models (`NLP_A3.ipynb`)

This notebook uses FLAN-T5 models for classification.

Models: google/flan-t5-base and google/flan-t5-small are used.
Techniques:
- Zero-shot learning: The model is prompted to classify the text without any labeled examples.
- Few-shot learning: A few labeled examples are included in the prompt to guide the classification.
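The two prompting modes differ only in whether labeled examples precede the query. The sketch below shows one way to build both prompts and map FLAN-T5's free-form output back to a label. The prompt wording, the `Text:`/`Label:` layout, the fallback to `normal`, and `max_new_tokens=5` are all assumptions, not the notebook's actual prompt; `build_prompt`, `parse_label`, and `classify` are hypothetical helpers. `classify` imports `transformers` lazily, so the prompt helpers run without it.

```python
LABELS = ("hatespeech", "offensive", "normal")


def build_prompt(text, examples=()):
    """Zero-shot if `examples` is empty; few-shot otherwise.

    `examples` is a sequence of (text, label) pairs placed before the query.
    The wording is an illustrative template (assumption).
    """
    header = "Classify the text as hatespeech, offensive, or normal.\n\n"
    shots = "".join(f"Text: {t}\nLabel: {lab}\n\n" for t, lab in examples)
    return f"{header}{shots}Text: {text}\nLabel:"


def parse_label(generated, default="normal"):
    """Map free-form model output (e.g. 'Hate speech') onto one of the three labels."""
    answer = generated.strip().lower().replace(" ", "")
    for label in LABELS:
        if answer.startswith(label):
            return label
    return default  # fallback for unrecognized output (assumption)


def classify(texts, examples=(), model_name="google/flan-t5-small", max_new_tokens=5):
    """Generate one label per text with FLAN-T5 (requires torch and transformers)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    prompts = [build_prompt(t, examples) for t in texts]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [parse_label(d) for d in decoded]


zero = build_prompt("have a nice day")
few = build_prompt("have a nice day", [("see you soon", "normal")])
print(few)
```

Swapping `model_name` to `google/flan-t5-base` compares the two model sizes with the same prompts. Each few-shot example lengthens every prompt, so keeping the examples short keeps inputs within the length FLAN-T5 handles comfortably.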

Setup

Clone the repository.

Install the required Python libraries:

```bash
pip install pandas torch transformers scikit-learn jupyter gensim
```

Download the pre-trained Word2Vec model (GoogleNews-vectors-negative300.bin) if it is not already present.

Usage

- Launch Jupyter Notebook or JupyterLab.
- Open `NLP_A2.ipynb` and run its cells to train and evaluate the traditional models.
- Open `NLP_A3.ipynb` and run its cells to classify the texts with the FLAN-T5 models.

The notebooks load the data, preprocess the text, train the models (only the traditional models in `NLP_A2.ipynb` are trained; FLAN-T5 is used through prompting alone), and print classification reports including accuracy and F1 scores.
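A sketch of that evaluation step with scikit-learn metrics. The README does not say which F1 averaging the notebooks report; macro F1 is shown here as an assumption because it weights the three classes equally regardless of how frequent each one is. `report` is a hypothetical helper, and the label lists are toy values.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

LABELS = ["hatespeech", "offensive", "normal"]


def report(y_true, y_pred):
    """Print per-class precision/recall/F1 and return headline scores."""
    print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro averaging is an assumption; weighted F1 is another common choice.
        "macro_f1": f1_score(y_true, y_pred, labels=LABELS,
                             average="macro", zero_division=0),
    }


y_true = ["normal", "offensive", "hatespeech", "normal"]
y_pred = ["normal", "normal", "hatespeech", "normal"]
scores = report(y_true, y_pred)
print(scores)
```

Passing `labels=LABELS` fixes the row order of the report and keeps a class in the macro average even when a model never predicts it.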
