Tweet Sentiment Prediction

A real-time web application that scrapes tweets via Selenium, streams them into Kafka, processes them with Apache Spark, and performs sentiment prediction using a pre-trained logistic regression model.

Project Structure

Note: The sentiment_BERT/ directory contains large model files (≈420 MB) and is not included in this repository.
Download it manually before running the app: https://drive.google.com/drive/folders/1RqGCpUjVUT0-F05LE1pulAeueogXflrS

kafka_2.12-3.5.0/               # Kafka distribution
├── bin/                        # Kafka CLI scripts
├── config/
├── libs/
├── ...
static/
  └── style.css                 # CSS for the web UI
templates/
  └── index.html                # HTML template
app.py                          # Flask + Spark + Kafka integration
scraper.py                      # Selenium scraper & Kafka producer
sentiment_BERT/                 
├── config.json                        
├── model.safetensors
├── special_tokens_map.json
├── tokenizer_config.json
├── training_args.bin
└── vocab.txt
logreg_sentiment140_model.pkl   # Pre-trained sentiment model
README.md                       # This file
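
Since sentiment_BERT/ has to be downloaded separately, it may help to confirm that the folder is complete before starting the app. The following is a minimal sketch and is not part of the repository; it assumes only the file names shown in the tree above, and `missing_model_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the sentiment_BERT/ listing above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.txt",
]


def missing_model_files(model_dir="sentiment_BERT"):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("sentiment_BERT/ is incomplete, missing:", ", ".join(missing))
    else:
        print("sentiment_BERT/ looks complete.")
```

Run it from the repository root; an empty result means every listed file was found.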

Prerequisites

Java 8+ (for Spark & Kafka)
Kafka & Zookeeper
Python 3.8+ and pip
Google Chrome (for Selenium)

Python Dependencies

pip install flask pyspark kafka-python selenium webdriver-manager joblib
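
To confirm that the install succeeded, a short check like the one below can report which packages are still missing. This is a sketch rather than something shipped with the project; `missing_packages` is a hypothetical helper, and the mapping only records the module name each pip package is imported under (for example, kafka-python is imported as `kafka`).

```python
from importlib.util import find_spec

# pip package name -> top-level module it provides.
REQUIRED = {
    "flask": "flask",
    "pyspark": "pyspark",
    "kafka-python": "kafka",
    "selenium": "selenium",
    "webdriver-manager": "webdriver_manager",
    "joblib": "joblib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found on this interpreter."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```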

Setup & Run

Start Zookeeper & Kafka

# In one terminal, from the Kafka distribution directory (kafka_2.12-3.5.0/)
bin/zookeeper-server-start.sh config/zookeeper.properties

# In a second terminal, from the same directory
bin/kafka-server-start.sh config/server.properties

Delete & Recreate Topic (optional, to purge old data)

# Delete existing topic
bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --delete --topic tweets

# Recreate with 4 partitions
bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create \
  --topic tweets \
  --partitions 4 \
  --replication-factor 1

View Topic Contents (optional)

bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic tweets \
  --from-beginning

Run the App
```
python app.py
```
- The Flask server starts at http://127.0.0.1:5000.
- Fetch Tweets starts the scraper, which opens headless Chrome and pushes scraped tweets to Kafka.
- Stop Fetching Tweets stops the scraper process.
- Start Prediction stops scraping, consumes all tweets from the topic, runs sentiment prediction, and displays the results.
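
The start and stop buttons above amount to managing one child process. The sketch below shows one way that could look with the standard library; it is not taken from app.py. `ScraperRunner` is a hypothetical name, and launching scraper.py with the current interpreter is an assumption.

```python
import subprocess
import sys


class ScraperRunner:
    """Start and stop a single child process (scraper.py by default)."""

    def __init__(self, command=None):
        # Assumption: scraper.py is launched with the current interpreter.
        self.command = command or [sys.executable, "scraper.py"]
        self.process = None

    def is_running(self):
        # poll() returns None while the child is still alive.
        return self.process is not None and self.process.poll() is None

    def start(self):
        """Launch the scraper unless one is already running; return its PID."""
        if not self.is_running():
            self.process = subprocess.Popen(self.command)
        return self.process.pid

    def stop(self, timeout=5):
        """Terminate the scraper, escalating to kill if it does not exit in time."""
        if not self.is_running():
            return
        self.process.terminate()
        try:
            self.process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.process.kill()
            self.process.wait()
```

Guarding `start()` with `is_running()` keeps a second click on Fetch Tweets from spawning a second scraper.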

How It Works

scraper.py: Uses Selenium to scroll through Twitter (x.com) search results and publishes each tweet's text as a message to the Kafka topic tweets.
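
The producer half of that step might look like the sketch below, which uses kafka-python. It is an illustration, not the code in scraper.py: the Selenium extraction is left out, the function names are hypothetical, and both the plain UTF-8 message format and the de-duplication of tweets seen again while scrolling are assumptions.

```python
def make_producer(bootstrap_servers="localhost:9092"):
    """Create a kafka-python producer that sends strings as UTF-8 bytes."""
    # Imported lazily so publish_new_tweets can be used without kafka-python.
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda text: text.encode("utf-8"),
    )


def publish_new_tweets(producer, texts, seen, topic="tweets"):
    """Send each text not seen before to the topic; return how many were sent."""
    sent = 0
    for text in texts:
        text = text.strip()
        # Assumption: scrolling re-reads earlier tweets, so skip repeats and blanks.
        if not text or text in seen:
            continue
        seen.add(text)
        producer.send(topic, value=text)
        sent += 1
    # Push buffered messages out before the next scroll.
    producer.flush()
    return sent
```

A scraper loop would keep one `seen` set for the whole session and call `publish_new_tweets(producer, texts, seen)` after each scroll.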
app.py:
- Spins up a SparkSession with the Kafka SQL connector.
- Broadcasts a pre-trained logistic regression model for inference.
- Provides Flask endpoints:
  - /fetch_tweets → spawns scraper.py as a subprocess
  - /stop_fetch → terminates the scraper and fetches any remaining tweets
  - /start_prediction → reads the entire topic from the earliest offset, applies the model via a Spark UDF, and stores the predictions
  - /get_tweets → returns a JSON list of { tweet, prediction }
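
The batch read and UDF behind /start_prediction could be sketched as follows. This is an illustration under stated assumptions, not the code in app.py: the function names are hypothetical, messages are assumed to be plain UTF-8 tweet text, and the model is assumed to expose a scikit-learn style `predict` that accepts a list of raw strings.

```python
def kafka_batch_options(topic="tweets", bootstrap_servers="localhost:9092"):
    """Options for a bounded (batch) read of the whole topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
        "endingOffsets": "latest",
    }


def predict_topic(spark, model, topic="tweets"):
    """Return a DataFrame with the columns tweet and prediction."""
    # pyspark is imported here so kafka_batch_options stays usable without it.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Ship the model to the executors once instead of once per task.
    broadcast_model = spark.sparkContext.broadcast(model)

    @F.udf(returnType=StringType())
    def predict(text):
        if text is None:
            return None
        # Assumption: model.predict accepts a list of raw strings.
        return str(broadcast_model.value.predict([text])[0])

    messages = spark.read.format("kafka").options(**kafka_batch_options(topic)).load()
    tweets = messages.select(F.col("value").cast("string").alias("tweet"))
    return tweets.withColumn("prediction", predict(F.col("tweet")))
```

The `spark` argument is expected to be a SparkSession that already has the Kafka SQL connector on its classpath, and `model` might come from something like `joblib.load("logreg_sentiment140_model.pkl")`, although how app.py loads it is not shown here.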
index.html + style.css: A simple UI with controls and a live table that polls /get_tweets every 2 seconds, animates new rows, and color‑codes predictions.
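
The same polling idea can be tried from a terminal. The sketch below is a command-line stand-in for the browser loop, not the JavaScript in index.html; it assumes /get_tweets answers a GET request with a JSON list of objects that have `tweet` and `prediction` keys, and `new_rows` and `poll` are hypothetical names.

```python
import json
import time
from urllib.request import urlopen


def row_key(row):
    """Identify a row by its tweet text and current prediction."""
    return (row.get("tweet"), row.get("prediction"))


def new_rows(previous, current):
    """Rows in current that are new or whose prediction changed, in order."""
    seen = {row_key(row) for row in previous}
    return [row for row in current if row_key(row) not in seen]


def poll(url="http://127.0.0.1:5000/get_tweets", interval=2.0, rounds=None):
    """Fetch the list every `interval` seconds and print only what changed."""
    previous = []
    count = 0
    while rounds is None or count < rounds:
        with urlopen(url) as response:
            current = json.load(response)
        for row in new_rows(previous, current):
            print(row.get("prediction"), "|", row.get("tweet"))
        previous = current
        count += 1
        time.sleep(interval)


if __name__ == "__main__":
    poll()
```

Comparing each response against the previous one is what lets a client highlight only the rows that just arrived or just received a prediction.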

UI Flow

Fetch Tweets → begins scraping and streaming to Kafka; the table updates live.
Stop Fetching Tweets → stops the scraper, but the table keeps polling so that any remaining tweets still appear.
Start Prediction → optionally stops polling, runs batch prediction over all messages, and updates the table cells with the predicted sentiment.
