A production-grade movie recommendation system built with TF-IDF vectorization, cosine similarity, and a custom hybrid scoring algorithm — deployed as a cinematic Netflix-style Flask web application.
Most movie recommenders stop at basic cosine similarity. CineMatch AI goes further with 7 unique features that set it apart:
| Feature | Description |
|---|---|
| 🧬 Movie DNA Analysis | Every recommendation shows why it was suggested — shared genres, cast members, and director connections displayed as visual tags |
| ⚗️ Hybrid Scoring Engine | A custom formula: Content Similarity (65%) + Bayesian Weighted Rating (28%) + Popularity Boost (7%) — better results than raw similarity |
| 🎭 Mood-Based Discovery | 6 mood filters (Happy, Thrilling, Romantic, Scary, Thoughtful, Adventurous) dynamically pre-filter the entire dataset by genre clusters before ranking |
| 📊 Cinematic Analytics Dashboard | Interactive Chart.js EDA with 5 visualizations: genre distribution, yearly trends, rating spread, runtime analysis, top directors |
| 💜 Session Watchlist | Add/remove movies with AJAX toggling — no page reload, persistent across the session |
| 🔍 Live Autocomplete | Real-time search suggestions from 5000+ movie titles with 200ms debounce |
| 🖱️ Clickable Recommendations | Click any recommended movie to instantly get its own recommendations — chain discovery |
Syntecxhub_Project_Movie_Recommendation_system/
│
├── app.py                    # Flask application: all routes & logic
├── train_model.py            # Data processing & model training pipeline
├── config.py                 # API keys & configuration
├── requirements.txt          # Python dependencies
├── Dockerfile                # Docker config for HuggingFace deployment
├── .gitignore
├── LICENSE
├── README.md
│
├── data/                     # Dataset CSVs (not in repo; download from Kaggle)
│   ├── tmdb_5000_movies.csv
│   └── tmdb_5000_credits.csv
│
├── models/                   # Trained model files (not in repo; generated locally)
│   ├── movies.pkl
│   └── similarity.pkl
│
├── static/
│   ├── css/
│   │   └── style.css         # Full dark cinematic UI, 600+ lines
│   └── js/
│       └── main.js           # Autocomplete, watchlist, mood, card click
│
└── templates/
    ├── base.html             # Base layout with navbar, toast, footer
    ├── index.html            # Home: search + mood filters + popular row
    ├── results.html          # Netflix grid with poster cards + DNA tags
    ├── eda.html              # Analytics dashboard with Chart.js
    └── watchlist.html        # Saved movies list
- Python 3.11
- pip (latest version)
git clone https://github.com/rafiul254/Syntecxhub_Project_Movie_Recommendation_system.git
cd Syntecxhub_Project_Movie_Recommendation_system
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
pip install -r requirements.txt
Go to the TMDB 5000 Movie Dataset on Kaggle and download:
- tmdb_5000_movies.csv
- tmdb_5000_credits.csv
Place both files inside the data/ folder.
- Register at themoviedb.org
- Go to Settings → API → Request API Key → Developer
- Open config.py and paste your key:
TMDB_API_KEY = 'your_api_key_here'
python train_model.py
Wait for: Done! XXXX movies indexed. Models saved to /models/
python app.py
Text features from each movie are combined into a unified "tags" string:
tags = overview + genres + keywords + cast (top 4) + director
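As a rough illustration of the tags formula, the sketch below builds one tags string from a single row. It assumes the Kaggle CSV layout, where genres, keywords, cast, and crew are stored as JSON-encoded lists of objects with a `name` field (and a `job` field for crew). The helper names `names`, `director`, and `build_tags` are hypothetical, the sample row is made up, and the actual `train_model.py` may clean the text differently.

```python
import json

def names(json_text, limit=None):
    """Return the "name" field of each entry in a JSON-encoded list."""
    items = [entry["name"] for entry in json.loads(json_text)]
    return items[:limit] if limit is not None else items

def director(crew_json):
    """Return the first crew member whose job is "Director", or ""."""
    for member in json.loads(crew_json):
        if member.get("job") == "Director":
            return member["name"]
    return ""

def build_tags(row):
    """Join overview, genres, keywords, top-4 cast and director into one string."""
    parts = [row.get("overview") or ""]
    parts += names(row["genres"])
    parts += names(row["keywords"])
    parts += names(row["cast"], limit=4)  # only the four top-billed cast members
    parts.append(director(row["crew"]))
    return " ".join(p for p in parts if p).lower()

# Made-up row for illustration only (not taken from the dataset).
row = {
    "overview": "A thief steals secrets through dreams.",
    "genres": '[{"id": 1, "name": "Action"}]',
    "keywords": '[{"id": 2, "name": "dream"}]',
    "cast": json.dumps([{"name": f"Actor {n}"} for n in ("One", "Two", "Three", "Four", "Five")]),
    "crew": '[{"job": "Producer", "name": "Sam Roe"}, {"job": "Director", "name": "Jane Doe"}]',
}
tags = build_tags(row)
```

Lowercasing here is one possible normalization step; stemming or removing spaces inside names are other common choices that this sketch leaves out.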
A TF-IDF vectorizer capped at 8000 features transforms all tags into numerical vectors. Rare but meaningful terms receive higher weight than common words.
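To see why rare terms end up with higher weight, here is a dependency-free sketch of the smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, which is the formula scikit-learn's `TfidfVectorizer` uses by default. The function name `smoothed_idf` and the three toy documents are assumptions for illustration. In the project itself the equivalent step would presumably be a `TfidfVectorizer` configured with `max_features=8000`.

```python
import math
from collections import Counter

def smoothed_idf(docs):
    """Compute smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

# Toy "tags" strings, invented for this example.
docs = ["space adventure alien", "space romance", "space heist crime"]
idf = smoothed_idf(docs)

# "space" appears in every document, "alien" in only one,
# so "alien" receives the larger weight.
common_weight = idf["space"]
rare_weight = idf["alien"]
```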
Pairwise cosine similarity is computed across all movies. The top-50 most similar candidates per movie are pre-indexed for fast retrieval at runtime.
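The pre-indexing step can be pictured with the small sketch below. It is written in plain Python so it runs without dependencies. The project itself lists scikit-learn for cosine similarity, so treat the function names `cosine` and `build_top_k_index`, the sparse-dict representation, and the toy vectors as assumptions rather than the actual implementation.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_top_k_index(vectors, k=50):
    """Map each movie index to its k most similar (index, score) pairs, best first."""
    index = {}
    for i, vec in enumerate(vectors):
        # Score every other movie against movie i, skipping the movie itself.
        scored = ((j, cosine(vec, other)) for j, other in enumerate(vectors) if j != i)
        index[i] = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return index

# Toy TF-IDF-like vectors, invented for this example.
vectors = [
    {"space": 1.0, "alien": 1.7},
    {"space": 1.0, "alien": 1.7, "war": 1.7},
    {"romance": 1.7, "paris": 1.7},
]
top = build_top_k_index(vectors, k=2)
```

Storing only the top candidates per movie keeps lookups cheap at request time, at the cost of never surfacing anything outside that shortlist.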
Score = 0.65 × Cosine Similarity
+ 0.28 × Bayesian Weighted Rating
+ 0.07 × Popularity Boost
Bayesian = (v / (v + m)) × R + (m / (v + m)) × C
Where:
v = vote count for the movie
m = minimum votes threshold (60th percentile)
R = movie's average rating
C = mean rating across all movies
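A minimal sketch of the scoring formula follows. The weights and the Bayesian formula come from the description above. The function names are hypothetical, and the scaling is an assumption: the sketch divides the weighted rating by 10 and expects popularity already normalized to the 0 to 1 range, since the source does not say how the three terms are brought onto a common scale.

```python
def weighted_rating(v, R, m, C):
    """Bayesian weighted rating: pulls low-vote movies toward the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

def hybrid_score(similarity, rating, popularity):
    """Blend the three signals; all inputs are assumed to lie in [0, 1]."""
    return 0.65 * similarity + 0.28 * rating + 0.07 * popularity

# Illustrative numbers only.
wr_many_votes = weighted_rating(v=100, R=8.0, m=100, C=6.0)
wr_few_votes = weighted_rating(v=5, R=9.5, m=100, C=6.0)

# Assumption: ratings are on a 0-10 scale, so divide by 10 before blending.
score = hybrid_score(similarity=0.5, rating=wr_many_votes / 10, popularity=1.0)
```

In this example the movie rated 9.5 from only 5 votes ends up with a lower weighted rating than the movie rated 8.0 from 100 votes, which is the shrinkage effect the Bayesian term is meant to provide.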
When a mood is selected, the dataset is first filtered to movies matching the mood's genre cluster. TF-IDF similarity is then computed dynamically within that filtered pool, so every recommendation comes from the mood-filtered set.
| Mood | Genres Targeted |
|---|---|
| 😄 Happy | Comedy, Animation, Family, Music |
| ⚡ Thrilling | Action, Thriller, Crime, Adventure |
| 💜 Romantic | Romance, Drama |
| 👻 Scary | Horror, Mystery |
| 🎭 Thoughtful | Drama, Documentary, History |
| 🚀 Adventurous | Adventure, Fantasy, Science Fiction, Western |
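The pre-filtering step might look roughly like the sketch below. The genre clusters are copied from the table above. The function name `filter_by_mood`, the movie record shape, and the rule that a movie qualifies when it shares at least one genre with the cluster are assumptions, since the source does not state the exact matching rule.

```python
MOOD_GENRES = {
    "happy": {"Comedy", "Animation", "Family", "Music"},
    "thrilling": {"Action", "Thriller", "Crime", "Adventure"},
    "romantic": {"Romance", "Drama"},
    "scary": {"Horror", "Mystery"},
    "thoughtful": {"Drama", "Documentary", "History"},
    "adventurous": {"Adventure", "Fantasy", "Science Fiction", "Western"},
}

def filter_by_mood(movies, mood):
    """Keep movies that share at least one genre with the mood's cluster."""
    wanted = MOOD_GENRES[mood]
    return [movie for movie in movies if wanted & set(movie["genres"])]

# Made-up records for illustration only.
movies = [
    {"title": "Movie A", "genres": ["Horror", "Thriller"]},
    {"title": "Movie B", "genres": ["Comedy", "Family"]},
    {"title": "Movie C", "genres": ["Drama", "Romance"]},
]
scary_titles = [movie["title"] for movie in filter_by_mood(movies, "scary")]
thrilling_titles = [movie["title"] for movie in filter_by_mood(movies, "thrilling")]
```

Because some genres appear in more than one cluster (Drama, Adventure), the same movie can show up under several moods.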
| Layer | Technology |
|---|---|
| Backend | Python 3.11, Flask 2.3.3 |
| ML | scikit-learn (TF-IDF, Cosine Similarity), pandas, numpy |
| Frontend | HTML5, CSS3 (600+ lines custom), Vanilla JavaScript |
| Charts | Chart.js 4.4.0 |
| Fonts | Inter, Space Grotesk (Google Fonts) |
| Poster API | TMDB API v3 |
| Dataset | TMDB 5000 Movies (Kaggle) |
| Deployment | HuggingFace Spaces (Docker) |
| Property | Value |
|---|---|
| Source | TMDB 5000 Movies Dataset — Kaggle |
| Movies indexed | ~4800 (after cleaning) |
| Features used | Title, Overview, Genres, Keywords, Cast, Director, Ratings, Popularity |
| Model size | ~50MB (excluded from repo) |
This project is deployed on HuggingFace Spaces using Docker.
🔗 Live URL: https://rafi-ul-cinematch-ai.hf.space
Step 1 — Create account at huggingface.co
Step 2 — New Space → Name: cinematch-ai → SDK: Docker → Visibility: Public
Step 3 — Add Secrets in Space Settings:
TMDB_API_KEY = your_tmdb_api_key
SECRET_KEY = your_secret_key
Step 4 — Add HuggingFace remote and push:
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/cinematch-ai
git push space main --force
Step 5 — The build starts automatically. The first build takes 20-30 minutes.
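Secrets added in Step 3 are generally made available to the running container as environment variables. A hedged sketch of how a `config.py` could read them is shown below. The function name `load_config` and the fallback values are hypothetical, and the real `config.py` in this repo may simply hold the pasted key instead.

```python
import os

def load_config(env=os.environ):
    """Read settings from environment variables, with local-development fallbacks."""
    return {
        # Empty string means "no key configured" in this sketch.
        "TMDB_API_KEY": env.get("TMDB_API_KEY", ""),
        # Placeholder fallback for local runs only; set a real secret when deploying.
        "SECRET_KEY": env.get("SECRET_KEY", "dev-only-change-me"),
    }

# Passing a plain dict makes the behavior easy to check without touching os.environ.
example = load_config({"TMDB_API_KEY": "abc123"})
```

Reading from the environment keeps keys out of version control, which matters here because the Space is pushed from the same Git repository.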
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python train_model.py
EXPOSE 7860
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:7860", "--workers", "1", "--timeout", "120"]
Flask==2.3.3
Werkzeug==2.3.7
scikit-learn==1.6.0
pandas==2.1.4
numpy==1.24.4
requests==2.31.0
gunicorn==21.2.0
Rafiul Islam
Currently an IoT & Robotics Engineering student at the University of Frontier Technology Bangladesh (UFTB).
Syntecxhub ML Internship.
This project is licensed under the MIT License — see LICENSE for details.