This project predicts how processed a food product is (NOVA score, groups 1–4), searches a product database for similar items, and suggests less processed alternatives using retrieval plus a language model.
The web frontend was built with Lovable (see frontend/.lovable/): a React + Vite + TanStack app in frontend/. It talks to a FastAPI backend that runs OCR on an uploaded ingredient-list image and then runs the same NLP pipeline as the notebooks.
The pipeline combines a fine-tuned DistilBERT classifier, TF–IDF retrieval over labeled products, and an OpenAI chat model for judging and explaining alternatives.
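As a rough illustration of the retrieval step, the sketch below ranks a few labeled products by TF-IDF cosine similarity to a query ingredient list and keeps only those in a lower NOVA group. It uses the standard library only and is not the code in graph.py; the toy product records, the `suggest_alternatives` helper, and the tokenizer are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy records; the real project reads data/food_nlp_balanced.xlsx.
PRODUCTS = [
    {"name": "plain oats", "ingredients": "whole grain oats", "nova": 1},
    {"name": "granola bar", "ingredients": "oats sugar glucose syrup palm oil emulsifier", "nova": 4},
    {"name": "muesli", "ingredients": "oats raisins hazelnuts sugar", "nova": 3},
]


def tokenize(text):
    """Lowercase and keep runs of letters (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def suggest_alternatives(query, predicted_nova, products=PRODUCTS, top_k=2):
    """Rank products with a lower NOVA group by similarity to the query."""
    vectors = tfidf_vectors([p["ingredients"] for p in products] + [query])
    query_vec = vectors[-1]
    scored = [
        (cosine(query_vec, vec), p["name"])
        for p, vec in zip(products, vectors)
        if p["nova"] < predicted_nova
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```

In the actual pipeline the NOVA group would come from the DistilBERT classifier and the shortlisted candidates would then be passed to the chat model for judging; here `predicted_nova` is simply supplied by the caller.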
| Area | Contents |
|---|---|
| `data/` | Balanced product dataset (food_nlp_balanced.xlsx) used for retrieval and training-related workflows. |
| `models/bert/` | BERT.ipynb: training / evaluation of the NOVA classifier; llm/: LangGraph pipeline (graph.py) used by the API. |
| `frontend/` | Lovable-generated UI (frontend/.lovable/project.json). Install with package.json, run with npm run dev. The page posts to http://127.0.0.1:8000/recommend (product name + ingredient image). |
| `backend/` | FastAPI app: loads .env, OCR on image, then recommendation (backend/main.py). Required for the web demo. |
| `logistic_regression/` | Baseline NOVA model notebook for comparison. |
- Python 3.10+ (3.11 recommended) and `pip`: backend + notebooks.
- Node.js 20+ (LTS) and `npm`: frontend (frontend/package.json).
- A Python virtual environment (`venv`) is recommended.
| Variable | Required for | Where to get it |
|---|---|---|
| `OPENAI_API_KEY` | Yes: OCR text cleanup / vision step and the recommendation LLM (llm_init.py, backend/ocr.py) | OpenAI API keys |
| `HF_TOKEN` | Optional: Hugging Face Hub login for smoother downloads in notebooks | Hugging Face tokens |
The fine-tuned classifier weights belong in models/bert/bert_nova_best_model/. That folder is not on Git (size limits). Train with models/bert/BERT.ipynb or copy the folder from the project authors.
cd /path/to/NLP-Final-Project
python -m venv .venv

Activate:
- macOS / Linux: `source .venv/bin/activate`
- Windows (cmd): `.venv\Scripts\activate.bat`
- Windows (PowerShell): `.venv\Scripts\Activate.ps1`

pip install -r requirements.txt

cd frontend
npm install

At the repository root (same folder as requirements.txt):
- Copy the example file: `cp .env.example .env`
- Set at least `OPENAI_API_KEY=sk-...`
- Optionally set `HF_TOKEN=hf_...`
.env is git-ignored. The FastAPI app loads it automatically (backend/main.py). Do not commit secrets.
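To confirm that the key is present before starting the server, a small standard-library check along these lines may help. It is only a sketch: `parse_env` and `missing_keys` are hypothetical helpers, the parser handles just simple KEY=VALUE lines, and the loading done in backend/main.py may behave differently.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY",)  # HF_TOKEN is optional, so it is not listed


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumes the script runs from the repository root
    text = env_path.read_text() if env_path.exists() else ""
    print("missing:", missing_keys(text) or "none")
```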
Important:
The trained BERT model is not included in this repository due to file size limits.
Before running the pipeline, please execute models/bert/BERT.ipynb to train and save the model locally in:
models/bert/bert_nova_best_model/
You need two terminals. Paths and ports matter: the UI expects the API at http://127.0.0.1:8000 (see frontend/src/routes/index.tsx).
From repo root:
cd /path/to/NLP-Final-Project
source .venv/bin/activate # Windows: use Activate.ps1 instead
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

Check: open http://127.0.0.1:8000/ and you should see {"status":"ok"}.
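The same check can be scripted. The sketch below fetches the root endpoint with the standard library and tests for the `{"status":"ok"}` body mentioned above; `is_ok` and `check_health` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/"  # host and port from the uvicorn command


def is_ok(body):
    """Return True if the body is a JSON object with status == "ok"."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url=API_URL, timeout=5.0):
    """Fetch the root endpoint; return False if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_ok(resp.read())
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


if __name__ == "__main__":
    print("API healthy" if check_health() else "API not reachable or unhealthy")
```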
cd /path/to/NLP-Final-Project/frontend
npm run dev

Open the URL Vite prints (often http://localhost:5173). Enter a product name, upload an image of the ingredient list, and submit. The backend extracts text from the image and returns a NOVA prediction and a less-processed alternative.
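For testing without the browser, the request the page sends can be approximated from Python. This is a sketch under assumptions: the form field names `product_name` and `image` are guesses (check backend/main.py and frontend/src/routes/index.tsx for the real ones), and `build_multipart` and `recommend` are hypothetical helpers built on the standard library.

```python
import urllib.request
import uuid

RECOMMEND_URL = "http://127.0.0.1:8000/recommend"


def build_multipart(fields, files, boundary=None):
    """Encode text fields and files as a multipart/form-data body.

    fields: {name: text}; files: {name: (filename, bytes, content_type)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data, ctype) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def recommend(product_name, image_bytes, url=RECOMMEND_URL):
    """POST a product name and image; the field names are assumptions."""
    body, content_type = build_multipart(
        {"product_name": product_name},
        {"image": ("ingredients.jpg", image_bytes, "image/jpeg")},
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.read().decode("utf-8")
```

Calling `recommend("granola bar", open("label.jpg", "rb").read())` with the backend running would return the raw response text; the shape of that response is not specified here.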
- `OPENAI_API_KEY` errors: ensure `.env` exists at repo root and restart uvicorn after editing.
- Network / CORS / failed fetch: start the API before the frontend; keep port 8000.
- Classifier errors: ensure `models/bert/bert_nova_best_model/` exists.
- Missing Excel: ensure `data/food_nlp_balanced.xlsx` is present.
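The file-related checks in this list can be run in one pass before starting the server. The following is a minimal sketch; `preflight` is a hypothetical helper, and it only tests that the paths exist, not that their contents are valid.

```python
from pathlib import Path

# Paths named in this README, relative to the repository root, with a hint each.
REQUIRED_PATHS = {
    ".env": "copy .env.example to .env and set OPENAI_API_KEY",
    "models/bert/bert_nova_best_model": "train with models/bert/BERT.ipynb or copy the folder",
    "data/food_nlp_balanced.xlsx": "restore the balanced product dataset",
}


def preflight(root):
    """Return (path, hint) pairs for every required path missing under root."""
    root = Path(root)
    return [(rel, hint) for rel, hint in REQUIRED_PATHS.items() if not (root / rel).exists()]


if __name__ == "__main__":
    problems = preflight(".")  # assumes the script runs from the repository root
    for rel, hint in problems:
        print(f"missing {rel}: {hint}")
    if not problems:
        print("all required files found")
```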
- `models/bert/BERT.ipynb`: trains/saves the DistilBERT NOVA model (`HF_TOKEN` optional).
- `models/generation.ipynb`: pipeline experiments.
- `logistic_regression/Logistic_regression.ipynb`: logistic regression baseline.
Use the same venv; set HF_TOKEN / OPENAI_API_KEY in the notebook environment if needed.
- `pip install -r requirements.txt` and place `models/bert/bert_nova_best_model/` locally.
- `cp .env.example .env` and set `OPENAI_API_KEY`.
- `npm install` inside `frontend/`.
- Run `uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000` from repo root.
- Run `npm run dev` in `frontend/` and use the browser UI.
That matches how the Lovable frontend and FastAPI backend are wired in this repo.