LeoFischbach/NLP-Final-Project

NLP Final Project — Less Processed Food Alternative Finder

This project predicts how processed a food product is (NOVA score, groups 1–4), searches a product database for similar items, and suggests less processed alternatives using retrieval plus a language model.

The web frontend was built with Lovable (see frontend/.lovable/): a React + Vite + TanStack app in frontend/. It talks to a FastAPI backend that runs OCR on an uploaded ingredient-list image and then runs the same NLP pipeline as the notebooks.

The pipeline combines a fine-tuned DistilBERT classifier, TF–IDF retrieval over labeled products, and an OpenAI chat model for judging and explaining alternatives.
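
The retrieval step can be illustrated with a small, dependency-free sketch. This is not the code in graph.py: the product rows, the whitespace tokenizer, and the function names below are all hypothetical, and the real pipeline may use a library vectorizer and different filtering. The idea shown is TF-IDF cosine similarity over ingredient text, restricted to products whose NOVA group is lower than the predicted one.

```python
import math
from collections import Counter

# Hypothetical labeled products: (name, ingredient text, NOVA group 1-4).
PRODUCTS = [
    ("Plain rolled oats", "whole grain rolled oats", 1),
    ("Oat granola", "rolled oats honey sunflower oil salt", 3),
    ("Chocolate oat cereal", "oat flour sugar glucose syrup cocoa emulsifier flavouring", 4),
    ("Tomato passata", "tomatoes salt", 3),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more cleanup)."""
    return text.lower().split()


def build_index(products):
    """Return smoothed IDF weights and one TF-IDF vector (dict) per product."""
    docs = [Counter(tokenize(ingredients)) for _, ingredients, _ in products]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def less_processed_alternatives(query, predicted_nova, products, top_k=3):
    """Rank products with a lower NOVA group by similarity to the query text."""
    idf, vectors = build_index(products)
    counts = Counter(tokenize(query))
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [
        (cosine(query_vec, vec), name, nova)
        for (name, _, nova), vec in zip(products, vectors)
        if nova < predicted_nova
    ]
    # Drop candidates that share no terms with the query.
    scored = [row for row in scored if row[0] > 0]
    return sorted(scored, reverse=True)[:top_k]
```

Calling less_processed_alternatives("rolled oats sugar cocoa", 4, PRODUCTS) returns (score, name, NOVA group) tuples for candidates in groups 1 to 3 only. In the project, candidates retrieved this way are then judged and explained by the chat model.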


What’s in this repository

Area | Contents
data/ | Balanced product dataset (food_nlp_balanced.xlsx) used for retrieval and training-related workflows.
models/bert/ | BERT.ipynb: training and evaluation of the NOVA classifier. llm/: LangGraph pipeline (graph.py) used by the API.
frontend/ | Lovable-generated UI (frontend/.lovable/project.json). Install dependencies with npm install (from package.json), run with npm run dev. The page posts to http://127.0.0.1:8000/recommend (product name + ingredient image).
backend/ | FastAPI app (backend/main.py): loads .env, runs OCR on the uploaded image, then produces the recommendation. Required for the web demo.
logistic_regression/ | Baseline NOVA model notebook for comparison.

What you need before running

Software

  • Python 3.10+ (3.11 recommended) and pip — backend + notebooks.
  • Node.js 20+ (LTS) and npm — frontend (frontend/package.json).
  • A Python virtual environment (venv) is recommended.

Accounts and tokens

Variable | Required for | Where to get it
OPENAI_API_KEY | Yes: OCR text cleanup / vision step and the recommendation LLM (llm_init.py, backend/ocr.py) | OpenAI API keys
HF_TOKEN | Optional: Hugging Face Hub login for smoother downloads in notebooks | Hugging Face tokens

The fine-tuned classifier weights belong in models/bert/bert_nova_best_model/. That folder is not tracked in Git because of size limits. Train the model with models/bert/BERT.ipynb or copy the folder from the project authors.


First-time setup

Python (repo root)

cd /path/to/NLP-Final-Project
python -m venv .venv

Activate:

  • macOS / Linux: source .venv/bin/activate
  • Windows (cmd): .venv\Scripts\activate.bat
  • Windows (PowerShell): .venv\Scripts\Activate.ps1

Install the dependencies inside the activated environment:

pip install -r requirements.txt

Node (frontend)

cd frontend
npm install

Environment file (.env)

At the repository root (same folder as requirements.txt):

  1. Copy:

    cp .env.example .env
  2. Set at least:

    OPENAI_API_KEY=sk-...

    Optionally:

    HF_TOKEN=hf_...

.env is git-ignored. The FastAPI app loads it automatically (backend/main.py). Do not commit secrets.
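
To confirm that the file is filled in before starting the server, a small check like the following may help. It is only a sketch: the source does not say how backend/main.py reads .env, so the hand-written parser and the function names here are assumptions for illustration. It reads simple KEY=VALUE lines and treats the template placeholder (a value ending in "...") as unset.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(values, required=("OPENAI_API_KEY",)):
    """Return required names that are absent, empty, or still a placeholder."""
    missing = []
    for name in required:
        value = values.get(name, "")
        if not value or value.endswith("..."):
            missing.append(name)
    return missing
```

From the repository root, missing_required(parse_env(open(".env").read())) returns an empty list when OPENAI_API_KEY is set to a real value.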


Important: the trained BERT model is not included in this repository because of file size limits. Before running the pipeline, run models/bert/BERT.ipynb to train the model and save it locally in:

models/bert/bert_nova_best_model/

Running the full web demo (frontend + backend)

You need two terminals. Paths and ports matter: the UI expects the API at http://127.0.0.1:8000 (see frontend/src/routes/index.tsx).

Terminal 1 — API (Python)

From repo root:

cd /path/to/NLP-Final-Project
source .venv/bin/activate   # Windows (PowerShell): .venv\Scripts\Activate.ps1
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

Check: open http://127.0.0.1:8000/ — you should see {"status":"ok"}.
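
The same check can be scripted, for example to wait for the API before starting the frontend. The sketch below uses only the standard library. The URL and the expected {"status":"ok"} body come from this README; the function names are hypothetical.

```python
import json
import urllib.request

API_ROOT = "http://127.0.0.1:8000/"


def is_healthy(body):
    """Return True if the response body is JSON with status == "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_api(url=API_ROOT, timeout=5.0):
    """Fetch the root endpoint; return False if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors.
        return False
```

With uvicorn running as shown above, check_api() should return True. It returns False if the server is down or answers with anything else.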

Terminal 2 — Lovable / React frontend

cd /path/to/NLP-Final-Project/frontend
npm run dev

Open the URL Vite prints (often http://localhost:5173). Enter a product name, upload an image of the ingredient list, and submit. The backend extracts the text from the image and returns a NOVA prediction and a less processed alternative.

If something fails

  • OPENAI_API_KEY errors — ensure .env exists at repo root and restart uvicorn after editing.
  • Network / CORS / failed fetch — start the API before the frontend; keep port 8000.
  • Classifier errors — ensure models/bert/bert_nova_best_model/ exists.
  • Missing Excel — ensure data/food_nlp_balanced.xlsx is present.
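
The file-related items in this list can be checked in one pass before launching anything. The following is a minimal sketch, not part of the repository: the paths are the ones named in this README, while the function name and hint texts are made up for illustration.

```python
from pathlib import Path

# Paths named in this README, relative to the repository root,
# each with a short hint on how to fix it if missing.
REQUIRED_PATHS = {
    ".env": "copy .env.example to .env and set OPENAI_API_KEY",
    "models/bert/bert_nova_best_model": "run models/bert/BERT.ipynb or copy the folder",
    "data/food_nlp_balanced.xlsx": "restore the dataset file",
}


def preflight(repo_root):
    """Return one message per required path that does not exist."""
    root = Path(repo_root)
    return [
        f"{relative}: missing ({hint})"
        for relative, hint in REQUIRED_PATHS.items()
        if not (root / relative).exists()
    ]
```

Running preflight(".") from the repository root returns an empty list when everything is in place. It does not validate the key itself or the contents of the model folder.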

Optional: Jupyter notebooks

  • models/bert/BERT.ipynb — trains/saves the DistilBERT NOVA model (HF_TOKEN optional).
  • models/generation.ipynb — pipeline experiments.
  • logistic_regression/Logistic_regression.ipynb — logistic regression baseline.

Use the same venv; set HF_TOKEN / OPENAI_API_KEY in the notebook environment if needed.


Summary for grading / demonstration

  1. pip install -r requirements.txt and place models/bert/bert_nova_best_model/ locally.
  2. cp .env.example .env and set OPENAI_API_KEY.
  3. npm install inside frontend/.
  4. Run uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000 from repo root.
  5. Run npm run dev in frontend/ and use the browser UI.

That matches how the Lovable frontend and FastAPI backend are wired in this repo.
