jselmani/grief-analysis


Grief Language Analysis and Screening Prototype

Course: AI380S26 - AI in Healthcare
Author: Jiel Selmani

This project applies five core NLP techniques to a corpus of approximately 3,000 posts from the public r/GriefSupport subreddit and uses the resulting feature set to build a two-stage screening prototype that scores a free-text reflection against the seven dimensions of the PG-13-R prolonged-grief-disorder framework (Prigerson et al., 2021).

  • Stage 1 extracts structured features from the reflection: VADER sentiment, an NRC-derived emotion lexicon, LIWC-style linguistic categories, and per-dimension PG-13-R scores.
  • Stage 2 forwards the text and Stage 1 scores to a large language model (Anthropic Claude or OpenAI GPT) for narrative interpretation.
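As a rough illustration of how the two stages fit together, the sketch below computes a few lexicon-based proportions (Stage 1) and packs them, together with the original text, into a prompt string (Stage 2). The toy lexicon, the function names stage1_features and stage2_prompt, and the prompt wording are all hypothetical. The project's actual features come from VADER, an NRC-derived lexicon, LIWC-style categories, and PG-13-R scoring in src/screener.py, none of which are reproduced here.

```python
import json
import re

# Hypothetical toy lexicon for illustration only; it stands in for the
# VADER / NRC / LIWC-style resources the project actually uses.
TOY_LEXICON = {
    "sadness": {"miss", "cry", "empty", "alone"},
    "first_person": {"i", "me", "my", "myself"},
}


def stage1_features(text):
    """Return the share of tokens that fall in each toy category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        name: round(sum(tok in words for tok in tokens) / total, 3)
        for name, words in TOY_LEXICON.items()
    }


def stage2_prompt(text, features):
    """Build a prompt that hands the text and Stage 1 scores to an LLM.

    The wording is an assumption; no model is called here.
    """
    return (
        "Stage 1 feature scores (JSON):\n"
        + json.dumps(features, sort_keys=True)
        + "\n\nReflection:\n"
        + text
        + "\n\nDescribe what these scores suggest. Do not diagnose."
    )
```

For example, stage1_features("I miss my mom") yields a sadness share of 0.25 and a first_person share of 0.5, and stage2_prompt embeds both the scores and the text in a single string that could be sent to either provider.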

This is a research prototype. It is not a clinical assessment, not a diagnostic instrument, and not a substitute for professional care.

Project layout

grief-analysis/
├── src/                analysis pipeline (collect, preprocess, analyze, screen)
│   ├── collect_data.py     download r/GriefSupport posts via Arctic Shift
│   ├── preprocess.py       clean and normalize raw posts
│   ├── analysis.py         five core NLP techniques (figures 1–6)
│   ├── deep_analysis.py    cross-loss, complicated grief, trajectories
│   └── screener.py         two-stage screening prototype
├── api/                Flask JSON API exposing the screener
│   └── server.py
├── web/                Vite + React + Tailwind frontend
├── data/               raw and processed post data
├── results/            CSVs and JSON summaries
├── figures/            generated plots (fig0–fig11)
├── references/         cited PDFs
├── report/             write-up
└── requirements.txt

Prerequisites

  • Python 3.11 or newer
  • Node.js 18 or newer (for the web UI)
  • An API key for Anthropic or OpenAI (only required for Stage 2 interpretation; Stage 1 runs locally without a key)

Setup

1. Clone and create a virtual environment

# after cloning the repository
cd grief-analysis
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

The first run of analysis.py or screener.py will download the NLTK resources it needs (vader_lexicon, stopwords, punkt).

2. Install web dependencies

cd web
npm install

Running the screener

The API and the web UI run as two separate processes.

Start the Flask API

# from the project root, with the venv active
python api/server.py
# → http://localhost:5001

Endpoints:

Method  Path         Body
GET     /api/health  (none)
POST    /api/screen  { text, provider, api_key?, model? }

provider is anthropic or openai. If api_key is omitted, only Stage 1 results are returned.
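A minimal client sketch for the POST /api/screen endpoint, using only the Python standard library. The helper names build_screen_request and screen are hypothetical, and the shape of the JSON response is not documented here, so the sketch simply returns whatever the server sends back.

```python
import json
import urllib.request

API_BASE = "http://localhost:5001"  # port taken from the instructions above


def build_screen_request(text, provider="anthropic", api_key=None, model=None):
    """Build (but do not send) a POST /api/screen request."""
    payload = {"text": text, "provider": provider}
    # api_key and model are optional; leaving out api_key means Stage 1 only.
    if api_key:
        payload["api_key"] = api_key
    if model:
        payload["model"] = model
    return urllib.request.Request(
        f"{API_BASE}/api/screen",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def screen(text, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_screen_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the Flask API running, screen("Some reflection text") would request Stage 1 results only, while passing api_key="..." would request the full two-stage output.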

Start the web UI

cd web
npm run dev
# → http://localhost:5173

Vite proxies /api/* to the Flask server on port 5001. Open the Settings panel in the header to choose Anthropic or OpenAI, paste an API key, and optionally override the default model. Keys are stored in browser localStorage and sent only with screening requests.

Default models:

  • Anthropic: claude-sonnet-4-5-20250929
  • OpenAI: gpt-4o-mini

Reproducing the analysis

Each step writes to data/, results/, or figures/. Run them in order from the project root with the venv active:

python src/collect_data.py        # download posts via Arctic Shift
python src/preprocess.py          # clean and normalize
python src/analysis.py            # five core NLP techniques (figs 1–6)
python src/deep_analysis.py       # cross-loss, complicated grief, trajectories
python src/screener.py --evaluate # validate Stage 1 against the corpus
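If running the five steps by hand is tedious, a small driver script can chain them. The sketch below is an assumption about how one might do that, not part of the repository. The names STEPS, pipeline_commands, and run_pipeline are hypothetical, while the script paths and the --evaluate flag are the ones listed above.

```python
import subprocess
import sys

# Steps in the order given above; each entry is a script path plus its arguments.
STEPS = [
    ["src/collect_data.py"],
    ["src/preprocess.py"],
    ["src/analysis.py"],
    ["src/deep_analysis.py"],
    ["src/screener.py", "--evaluate"],
]


def pipeline_commands(python=sys.executable):
    """Return the full command line for every step, in order."""
    return [[python, *step] for step in STEPS]


def run_pipeline():
    """Run each step from the project root, stopping at the first failure."""
    for command in pipeline_commands():
        print("running:", " ".join(command))
        # check=True raises CalledProcessError if a step exits non-zero,
        # so later steps never run on incomplete inputs.
        subprocess.run(command, check=True)
```

Calling run_pipeline() from the project root with the venv active would execute the same sequence as the commands above. Using sys.executable keeps every step on the interpreter of the active virtual environment.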

Other screener.py modes:

python src/screener.py --demo                       # Stage 1 on sample texts
python src/screener.py --demo --api-key sk-...      # Full pipeline on samples
python src/screener.py --interactive                # Paste-and-screen

Outputs

  • results/grief_posts_analyzed.csv, grief_posts_deep_analysis.csv — per-post feature tables
  • results/analysis_summary.json, deep_analysis_summary.json — aggregate summaries
  • results/screener_evaluation.json — Stage 1 validation against the corpus
  • figures/fig0–fig11 — methodology pipeline, sentiment, emotion, linguistic features, word clouds, topics, temporal trends, cross-loss fingerprints, complicated grief, trajectories, community response, and screener evaluation plots
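To confirm that a run produced the result files listed above, a quick check along these lines may help. The helper name missing_results is hypothetical, and the sketch only covers the results/ files because the figure file names and extensions are not spelled out here.

```python
from pathlib import Path

# File names as listed under Outputs; all are expected inside results/.
EXPECTED_RESULTS = [
    "grief_posts_analyzed.csv",
    "grief_posts_deep_analysis.csv",
    "analysis_summary.json",
    "deep_analysis_summary.json",
    "screener_evaluation.json",
]


def missing_results(root="."):
    """Return the expected result files that are absent under root/results."""
    results_dir = Path(root) / "results"
    return [name for name in EXPECTED_RESULTS if not (results_dir / name).is_file()]
```

An empty list from missing_results() means every listed file exists. Any names returned point to a step that has not been run or did not finish.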

Disclaimer

This is a research prototype built for coursework. It is not a clinical assessment, not a diagnostic instrument, and not a substitute for professional care. If you or someone you know is struggling with grief or mental health, please reach out to a qualified professional.
