8 changes: 8 additions & 0 deletions .gitattributes
@@ -0,0 +1,8 @@
.gitattributes text eol=lf
* text=auto

*.py text eol=lf
*.toml text eol=lf
*.lock text eol=lf
*.yaml text eol=lf
*.yml text eol=lf
16 changes: 15 additions & 1 deletion .gitignore
@@ -4,6 +4,9 @@
# JetBrains
.idea/

# generated YAMLs for testing
config/priors/patient/cases/batch_*.yaml

# uv / python
.venv/
__pycache__/
@@ -16,6 +19,14 @@ data/sessions/
data/reports/
manuscript/

# external / imported session logs (heavy + binary; provenance frozen in
# data/results/baseline_v0/BASELINE.md).
data/ext-session-logs/_extracted/
data/ext-session-logs/*.zip

# local analysis outputs (regenerable from analysis/validation/ + external data)
data/results/

# streamlit
.streamlit/

@@ -24,4 +35,7 @@ manuscript/
Thumbs.db

# Brainstorming
AUDIT.md
AUDIT_SYMPTOMS.md
results-temp.md
analysis/ANALYSIS.md
61 changes: 61 additions & 0 deletions README.md
@@ -145,6 +145,13 @@ dyadic-sim/
| |-- unconscious_emergence.py # special: did the hidden agenda surface?
| |-- frame_integrity.py # special: did ethical priors hold under pressure?
| |-- report.py # assembles full session report
| |-- validation/ # manipulation-validity checks (see Validation Analyses)
| |-- paths.py # canonical data/ input + output locations
| |-- sessions.py # shared session loader
| |-- symptom_embed.py # embedding manipulation check (target_z)
| |-- compare_baseline.py # pilot vs baseline delta + figure
|
|-- run_symptom_experiments.py # batch-generate symptom-isolated sessions
|
|-- data/
| |-- sessions/
@@ -286,6 +293,60 @@ Stage 2: different local models

---

## Validation Analyses

Beyond the six personhood markers, `analysis/validation/` holds checks on the
**construct validity of experimental manipulations** — currently the symptom-injection
study, which asks whether a PHQ-9 depression symptom written into the patient prior is
actually *expressed* by the patient agent (rather than silently ignored).

Analysis data lives under `data/` in two buckets (both gitignored; regenerable):

- `data/ext-session-logs/` — external / imported session archives (read-only inputs)
- `data/results/` — local analysis outputs (CSVs, figures, frozen baselines)

Paths are centralised in `analysis/validation/paths.py`; run the scripts from the repo
root with `PYTHONPATH=.`.
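
As a rough illustration of what centralising paths can look like, here is a minimal sketch. It is not the contents of `analysis/validation/paths.py`; the constant names and the `results_dir` helper are hypothetical, and it assumes the process is started from the repo root, as the paragraph above recommends.

```python
from pathlib import Path

# Hypothetical names throughout; the real paths.py may be organised differently.
REPO_ROOT = Path.cwd()  # assumption: scripts are launched from the repo root

DATA_DIR = REPO_ROOT / "data"
EXT_SESSION_LOGS = DATA_DIR / "ext-session-logs"  # external archives, read-only inputs
RESULTS_DIR = DATA_DIR / "results"                # local analysis outputs


def results_dir(run_name: str) -> Path:
    """Return the output directory for one analysis run, e.g. data/results/pilot1.

    The directory is not created here; callers decide when to touch the disk.
    """
    return RESULTS_DIR / run_name
```

Keeping every location in one module means a change to the `data/` layout touches a single file rather than each script.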

**1. Generate symptom-isolated sessions.** Each session sets one PHQ-9 symptom to the
chosen frequency anchor and the other eight to "not at all":

```bash
uv run python run_symptom_experiments.py \
--therapist llama3.1 --patient llama3.1 \
--case empty_and_invisible \
--symptoms depressed_mood,psychomotor_changes,sleep_problems \
--frequency "nearly every day" --repeats 20 --prefix pilot1
```

Sessions land in `data/sessions/` tagged `pilot1_<case>_<symptom>_run_<n>`.
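
The isolation scheme and the tag format can be sketched as follows. This is an illustration under stated assumptions, not the code in `run_symptom_experiments.py`: only `depressed_mood`, `psychomotor_changes`, and `sleep_problems` appear in the command above, so the other six item keys are hypothetical labels for the remaining PHQ-9 items, and both function names are invented for the example.

```python
# Keys for the nine PHQ-9 items. Three come from the command above; the other
# six are hypothetical labels and may not match the repository's identifiers.
PHQ9_ITEMS = [
    "anhedonia",
    "depressed_mood",
    "sleep_problems",
    "fatigue",
    "appetite_changes",
    "worthlessness",
    "concentration_problems",
    "psychomotor_changes",
    "suicidal_ideation",
]


def isolated_profile(target: str, frequency: str = "nearly every day") -> dict[str, str]:
    """Set one symptom to the frequency anchor and the other eight to 'not at all'."""
    if target not in PHQ9_ITEMS:
        raise ValueError(f"unknown PHQ-9 item: {target!r}")
    return {item: (frequency if item == target else "not at all") for item in PHQ9_ITEMS}


def session_tag(prefix: str, case: str, symptom: str, run: int) -> str:
    """Build a tag in the <prefix>_<case>_<symptom>_run_<n> shape."""
    return f"{prefix}_{case}_{symptom}_run_{run}"
```

Because case and symptom names both contain underscores, a tag cannot be split back into its parts on `_` alone; matching against a known list of symptom names is one way around that.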

**2. Run the embedding manipulation check.** For each session, the script computes the
cosine similarity between the patient's speech and each PHQ-9 reference, then z-scores
those similarities within the session (`target_z > 0` means the patient leans toward the
injected symptom; `~0` means chance level):

```bash
PYTHONPATH=. python analysis/validation/symptom_embed.py \
--sessions 'data/sessions/*' --prefix pilot1 \
--out-dir data/results/pilot1
```
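
To make `target_z` concrete, here is a minimal sketch of the computation on toy vectors. It makes several assumptions that `symptom_embed.py` may not share: one embedding vector per session, a population standard deviation across the references, and hypothetical function names. The embedding model itself is left out.

```python
import math
from statistics import mean, pstdev


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def target_z(speech_vec: list[float], reference_vecs: dict[str, list[float]], target: str) -> float:
    """z-score of the injected symptom's similarity among all references in one session."""
    sims = {name: cosine(speech_vec, ref) for name, ref in reference_vecs.items()}
    values = list(sims.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD across the references
    if sigma == 0:
        return 0.0  # all references equally similar: no lean in any direction
    return (sims[target] - mu) / sigma
```

Z-scoring within the session removes the session's overall similarity level, so the score reflects how far the injected symptom stands out from the other references rather than how similar the speech is to PHQ-9 wording in general.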

**3. Compare against a baseline.** The script prints a delta table and writes a
side-by-side figure with 95% CIs:

```bash
PYTHONPATH=. python analysis/validation/compare_baseline.py \
--baseline data/results/baseline_v0/symptom_embed_long.csv \
--pilot data/results/pilot1/symptom_embed_long.csv \
--match-case --out-dir data/results/pilot1
```
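
One way to compute such a delta with a 95% interval is sketched below. It assumes a normal approximation with unpooled variances and takes plain lists of per-session `target_z` values; `compare_baseline.py` may use a different interval method (for example a bootstrap), and the function names here are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Mean with a normal-approximation 95% CI (needs at least two values)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def delta_ci95(baseline: list[float], pilot: list[float]) -> tuple[float, float, float]:
    """Pilot-minus-baseline difference in means with a 95% CI (unpooled variances)."""
    diff = mean(pilot) - mean(baseline)
    se = math.sqrt(
        stdev(pilot) ** 2 / len(pilot) + stdev(baseline) ** 2 / len(baseline)
    )
    return diff, diff - 1.96 * se, diff + 1.96 * se
```

With only a handful of sessions per condition, the normal approximation gives intervals that are too narrow; a t-based or bootstrap interval would be the more cautious choice.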

Companion scripts in the package: `symptom_manifest.py` (transparent keyword/lexicon
check), `symptom_embed_bycase.py` (split by patient case), and `symptom_embed_robust.py`
(robustness to the reference wording).

---

## Theoretical Background

For the full psychological and philosophical motivation behind this project, see [`manuscript/README.md`](manuscript/README.md).