Skip to content

DigitalHistory-Lund/SecToPat-PhilTransModelSelection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DOI

Philosophical Transactions Extraction Model Comparison

An interactive tool for comparing how well different local LLMs classify and extract structured data from articles in the Philosophical Transactions of the Royal Society (1665–1886).

Part of the Secrets to Patent project.

🌐 Live site: https://digitalhistory-lund.github.io/SecToPat-PhilTransModelSelection/

What's here

Path Purpose
index.html Self-contained comparison viewer with all data embedded
results.json Raw model evaluation results
about.html About page with citation and license info

What it shows

Side-by-side comparison of model outputs for a selection of Philosophical Transactions articles. Models evaluated include gemma4, ministral-3, qwen3.5, phi3.5, and others. For each article the viewer shows the original OCR'd text alongside the structured extraction from each selected model — including the experiment classification and extracted locations.

The HTML is generated by build_bench_viewer.py in the parent repository.

Prompts

The schema evolved during benchmarking. Two prompt versions are present in the data:

v2 — 3 questions (used for: gemma4, ministral-3):

You are analysing an article from the Philosophical Transactions of the Royal Society
(17th–19th century). Answer three questions:

1. is_experiment: Does this article describe one or more experiments or systematic
observations? Answer "yes", "no", or "unsure".

2. locations: List every spatial location where an experiment or observation took place.
For each, provide:
   - place: the immediate setting at room or building scale (e.g. "private study",
"ship's cabin", "kitchen"); null if not mentioned
   - geography: named place at city, region, country, estate, or vessel scale
(e.g. "London", "aboard HMS Endeavour"); null if not mentioned
   - detail: specific spatial detail within the place (e.g. "by the south window",
"in a dark corner"); null if not mentioned
Do not include apparatus or containers as locations. Return an empty list if no
spatial setting is mentioned.

3. participants: List every person named in the article who conducted, observed, or
contributed to the experiment or observation. For each, provide:
   - name: the person's name as it appears in the text
   - role: their role if stated (e.g. "experimenter", "observer", "subject",
"correspondent", "author"); null if not clear
Return an empty list if no individuals are named.

v1 — 2 questions (used for: llama3.2, phi3.5, qwen3.5):

Same as v2 but without question 3 (participants). Output schema: is_experiment + locations only.

Structured output was enforced via JSON schema using the Ollama API.

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). See LICENSE.

Citation

Machine-readable metadata is in CITATION.cff.

Contact

For questions or feedback, contact Mathias Johansson at MathiasJohansson@kultur.lu.se, or open an issue.

About

Side-by-side comparison of local LLMs for structured extraction from the Philosophical Transactions of the Royal Society. Part of the Secrets to Patent project.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages