You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
his repository contains code and resources related to an in-depth analysis of OpenAI's HealthBench, a benchmark designed for evaluating Large Language Models in the healthcare sector.
Auditing the clinical-evidence claims in OpenAI's HealthBench — its own gold answers & rubrics — for hallucinated, overgeneralized, overlooked & misweighted evidence. By NoBSmed.