Hi, I’m Chao Ding 👋

I am a medical AI researcher working at the intersection of clinical medicine, large language model evaluation, and trustworthy AI for healthcare.

My current research focuses on building evaluation and governance infrastructure for medical AI systems, including clinical LLM benchmarks, patient-facing AI safety evaluation, specialty triage, clinical agents, and human-verified reasoning datasets.

Research Interests

Trustworthy medical AI and clinical LLM evaluation
Patient-facing AI safety, escalation behavior, and in-loop governance
Clinical agents, multimodal medical AI, and benchmark infrastructure
Human-verified clinical reasoning datasets and medical QA-CoT pipelines
High-stakes agent evaluation and multi-turn safety assessment
Real-world clinical data science using NHANES, BRFSS, and EHR-derived records

Selected Publications

Advancing medical AI through benchmarking and competition for specialty triage
npj Digital Medicine, 2026.
DOI: 10.1038/s41746-026-02433-8 · Project: MedBench
Beyond Knowledge to Agency: Evaluating Expertise, Autonomy, and Integrity in Finance with CNFinBench
KDD 2026, accepted.
arXiv: 2512.09506 · Project: CNFinBench · Code: GitHub
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
CVPR 2026, accepted.
arXiv: 2601.03054
TyG index, depression, and cognitive dysfunction: NHANES with machine learning support
Journal of Affective Disorders, 2025.
DOI: 10.1016/j.jad.2025.01.051
Smoking types and stroke risk: development of a predictive model for identifying stroke risk
Frontiers in Physiology, 2025.
DOI: 10.3389/fphys.2025.1528910
Development of a predictive model for the U-shaped relationship between the triglyceride-glycemic index and depression using machine learning
Heliyon, 2024.
DOI: 10.1016/j.heliyon.2024.e38615

Research Themes

1. Benchmarking Clinical Foundation Models

I build evaluation frameworks for medical LLMs, multimodal models, and clinical agents. My work includes MedTriage / MedBench, a real-world specialty-triage benchmark derived from hospital intake records, online guidance dialogues, and outpatient clinical notes. The benchmark evaluates strict multi-label department recommendation and supported a MedBench competition with 37 teams.

Project: MedBench
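
As a rough illustration of what strict multi-label scoring can look like, the sketch below counts a case as correct only when the predicted department set equals the reference set exactly. This reading of "strict" is an assumption, not the official MedBench scoring code, and the function name and department labels are hypothetical.

```python
from typing import Iterable, Sequence


def strict_match_accuracy(
    predicted: Sequence[Iterable[str]],
    reference: Sequence[Iterable[str]],
) -> float:
    """Fraction of cases whose predicted label set equals the reference set exactly.

    Assumption: "strict" means exact set match, so a partially correct
    prediction (a missing or an extra department) scores zero for that case.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(set(p) == set(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical cases; the department names are illustrative only.
reference = [{"cardiology"}, {"neurology", "ophthalmology"}, {"dermatology"}]
predicted = [{"cardiology"}, {"neurology"}, {"dermatology"}]

# The second case misses one department, so it does not count as a hit.
score = strict_match_accuracy(predicted, reference)
print(f"strict accuracy: {score:.3f}")
```

Under this definition, a model receives no partial credit for a multi-department case, which is what makes the metric stricter than per-label precision or recall.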

2. Patient-Facing Medical AI Safety and Escalation

I study how medical LLMs behave in multi-turn patient-facing scenarios, especially when a model should escalate urgent symptoms rather than continue a generic conversation. I designed a 150-case simulated consultation benchmark and evaluated in-loop governance actions such as PASS, REWRITE, ASK-MORE, ESCALATE, and REFUSE.

This line of work focuses on delayed escalation, unsafe reassurance, insufficient triage, and safety-control trade-offs in patient-facing medical AI.
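
To make the idea of delayed escalation concrete, here is a minimal sketch that represents the five governance actions as an enum and measures how many turns pass between the first urgent symptom and the first ESCALATE action. The `escalation_delay` helper, the turn indexing, and the example dialogue are all hypothetical and are not taken from the benchmark itself.

```python
from enum import Enum
from typing import Optional, Sequence


class Action(Enum):
    """In-loop governance actions named in the text above."""
    PASS = "PASS"
    REWRITE = "REWRITE"
    ASK_MORE = "ASK-MORE"
    ESCALATE = "ESCALATE"
    REFUSE = "REFUSE"


def escalation_delay(actions: Sequence[Action], urgent_turn: int) -> Optional[int]:
    """Turns between the urgent symptom and the first ESCALATE action.

    Hypothetical metric: returns 0 when escalation happens on the same turn
    the urgent symptom appears, and None when the dialogue never escalates.
    """
    for turn in range(urgent_turn, len(actions)):
        if actions[turn] is Action.ESCALATE:
            return turn - urgent_turn
    return None


# Illustrative dialogue: the urgent symptom appears at turn index 1,
# but the system only escalates at turn index 3.
dialogue = [Action.PASS, Action.ASK_MORE, Action.PASS, Action.ESCALATE]
delay = escalation_delay(dialogue, urgent_turn=1)
print(f"escalation delay (turns): {delay}")
```

A delay of `None` corresponds to a missed escalation, while a positive delay corresponds to the delayed-escalation failure mode described above.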

3. Clinical Reasoning and Trustworthy Evaluation Infrastructure

I contribute to clinician-audited benchmark infrastructure and human-verified clinical reasoning datasets, including rotating test pools, safety and ethics rubrics, LLM-as-a-judge calibration, and QA-CoT validation pipelines.

This work aims to make medical AI evaluation more auditable, clinically grounded, and deployment-aware.

4. High-Stakes Agent Evaluation Beyond Medicine

I also contributed to CNFinBench, a benchmark for evaluating financial LLM agents across expertise, autonomy, and integrity. The project studies how LLMs behave under decision-intensive workflows, tool-use settings, and multi-turn adversarial pressure.

Project: CNFinBench · Code: GitHub

5. Real-World Clinical Data Science

I apply statistical modeling and machine learning to large-scale clinical datasets, including NHANES and BRFSS. My first-author and co-first-author studies examine the TyG index, depression, cognitive dysfunction, smoking behavior, and stroke-risk prediction, with cohorts ranging from 1,352 to 273,028 participants.

Technical Skills

Programming and analysis: Python, R, statistical analysis, reproducible visualization, pipeline development

Machine learning: XGBoost, Random Forest, logistic regression, LASSO, SHAP, ROC/AUC, calibration, cross-validation, bootstrap confidence intervals

Medical AI evaluation: benchmark construction, LLM-as-a-judge, rubric design, multi-turn dialogue evaluation, safety/ethics scoring, agent evaluation

Clinical data: NHANES, BRFSS, EHR/EMR-derived records, patient simulation, clinical terminology

Education and Training

Shanghai University of Traditional Chinese Medicine
M.Med., Integrated Traditional Chinese and Western Clinical Medicine, expected Dec 2026
Shanghai Artificial Intelligence Laboratory
Research Intern, Medical AI Group
Putuo District Central Hospital, Shanghai
Resident Physician / Clinical Trainee

Contact

Email: dingchaochao58@gmail.com
Google Scholar: Chao Ding
ORCID: 0009-0004-9652-4585
GitHub: Dingsuper-creator
Location: Shanghai, China
