Hi, I’m Chao Ding 👋

I am a medical AI researcher working at the intersection of clinical medicine, large language model evaluation, and trustworthy AI for healthcare.

My current research focuses on building evaluation and governance infrastructure for medical AI systems, including clinical LLM benchmarks, patient-facing AI safety evaluation, specialty triage, clinical agents, and human-verified reasoning datasets.

Google Scholar · ORCID · GitHub · Email


Research Interests

  • Trustworthy medical AI and clinical LLM evaluation
  • Patient-facing AI safety, escalation behavior, and in-loop governance
  • Clinical agents, multimodal medical AI, and benchmark infrastructure
  • Human-verified clinical reasoning datasets and medical QA-CoT pipelines
  • High-stakes agent evaluation and multi-turn safety assessment
  • Real-world clinical data science using NHANES, BRFSS, and EHR-derived records

Selected Publications

  • Advancing medical AI through benchmarking and competition for specialty triage
    npj Digital Medicine, 2026.
    DOI: 10.1038/s41746-026-02433-8 · Project: MedBench

  • Beyond Knowledge to Agency: Evaluating Expertise, Autonomy, and Integrity in Finance with CNFinBench
    KDD 2026, accepted.
    arXiv: 2512.09506 · Project: CNFinBench · Code: GitHub

  • IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
    CVPR 2026, accepted.
    arXiv: 2601.03054

  • TyG index, depression, and cognitive dysfunction: NHANES with machine learning support
    Journal of Affective Disorders, 2025.
    DOI: 10.1016/j.jad.2025.01.051

  • Smoking types and stroke risk: development of a predictive model for identifying stroke risk
    Frontiers in Physiology, 2025.
    DOI: 10.3389/fphys.2025.1528910

  • Development of a predictive model for the U-shaped relationship between the triglyceride-glycemic index and depression using machine learning
    Heliyon, 2024.
    DOI: 10.1016/j.heliyon.2024.e38615

Research Themes

1. Benchmarking Clinical Foundation Models

I build evaluation frameworks for medical LLMs, multimodal models, and clinical agents. My work includes MedTriage / MedBench, a real-world specialty-triage benchmark derived from hospital intake records, online guidance dialogues, and outpatient clinical notes. The benchmark scores department recommendation under a strict multi-label criterion and served as the basis for a MedBench competition with 37 teams.

Project: MedBench
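
For illustration only, the sketch below shows one plausible reading of "strict" multi-label scoring: a case counts as correct only when the predicted department set equals the reference set exactly. The function name, the department labels, and the exact-match rule are assumptions made for this example; this is not the official MedBench scorer.

```python
from typing import List, Set


def strict_match_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of cases whose predicted department set equals the gold set exactly.

    Hypothetical helper: partial overlap earns no credit under this rule.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of cases")
    if not gold:
        return 0.0
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)


# Toy data with made-up department labels.
gold = [{"cardiology"}, {"neurology", "ophthalmology"}, {"dermatology"}]
pred = [{"cardiology"}, {"neurology"}, {"dermatology", "allergy"}]

# Only the first case matches exactly: a missing label (case 2) and an
# extra label (case 3) both count as errors.
score = strict_match_accuracy(gold, pred)
```

Under this rule, under-referral and over-referral are penalized equally, which is one reason a strict criterion can be harsher than per-label F1.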


2. Patient-Facing Medical AI Safety and Escalation

I study how medical LLMs behave in multi-turn patient-facing scenarios, especially when a model should escalate urgent symptoms rather than continue a generic conversation. I designed a 150-case simulated consultation benchmark and evaluated in-loop governance actions such as PASS, REWRITE, ASK-MORE, ESCALATE, and REFUSE.

This line of work focuses on delayed escalation, unsafe reassurance, insufficient triage, and safety-control trade-offs in patient-facing medical AI.
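
As a rough sketch of how delayed escalation could be quantified, the example below encodes the five governance actions named above and counts how many turns pass between the first urgent symptom and the first ESCALATE action. The `escalation_delay` helper, the `urgent_turn` annotation, and the per-turn action trace are assumptions for illustration, not the benchmark's actual schema or metric.

```python
from enum import Enum
from typing import List, Optional


class Action(Enum):
    """In-loop governance actions, using the names from the text above."""
    PASS = "PASS"
    REWRITE = "REWRITE"
    ASK_MORE = "ASK-MORE"
    ESCALATE = "ESCALATE"
    REFUSE = "REFUSE"


def escalation_delay(actions: List[Action], urgent_turn: int) -> Optional[int]:
    """Turns elapsed between the urgent symptom and the first ESCALATE.

    `actions[i]` is the governance action taken at turn i (0-indexed);
    `urgent_turn` is the turn at which the urgent symptom first appears
    (a hypothetical annotation). Returns None if the dialogue never escalates.
    """
    for turn in range(urgent_turn, len(actions)):
        if actions[turn] is Action.ESCALATE:
            return turn - urgent_turn
    return None


# Toy trace: the urgent symptom appears at turn 1, escalation happens at turn 3.
trace = [Action.PASS, Action.ASK_MORE, Action.PASS, Action.ESCALATE]
delay = escalation_delay(trace, urgent_turn=1)
```

A delay of zero would mean immediate escalation, while `None` flags a dialogue that never escalated, the case most closely tied to unsafe reassurance.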


3. Clinical Reasoning and Trustworthy Evaluation Infrastructure

I contribute to clinician-audited benchmark infrastructure and human-verified clinical reasoning datasets, including rotating test pools, safety and ethics rubrics, LLM-as-a-judge calibration, and QA-CoT validation pipelines.

This work aims to make medical AI evaluation more auditable, clinically grounded, and deployment-aware.


4. High-Stakes Agent Evaluation Beyond Medicine

I also contributed to CNFinBench, a benchmark for evaluating financial LLM agents across expertise, autonomy, and integrity. The project studies how LLMs behave under decision-intensive workflows, tool-use settings, and multi-turn adversarial pressure.

Project: CNFinBench · Code: GitHub


5. Real-World Clinical Data Science

I apply statistical modeling and machine learning to large-scale clinical datasets, including NHANES and BRFSS. My first-author and co-first-author studies examine the TyG index, depression, cognitive dysfunction, smoking behavior, and stroke-risk prediction, in cohorts ranging from 1,352 to 273,028 participants.


Technical Skills

Programming and analysis: Python, R, statistical analysis, reproducible visualization, pipeline development

Machine learning: XGBoost, Random Forest, logistic regression, LASSO, SHAP, ROC/AUC, calibration, cross-validation, bootstrap confidence intervals

Medical AI evaluation: benchmark construction, LLM-as-a-judge, rubric design, multi-turn dialogue evaluation, safety/ethics scoring, agent evaluation

Clinical data: NHANES, BRFSS, EHR/EMR-derived records, patient simulation, clinical terminology


Education and Training

  • Shanghai University of Traditional Chinese Medicine
    M.Med., Integrated Traditional Chinese and Western Clinical Medicine, expected Dec 2026

  • Shanghai Artificial Intelligence Laboratory
    Research Intern, Medical AI Group

  • Putuo District Central Hospital, Shanghai
    Resident Physician / Clinical Trainee


Contact
