AI Measurement Science
- 6 followers
- United States of America
- http://aimslab.stanford.edu/
- sttruong@cs.stanford.edu
Popular repositories
- fantastic-bugs Public
Fantastic Bugs and Where to Find Them in AI Benchmarks
- benchmark-chisel Public
Automatic Revising of Problematic Items in AI Agentic Benchmarks
- benchmark-caliper Public
Uplifting Human Decision Making in AI Evaluation by Automating Benchmark Validity Analysis
Repositories
- safety-irt Public
A Measurement Analysis of Multilingual Safety Evaluation (under review at COLM 2026)
- redteam-measurement Public
Long-form response matrices for adaptive AI red-teaming benchmarks (JailbreakBench, HarmBench, StrongREJECT, Do-Not-Answer), formatted to the aims-foundations/measurement-db schema.
- benchmark-caliper Public
Uplifting Human Decision Making in AI Evaluation by Automating Benchmark Validity Analysis
aims-foundations/benchmark-caliper’s past year of commit activity - strategic_evaluation Public
aims-foundations/strategic_evaluation’s past year of commit activity
People
This organization has no public members.