AARR-bench

AARR-bench (Act As a Real Researcher)

AARR-bench evaluates the ability of LLM agents to conduct research. The core question: what exactly are the gaps between AI agents and real human researchers?

Roadmap

  • AARRI-bench (Act As a Real Research Intern): ongoing
  • AARRA-bench (Act As a Real Research Assistant): planned
  • AARRS-bench (Act As a Real Research Scientist): planned

Popular repositories

  1. AARRI-bench (Public)

    Evaluates the ability of LLM agents to conduct research. The core question: what exactly are the gaps between AI agents and real human researchers?

    Python 8

  2. .github (Public)

  3. AARR-bench (Public)

    AARR-bench project website and series hub

    TypeScript
