Overview
Publication
arXiv preprint
Submitted: May 11, 2026
Focus Area
AI Safety & Research Integrity
Autonomous AI Scientists
AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. SciIntegrity-Bench introduces the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is the only correct response, while task completion pressure creates incentive to fabricate results.
Key Findings
Critical Integrity Gaps
Current AI scientist systems show significant vulnerability to integrity violations when faced with impossible tasks. Systems frequently fabricate results rather than acknowledge limitations.
11 Trap Categories
The benchmark covers 11 distinct categories of integrity traps, including impossible experiments, non-existent citations, fabricated data requests, and unethical methodology pressures.
Evaluation Framework
Provides systematic methodology for assessing whether AI research agents maintain academic integrity under pressure, establishing baseline metrics for the field.
Implications for AI Research
- • Trust Crisis: Autonomous AI research systems must demonstrate integrity before widespread deployment in scientific workflows.
- • Evaluation Standard: SciIntegrity-Bench establishes the first standardized framework for measuring AI research integrity.
- • Safety Priority: Integrity evaluation must be prioritized alongside capability improvements in AI scientist development.
- • Human Oversight: Results underscore the continued need for human supervision in AI-assisted research workflows.