TrojAI (AI Trojans) Program Final Report

2026-05-04 #Tech

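The U.S. Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to address an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans.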

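AI Trojans are malicious, hidden backdoors intentionally embedded in an AI model. They can cause a system to fail in unexpected ways or allow a malicious actor to hijack the model at will.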

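The multi-year program mapped out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention from the AI security field.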

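The report summarizes the program's key findings, including detection methods based on weight analysis and trigger inversion, as well as strategies for mitigating Trojan risks in deployed models.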
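To make "weight analysis" concrete, here is a minimal sketch of one common form of the idea: reduce each layer's weights to a few summary statistics and concatenate them into a feature vector that a separate classifier, trained on models labeled clean or poisoned, could consume. This is an illustrative assumption, not the method of any specific TrojAI detector. The function names `layer_stats` and `weight_features`, the choice of statistics, and the representation of a model as a dictionary of nested lists are all hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Hypothetical representation: a layer is a 2-D matrix of floats.
Matrix = Sequence[Sequence[float]]


def layer_stats(weights: Matrix) -> List[float]:
    """Summarize one weight matrix as [mean, std, max_abs, frobenius_norm]."""
    flat = [float(w) for row in weights for w in row]
    if not flat:
        raise ValueError("layer has no weights")
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)  # population standard deviation
    max_abs = max(abs(w) for w in flat)
    frobenius = math.sqrt(sum(w * w for w in flat))
    return [mean, std, max_abs, frobenius]


def weight_features(model: Dict[str, Matrix]) -> List[float]:
    """Concatenate per-layer statistics, in sorted layer-name order."""
    features: List[float] = []
    for name in sorted(model):
        features.extend(layer_stats(model[name]))
    return features


# Toy model with two layers (values chosen only for illustration).
toy_model = {
    "fc1": [[1.0, -1.0], [1.0, -1.0]],
    "fc2": [[3.0, 4.0]],
}
features = weight_features(toy_model)
```

The resulting vector has four entries per layer. In a real pipeline one would compute such vectors for many reference models and fit a binary classifier on them; that training step is omitted here because it depends on data the summary does not describe.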

Test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans.

Opening of the original (English, first 3 paragraphs only)

Authors: Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski, Tim Blattner, Derek Juba, Peter Bajcsy, Antonio Cardone, Philippe Dessauw, Alden Dima, Anthony J. Kearsley, Melinda Kleczynski, Joel Vasanth, Walid Keyrouz, Chace Ashcraft, Neil Fendley, Ted Staley, Trevor Stout, Josh Carney, Greg Canal, Will Redman, Aurora Schmidt, Cameron Hickert, William Paul, Jared Markowitz, Nathan Drenkow, David Shriver, Marissa Connor, Keltin Grimes, Marco Christiani, Hayden Moore, Jordan Widjaja, Kasimir Gabert, Uma Balakrishnan, Satyanadh Gundimada, John Jacobellis, Sandya Lakkur, Vitus Leung, Jon Roose, Casey Battaglino, Farinaz Koushanfar, Greg Fields, Xihe Gu, Yaman Jandali, Xinqiao Zhang, Tara Javidi, Akash Vartak, Tim Oates, Ben Erichson, Michael Mahoney, Rauf Izmailov, Xiangyu Zhang, Guangyu Shen, Siyuan Cheng, Shiqing Ma, XiaoFeng Wang, Haixu Tang, Di Tang, Xiaoyi Chen, Zihao Wang, Rui Zhu, Susmit Jha, Xiao Lin, Manoj Acharya, Weichao Zhou, Feisi Fu, Panagiota Kiourti, Chenyu Wang, Zijian Guo, H M Sabbir Ahmad, Wenchao Li, Chao Chen


Abstract: The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.

Note: For copyright reasons, only the first 3 paragraphs are quoted. Please read the original for the full content.

Read the original ↗
