Are AI Detectors Actually Reliable?
With the rise of generative AI, there is growing interest in how to distinguish AI-generated content from human-written content.
A study from the University of Chicago Booth School of Business evaluated four mainstream AI writing detectors: GPTZero, Originality.ai, Pangram, and RoBERTa.
The study found that the commercial detectors performed reasonably well at distinguishing medium-to-long texts but were noticeably less accurate on short texts, while the open-source RoBERTa model performed markedly worse overall.
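To make the open-source baseline concrete, here is a minimal sketch of querying a RoBERTa-based detector through Hugging Face's transformers pipeline. The checkpoint name (openai-community/roberta-base-openai-detector, a public GPT-2 output detector) is an assumption for illustration; this summary does not specify which RoBERTa variant the study evaluated.

```python
# Minimal sketch: scoring texts with an open-source RoBERTa-based detector.
# The checkpoint below is an assumption (a public GPT-2 output detector);
# it may differ from the RoBERTa model evaluated in the study.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

samples = [
    "The mitochondria is the powerhouse of the cell.",  # short text
    "Generative AI has set off a wave of excitement and anxiety, "
    "in part because it can convincingly mimic human writing.",  # longer text
]

for text in samples:
    result = detector(text)[0]
    # This checkpoint labels outputs "Real" (human) or "Fake" (AI-generated).
    print(f"{result['label']:>5} ({result['score']:.2f}): {text[:60]}")
```

Short inputs like the first sample give such classifiers little signal to work with, which is consistent with the weaker short-text accuracy the study reports.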
Pangram had the best accuracy, correctly identifying AI-generated content in most cases with an extremely low false positive rate.
The researchers recommend that organizations deploying detectors set a “policy cap” that balances preventing AI misuse against the risk of false accusations, and that they audit detector performance regularly to keep pace with the rapid evolution of AI detection technology.
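One way to operationalize such a policy cap is to calibrate the flagging threshold on a labeled holdout set so that the false positive rate (human text flagged as AI) stays under an agreed institutional limit. The sketch below is illustrative only: the helper threshold_for_fpr_cap, the synthetic scores, and the 0.5% cap are all hypothetical, not the study's method.

```python
# Illustrative sketch of a "policy cap": pick the decision threshold so the
# false positive rate on human-written holdout texts stays under a limit.
# All names and numbers here are hypothetical, not taken from the study.
import numpy as np

def threshold_for_fpr_cap(human_scores: np.ndarray, fpr_cap: float) -> float:
    """Return a score threshold whose empirical FPR on human-written
    holdout texts is at most roughly fpr_cap."""
    # A text is flagged as AI when its detector score >= threshold, so the
    # (1 - fpr_cap) quantile of human scores approximately caps the FPR.
    return float(np.quantile(human_scores, 1.0 - fpr_cap))

# Synthetic detector scores (0 = human-like, 1 = AI-like) on a holdout set.
rng = np.random.default_rng(0)
human_scores = rng.beta(2, 8, size=1000)  # human texts: mostly low scores
ai_scores = rng.beta(8, 2, size=1000)     # AI texts: mostly high scores

threshold = threshold_for_fpr_cap(human_scores, fpr_cap=0.005)  # 0.5% cap
fpr = float(np.mean(human_scores >= threshold))
recall = float(np.mean(ai_scores >= threshold))
print(f"threshold={threshold:.3f}  FPR={fpr:.3%}  detection rate={recall:.1%}")
```

Lowering fpr_cap trades away some detection of AI text in exchange for fewer false accusations, which is exactly the balance the researchers describe; re-running this calibration on fresh data is one form the recommended periodic audit could take.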
Read the opening of the original article below (English · first 3 paragraphs only):
Generative artificial intelligence has set off a tremendous amount of excitement, speculation, and anxiety thanks to its ability to convincingly mimic human work, including human writing. Although a machine that writes like a person is useful in many applications, an inability to discern human from AI writing can also create real problems: Students can avoid learning, lawyers can cite bogus case law, and journalists can publish misleading information.
Accusing someone of using AI inappropriately when they haven’t can have lasting reputational consequences; failing to identify AI-generated work can affect evaluations of human work. This conundrum has inspired a cottage industry of companies that claim to help users consistently tell the difference between AI and human writing. But how useful are they?
Research from Chicago Booth principal researcher Brian Jabarian and Booth’s Alex Imas evaluated consumer tools for identifying AI-generated text. Their results not only demonstrate the viability of AI writing detectors, but also suggest a data-driven method for schools, employers, and others to implement such tools in their own institutional settings.
※ Only the first 3 paragraphs are quoted, for copyright reasons. Please read the original article for the full text.