Mozilla's Mythos story is deceptive at best, malicious at worst

#Tech

The hidden costs behind the Mythos bug findings

The author is strongly skeptical of Mozilla's announcement that it found 271 bugs using Anthropic's AI model Mythos.

They point out a large gap between the number of bugs reportedly found and the CVEs actually published, and argue that the announcement leans too heavily on technical detail alone.

They also criticize the deliberate omission of operational-cost information such as scan duration, token consumption, and success rate.

Factoring in these operational costs and the complexity of the pipeline, the author argues that the true cost runs into the millions of dollars and that the announcement is essentially a piece of marketing.

Mozilla has reported on its use of Anthropic's AI model "Mythos" to find security vulnerabilities, but the announcement has drawn a number of questions. The report claims 271 vulnerabilities (bugs) were discovered, yet the costs of the process and some of its technical details remain undisclosed, which has sparked debate among experts.

Concerns about the severity of the discovered bugs

Mozilla explains that a number of the discovered bugs are sandbox escapes, which would need to be combined with other exploits to achieve a full Firefox compromise. Many of the flagged bugs, however, were reportedly detected as crash symptoms such as use-after-free, which are difficult to exploit directly on their own.

This has led some to suggest a large gap between the headline figure of 271 and the severity that would matter in a real-world attack. Mozilla's threat model assumes each bug "could be exploitable with sufficient effort," but how difficult that would be in practice remains open to debate.

The opacity surrounding the enormous cost of the AI usage

Notably, the announcement never states what this vulnerability-discovery process actually cost. A report that 271 bugs were found says nothing about how much the AI usage cost.

One outside estimate puts a typical agentic loop (the unit of Mythos usage) at $200-500, and heavier work such as sandbox escapes at $500-2,000. Because Mozilla describes "stacking" multiple model calls, the actual API token spend could plausibly fall in the range of $300K-1.4M.
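To make the arithmetic concrete, here is a minimal sketch of that yield-based estimate. The hit rates and per-attempt prices are the original author's assumptions, not figures Mozilla disclosed:

```python
# Yield-based token-cost estimate (assumed figures, not disclosed by Mozilla).
confirmed_bugs = 271

# Assumed hit rate: confirmed bugs per model attempt (10-20%, per the article).
hit_rate_low, hit_rate_high = 0.10, 0.20

# Assumed cost per attempt at Mythos pricing: $200-500 for a typical agentic
# loop; heavier sandbox-escape work is put at $500-2,000.
cost_per_attempt_low, cost_per_attempt_high = 200, 500

best_case = confirmed_bugs / hit_rate_high * cost_per_attempt_low    # 1,355 attempts * $200
worst_case = confirmed_bugs / hit_rate_low * cost_per_attempt_high   # 2,710 attempts * $500

print(f"${best_case:,.0f} to ${worst_case:,.0f}")
# $271,000 to $1,355,000, i.e. roughly the $300K-1.4M range cited above
```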

The engineering effort to build and operate the system

Details are also said to be missing on the engineering effort needed to build and run the vulnerability-discovery pipeline. Mozilla describes "steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise."

This "stacking," together with running large numbers of failed sandbox-escape attempts, is presumed to have required enormous compute and a substantial engineering effort, including a tight feedback loop with Firefox engineers.
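As a rough continuation of the earlier sketch (the 3-10x "stacking" multiplier is the original author's guess, not a Mozilla figure), stacking alone is enough to push the estimate toward the multi-million-dollar range the author cites:

```python
# Apply the assumed 3-10x "stacking" multiplier (multiple model calls per
# candidate: proposing, verifying, reproducing, testing) to the base range.
base_low, base_high = 271_000, 1_355_000   # from the yield-based sketch above
stack_low, stack_high = 3, 10              # author's guessed multiplier range

print(f"${base_low * stack_low:,} to ${base_high * stack_high:,}")
# $813,000 to $13,550,000; the author reaches the quoted "$2M-10+M" by also
# folding in failed sandbox-escape runs and engineering effort.
```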

The disclosure problem in AI-assisted work

This case shows the progress of AI-assisted security research while exposing a disclosure problem. Beyond headline numbers, transparency is needed about the process behind the results: the technical difficulty, the resources, and the cost.

That kind of detailed disclosure will be essential for judging how AI actually contributes to technology development going forward.

Excerpt from the original article (English)

I saw this article today from Mozilla detailing their use of Mythos and I have Some Thoughts™️ about it. In this blog post, Mozilla lightly details their use of Anthropic’s new, only-available-for-certain-companies model. Interestingly, they are very keen to cover certain parts of their flow, while noticeably leaving out key items. I would recommend skimming the article before coming back to this post, at least to get a general idea of it.

One thing to note is, I’m not super familiar with vulnerabilities and exploiting them, so please take what I say on them specifically with a grain of salt.

First, I have a heap of skepticism around the severity and exploitability of these bugs. Mozilla note themselves that “[...] a number of these bugs are sandbox escapes, which would need to be combined with other exploits to achieve a full-chain Firefox compromise”. From my understanding, these are not directly exploitable since you’d need another RCE inside the sandbox itself to be able to utilize those. Still, obviously a good idea to fix them ahead of time and not wait for an RCE to be found.

They go on to note, in the FAQ of the article: “We classify sec-high based on predictable crash symptoms such as use-after-free or out-of-bounds memory issues being reported by AddressSanitizer, and our threat model assumes that any of them could be exploitable with sufficient effort”. Again, understandable, I don’t expect them to have individually checked each potential bug for outright exploitability in the wild.

But, and this is a big but, I feel like there’s a large gap between “271 vulnerabilities” and “271 bugs that could potentially be exploited in combination with other bugs, and in many cases are only bugs as decided by a testcase that trips ASan regardless of how many of these bugs survive contact with the mitigation stack and other security measures in a real world exploit”.

Other people online with far more understanding than me have also raised some skepticism around the reported bug number of 271, compared to the 3 CVEs that Mozilla actually posted [1]. The whole classification thing kinda goes over my head a bit, so I decided not to dwell on it here.

What stood out to me is that none of Mozilla’s reporting mentions: how long this whole process took, how many instances of Mythos they had to run, for how long, or with what success rate. Crucially: what’s the token burn for this? A yield-based cost estimate of 271 bugs, at a 10-20% hit rate [2], at Mythos pricing [3] of plausibly $200-500 for a typical agentic loop and $500-2000 for the heavier sandbox-escape work [4], lands you in the range of $300K-1.4M in API tokens. However, this is before accounting for three documented multipliers. The following are all quoted from their blog:

“[...] we dramatically improved our techniques for harnessing these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise”. Their harness “stacks” multiple model calls per candidate (proposing, verifying, reproducing, testing). I’d flag this at around a 3-10x cost-per-finding multiplier, though they are being (purposefully, in my opinion, as I’ll detail later) extremely vague around this.

“While auditing logs from the harness, we saw many attempts to pursue this line of escape that were thwarted by [their previous hardening work]”. This is confirmation that “many” sandbox-escape attempts ran end-to-end and produced nothing. I don’t know what “many” is, but given general LLM success rates, this can be well into the thousands of failed research runs.

“While harnesses may be reusable across projects, this pipeline is inherently project-specific, reflecting each codebase’s semantics, tooling, and processes. Standing this up required significant iteration, with a tight feedback loop alongside the Firefox engineers who were fielding the incoming bugs”. This to me implies that significant engineering effort was needed to even set up the system to be able to set Mythos loose on their codebase. I also count this into the cost of using AI; if you need engineering effort to use something, then that engineering effort is part of the cost.

At this point, I’m not sure numbers make sense to estimate. Depending on the exact failure rate, the engineering effort, and what exactly “stacking” models implies, the costs could easily be anywhere in the range of $2M-10+M. Mozilla is intentionally being very opaque about the operational costs of the whole thing [5], so all these extra variables past the initial cost are at best hand-wavy estimates.

The reason I’m bringing up all the economics is this. Of course, it’s genuinely extremely cool that an AI was able to find 271 different bugs (no, I refuse to call them all vulnerabilities). I’m left wondering, however: would Mozilla have had worse results throwing $2-10M at an external, human red-team? Any company concerned with vulnerabilities can, at any point in time, run red-teaming exercises. And they have been! This is nothing new! You could’ve always thrown a couple million dollars at a problem and seen impressive results.

Mind you, this is without even getting into whether Anthropic can afford to run Mythos at just $25/$125-per-million, but that’s a different story for another day of ranting. Sources are split on whether the big AI companies turn a profit on API (though they mainly lean towards them losing money on it), but all those discussions only look at whether they make a profit on inference. You know training also costs money, right? Profit on inference (if there’s profit there) still isn’t guaranteed to get AI companies to a net profit [6]. Imagine the cost if Anthropic had to make a profit on it!

My third and final point, which I think everyone could’ve seen from a whole 1M context window away, is that this is all marketing. Again, the results are genuinely impressive and, in my opinion, very cool. But I want to analyze the intent. The tech is impressive, so the blog post focuses on the tech. If the pricing was impressive, wouldn’t that be a core part of it? Wouldn’t you expect Mozilla to open with the line “Cybersecurity is solved - we fixed X bugs for Y dollars” (where Y is a reasonable number with a reasonable amount of digits)? If the number was anywhere close to impressive, it’d be the only thing you heard about on the news.

The post goes in depth on technical details but fails to mention token spend, session count, scan duration, hit rate, false-positive rate, harness iteration count, or even how Mythos compares head-to-head against Opus 4.7. They said, and I’m quoting this bit again: “Once the end-to-end pipeline is in place, it’s trivial to swap in different models when they become available”. Awesome! If you’re arguing in good faith, plug Opus 4.7 into your trivial-to-swap harness and run it on the same data set. Show us how much better Mythos is!

One important thing to note is, the costs I mentioned above? Mozilla didn’t even pay for this. The whole operation is being covered by Anthropic under Project Glasswing, where they dedicate a pool of $100M to companies in this project [7]. Mozilla isn’t even feeling the cost here. Of course it’s massively impressive when it’s free.

The whole cover story of Mythos being “too dangerous to release publicly” is all smoke and mirrors. If the model was indeed that good, why wouldn’t Anthropic release it and charge accordingly so that they’d eventually finally turn a profit? Why would they sink $100M into companies using it behind closed doors, with no accountability, benchmarks, or any sort of cost estimates?

Because $100M is the marketing budget. I can’t see it any other way.

[1] https://www.flyingpenguin.com/mythos-mystery-in-mozilla-numbers-how-22-vulns-became-271-or-maybe-3-in-april/
[2] I’m basing this off of their previous reporting of using Opus to find bugs. Anthropic submitted 112 “unique reports”, which Mozilla narrowed down to 22 CVEs. 22/112 ≈ 19.6%, which I rounded up to 20%. However, the 112 reports are what came after Anthropic’s internal triage, so the true model-output-to-confirmed-bug ratio is necessarily lower. So I’m using 10-20% to be charitable. Sources: https://blog.mozilla.org/en/firefox/hardening-firefox-anthropic-red-team/, https://www.anthropic.com/news/mozilla-firefox-security
[3] $25/$125-per-million, 5x that of Opus 4.7. Sources: https://www.reddit.com/r/ClaudeCode/comments/1sf8gb7/claude_mythos_is_25125_per_million_tokens/, https://platform.claude.com/docs/en/about-claude/pricing
[4] I’m basing this off my estimates of Opus 4.7 usage of ~$30-50 per targeted coding task (though I have seen people use far more), then 5x-ing it for Mythos pricing.
[5] Foreshadowing is a literary device in which-
[6] Not a source, but a good read I recommend if you’re interested in the economics of AI: https://www.wheresyoured.at/ais-economics-dont-make-sense/
[7] https://www.anthropic.com/glasswing

This is my first time writing something like this out, and I’m not even sure right now if I’ll end up publishing it. If you’re reading this that means I did, and if you liked what you read, consider letting me know!

Note: The above is quoted from the original article; please see the original for full context.

Read the original article ↗