GPT-5.5 Model Card
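OpenAI has released GPT-5.5. Compared with GPT-5.4, it is stronger at agentic work: pursuing a concrete goal, using tools, staying within constraints, and delivering a usable result. It is also easier to steer and needs less step-by-step guidance.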
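Internal evaluations show higher quality on software engineering tasks: on the 102-task SWE eval, GPT-5.5 xhigh leads with 55 passes. The gain on Terminal-Bench is larger, for example from ~65.2% to ~79.8% at the medium reasoning level.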
Although GPT-5.5 is more capable in some respects, its token price is twice that of GPT-5.4, so reaching the same quality costs more.
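The cost claim above comes down to simple arithmetic: at twice the per-token price, a task costs the same only if it finishes in half the tokens. The sketch below illustrates that break-even point. The price of 10.0 per million tokens and the 400,000-token workload are hypothetical placeholders, not published figures; only the 2x ratio comes from the text.

```python
def cost(tokens: int, price_per_million: float) -> float:
    """Cost of a run that consumes `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def break_even_token_ratio(price_ratio: float) -> float:
    """Fraction of the old token count the new model may use to cost the same."""
    return 1.0 / price_ratio


# Hypothetical numbers for illustration only; the source states just the 2x ratio.
OLD_PRICE = 10.0            # assumed price per million tokens for the older model
PRICE_RATIO = 2.0           # from the text: the newer model's token price is double
NEW_PRICE = OLD_PRICE * PRICE_RATIO

old_tokens = 400_000        # assumed token usage of a task on the older model
old_cost = cost(old_tokens, OLD_PRICE)

# Same token usage on the newer model costs twice as much.
new_cost_same_tokens = cost(old_tokens, NEW_PRICE)

# To cost the same, the newer model must finish in at most this many tokens.
max_tokens_for_parity = int(old_tokens * break_even_token_ratio(PRICE_RATIO))
new_cost_at_parity = cost(max_tokens_for_parity, NEW_PRICE)
```

Under these assumptions, any token saving smaller than half leaves the newer model more expensive for the same result.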
Users are advised to prompt for outcomes rather than process, and to avoid repeating detailed steps in the user prompt.
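One way to picture an outcome-focused prompt is a brief that states the goal, the constraints, and how the result will be verified, and leaves the steps to the model. The sketch below is only an illustration of that shape: `TaskBrief`, its fields, and the example task are hypothetical names invented here, not part of any OpenAI API or of the original article.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for an outcome-focused prompt (illustrative only)."""
    outcome: str                                   # what a finished result looks like
    verification: list[str]                        # how the result will be checked
    constraints: list[str] = field(default_factory=list)  # limits to stay inside

    def render(self) -> str:
        """Render the brief as plain text, deliberately without a step list."""
        lines = [f"Goal: {self.outcome}"]
        if self.constraints:
            lines.append("Constraints:")
            lines += [f"- {c}" for c in self.constraints]
        lines.append("Done when:")
        lines += [f"- {v}" for v in self.verification]
        return "\n".join(lines)


# Example task, invented for illustration.
brief = TaskBrief(
    outcome="The date parser accepts ISO 8601 strings with time zone offsets.",
    constraints=["Do not change the public function signatures."],
    verification=["The existing test suite passes.", "A new test covers '+02:00'."],
)
prompt = brief.render()
```

The brief names the target and the checks but never says which files to open or in what order to work, which matches the advice to let the model choose the path.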
On reasoning levels: low performs surprisingly well, medium suits routine in-depth work, and high is not always better, so choose the level to match the task.
Opening of the original article (English, first 3 paragraphs only)
Pros
GPT-5.5 is more agent-shaped than GPT-5.4. It is better at taking a concrete target, using tools, staying inside constraints, and carrying the task through to a usable result. It is more interactive. It is easier to steer. It needs less process scaffolding. The model is at its best when we tell it what outcome we want and how to verify the result, then let it choose the path.
Capability. GPT-5.5 has the best ceiling we tested. On our internal 102-task SWE eval, GPT-5.5 xhigh is the quality leader: 55 passes, 0.598 normalized pass-rate, and 0.588 mean reward. GPT-5.5 medium is also roughly comparable to GPT-5.4 high: 54 passes versus 53. On Terminal-Bench, the jump is much clearer: at medium, the score rises from ~65.2% on GPT-5.4 to ~79.8% on GPT-5.5, and at xhigh, from ~74.7% to ~82.0%. OpenAI’s public evals show the same direction: GPT-5.5 improves over GPT-5.4 on Terminal-Bench 2.0, Expert-SWE, GDPval, OSWorld-Verified, MCP Atlas, and Tau2-bench Telecom.
Note: For copyright reasons, only the first 3 paragraphs are quoted. Please read the original article for the full content.