Using LLMs to Exploit Vulnerabilities in Salesforce Sites

June 14, 2026 #AI

Researchers at Reco have built an AI agent that performs fully automated, end-to-end security assessments.

The agent uses an LLM to map the attack surface of a Salesforce Experience Cloud site, analyze its exposed endpoints, identify vulnerabilities, and then write and run working exploits.

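AI is changing how security threats work. LLMs (large language models) are increasingly used to find and exploit vulnerabilities automatically, including in large codebases. Complex vulnerabilities that were once considered too laborious to exploit can now be handled automatically.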

LLM-Driven Attacks in Practice

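Reco's security research team ran an experiment in automating attacks against Salesforce sites. According to the original post, the threat group ShinyHunters has already scanned thousands of Salesforce sites with a modified version of "AuraInspector" and possibly used an LLM to code its framework and reconnaissance tools. Reco's experiment takes the next step and uses AI to drive the attack process itself.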

Attack Phases and the Role of the LLM

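The assessment follows the workflow a human researcher would use: reconnaissance, analysis, exploitation, and validation. The agent implements it as five phases (reconnaissance, object analysis, Apex fuzzing, exploitation, and severity review). An LLM executes every phase without human intervention and may go back and rerun an earlier phase with different context when it sees fit.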
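
As a rough illustration of how such a phased agent could be orchestrated, the sketch below runs the five phases named in the original article in order and lets a planner send the run back to an earlier phase. Only the phase names come from the article. `Assessment`, `run_pipeline`, the stub skills, and the planner are hypothetical stand-ins, with the planner taking the place of the LLM. Nothing here touches a network or a real site.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Phase names follow the article; everything else in this sketch is hypothetical.
PHASES = ["reconnaissance", "object_analysis", "apex_fuzzing",
          "exploitation", "severity_review"]


@dataclass
class Assessment:
    target_url: str
    notes: dict = field(default_factory=dict)    # output recorded per phase
    history: list = field(default_factory=list)  # order in which phases ran


# A skill performs one phase and records its output on the assessment.
Skill = Callable[[Assessment], None]
# The planner stands in for the LLM: after a phase runs, it returns the name of
# a phase to go back to, or None to continue with the next phase.
Planner = Callable[[Assessment, str], Optional[str]]


def run_pipeline(assessment: Assessment, skills: Dict[str, Skill],
                 planner: Planner, max_steps: int = 20) -> Assessment:
    index = 0
    steps = 0
    # max_steps bounds the run so a planner cannot loop forever.
    while index < len(PHASES) and steps < max_steps:
        phase = PHASES[index]
        skills[phase](assessment)
        assessment.history.append(phase)
        steps += 1
        rerun = planner(assessment, phase)
        if rerun is not None and PHASES.index(rerun) <= index:
            index = PHASES.index(rerun)  # go back and rerun from that phase
        else:
            index += 1
    return assessment


def make_stub(name: str) -> Skill:
    # Stub skill: only counts how many times the phase ran.
    def skill(a: Assessment) -> None:
        a.notes[name] = a.notes.get(name, 0) + 1
    return skill


def demo_planner(a: Assessment, phase: str) -> Optional[str]:
    # After the first severity review, ask for one more exploitation pass.
    if phase == "severity_review" and a.notes["exploitation"] < 2:
        return "exploitation"
    return None


demo = run_pipeline(Assessment("https://example.invalid"),
                    {p: make_stub(p) for p in PHASES}, demo_planner)
print(demo.history)
```

Keeping the rerun decision in a separate planner keeps each skill deterministic and testable, while the step limit guards against a planner that never stops asking for reruns.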

Results and Open Questions

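In tests against real-world Salesforce sites, the agent found high-severity vulnerabilities even at organizations that invest heavily in security. It wrote working exploit scripts from scratch, extracted real data, and retrieved data from public sources on its own to build payload input when needed. The ethics of such attacks and the regulatory framework around them may come under scrutiny.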

Summary

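As LLMs advance, the nature of security threats is changing significantly. Attack automation will likely keep advancing, and risk management and legal responses will need to keep pace.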

Opening of the original article (English, first three paragraphs only)

AI is changing the security landscape. More and more threat groups incorporate LLMs into their reconnaissance and exploitation workflows. The notion that some vulnerabilities are too complex to implement is now obsolete. Using LLMs, hackers can automatically find and exploit complex vulnerabilities. We have all heard of Claude Mythos and its ability to identify vulnerabilities in large codebases and exploit them automatically. But LLMs can do more than find vulnerabilities in code.

ShinyHunters has scanned thousands of Salesforce Sites. They used a modified version of "AuraInspector". They possibly used an LLM to code their framework, mods, reconnaissance tools, and other aspects of their workflow. But the next step is to use AI to supercharge the attack process itself. We at Reco decided to explore what it would look like.

Reco's security research team built an AI-powered agent capable of performing end-to-end security assessments of Salesforce Experience Cloud sites - fully autonomously. Give it a URL, and it discovers the attack surface, analyzes every exposed endpoint, identifies vulnerabilities, writes working exploits, and runs them. No human guidance required after providing the target.

We pointed it at real-world Salesforce sites belonging to major technology companies, and the results were sobering. The agent discovered high-severity vulnerabilities on sites belonging to organizations that invest heavily in security. It wrote working exploit scripts from scratch, extracted real data, and even autonomously retrieved data from public sources to build payload input when needed.

The End-to-End Security Analyst

Our agent is not a single monolithic script. It's an agentic pipeline of AI skills, each responsible for a distinct phase of the assessment. A human researcher would follow a similar workflow: reconnaissance, analysis, exploitation, validation.
The difference is that every phase is executed by an LLM that can reason about what it sees, adapt its approach, and make judgment calls without human intervention. The LLM controls everything, and it can choose how to use each skill. There is a generic workflow, phases 1-5, but the LLM may choose to go back and rerun skills with a different context as it sees fit.

Phase 1: Reconnaissance - Mapping the Attack Surface

The agent starts with nothing but a URL. Its first task is to discover everything the site exposes to unauthenticated visitors.

It queries the Salesforce Aura framework to enumerate all accessible Salesforce objects (database tables), Apex controller methods (server-side business logic), routes (public pages), and content (files and documents). The output is a comprehensive site context - a map of the entire attack surface, including method signatures, parameter names and types, and object schemas. This is mostly based on deterministic steps, which you can read about in our pentesting guide. The LLM can invoke a Python tool that deterministically enumerates the objects and the apex methods, including their parameters. The LLM may choose to provide additional context as well if it deems it necessary, such as information about the company.

Phase 2: Object Analysis - Probing for Data Exposure

With the attack surface mapped, the agent shifts to data analysis. It categorizes every discovered Salesforce object by sensitivity: what tables may contain sensitive data. It prioritizes tables such as "Contact", "Lead", or "CreditCardTransaction__c" over "BlogPostEntry__c".

For each high-interest object, the agent inspects the schema (field names, types, custom fields), then attempts to query records as a guest user.
It evaluates what comes back - not just whether records are accessible, but whether the data in those records is sensitive.

For file objects specifically, the agent follows a dedicated workflow: enumerate accessible documents, inspect metadata, download and read the content of each file, and assess whether it contains sensitive or confidential information. This is how it finds the one sensitive file buried among hundreds of mundane ones.

Phase 3: Apex Fuzzing

Here the LLM is a game-changer. For every exposed Apex method, the agent tries to test it. It analyzes the signature and builds different baselines to run the method.

1. Infers valid values: Using method signatures, parameter names, client-side JavaScript analysis, and data from previously dumped objects, the agent reasons about what inputs each method expects. If a method takes a caseId, the agent looks for Case records it already discovered. If it takes an emailAddress, it tries common patterns.

2. Invokes the method and analyzes the response: The agent calls every safe-to-test method (no delete/modify ones), reads the response, and evaluates whether it returns data that guest users shouldn't have access to, including PII, internal records, credentials, and configuration details.

3. Probes for SOQL injection: For every method with potentially injectable parameters (strings or complex types), the agent tries to inject a single quote, then tautologies, then wildcard payloads. It compares responses against the baseline to detect behavioral changes that indicate string concatenation in SOQL queries.
When it finds a difference - an error message, a changed result set, a shifted count - it confirms the injection and characterizes the oracle.

The output is a detailed analysis report covering every tested method, with confirmed findings classified by type and severity.

Phase 4: Exploitation - Writing and Running Exploits

When Phase 3 confirms an exploitable injection, the agent doesn't just report it - it writes a working proof-of-concept exploit.

The agent generates standalone Python scripts that implement the full exploitation chain. For blind SOQL injection, this means:

- Constructing subqueries that pivot from the vulnerable object to high-value targets like User, Contact, or Lead
- Implementing character-by-character extraction using LIKE prefix matching
- Optimizing with frequency-ordered character sets to minimize HTTP requests

The agent then runs the exploit, validates that data is actually extracted, and documents exactly what was obtained. If it successfully extracts an employee email from the User table, it doesn't stop there - it probes additional fields (phone numbers, titles), additional pivot relationships (OwnerId, CreatedById), and additional target objects (Contact, Lead).

Phase 5: Severity Review

The final phase is important for security assessors. The agent reviews its own findings from the perspective of a skeptical program manager. Threat groups might skip it, though it could be beneficial if they want to focus on targets with high-severity vulnerabilities.

For each vulnerability, it asks:

- What data was actually extracted?
- Does the severity reflect demonstrated impact or just the vulnerability class?

This adversarial self-review catches severity inflation, distinguishes real PII exposure from metadata leaks, and ensures the final report is defensible.
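
The two review questions in Phase 5 can be pictured as a cap on the claimed severity based on what was actually extracted. The sketch below is only an illustration under stated assumptions: the severity scale, the `PII_FIELDS` set, the `Finding` type, and the capping policy in `review` are all hypothetical and are not taken from Reco's agent.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]

# Hypothetical set of field names treated as PII; a real review would follow
# the assessor's own data classification policy.
PII_FIELDS = {"email", "phone", "firstname", "lastname", "address"}


@dataclass
class Finding:
    title: str
    claimed_severity: str
    extracted_fields: list = field(default_factory=list)  # fields actually retrieved


def review(finding: Finding) -> str:
    """Return the claimed severity, capped by demonstrated impact."""
    extracted = {name.lower() for name in finding.extracted_fields}
    if not extracted:
        cap = "low"       # nothing extracted: only the vulnerability class is known
    elif extracted & PII_FIELDS:
        cap = "critical"  # real PII exposure: the claimed severity stands
    else:
        cap = "medium"    # data came back, but only metadata
    claimed = SEVERITY_ORDER.index(finding.claimed_severity)
    return SEVERITY_ORDER[min(claimed, SEVERITY_ORDER.index(cap))]


findings = [
    Finding("Injection confirmed, no data retrieved", "high"),
    Finding("Guest-readable object", "high", ["Id", "CreatedDate"]),
    Finding("Contact lookup by email", "high", ["Email", "Phone"]),
]
results = [review(f) for f in findings]
print(results)
```

In this sketch the review can only lower a severity, never raise it, which matches the stated goal of catching severity inflation rather than finding new impact.
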
Of course, this is not the end - the agent may go back to phase 4 to try to find a stronger exploit.

Running the Agent

After testing it in our own labs, we targeted several companies with vulnerability disclosure programs that allowed it. We constrained the agent explicitly: no write, delete, or modify operations; no bulk extraction; testing limited to methods that could not cause side effects. The goal was demonstrating exploitability and impact, not maximizing data exfiltration.

This security assessor agent was built to work as a pentester, and not a malicious attacker - but the results are the same; we still got data dumps, including PII. The main difference is the size of the dump.

Case Studies

Disclosure note: All vulnerabilities described in this post are real and were responsibly disclosed to the affected organizations through their security programs. Company names have been replaced with fictional names to protect the organizations while preserving the technical accuracy of the findings.

Case Study 1: Aegis Security - "Tell Me Everything About This Email"

Company profile: Aegis Security (name changed) is a major cybersecurity vendor with a partner portal built on Salesforce Experience Cloud. The portal allows technology and channel partners to manage their relationship with the company. The site was not meant to be publicly accessible - "Guest users can see and interact with the site without logging in" is disabled.

What the Agent Found

The agent began by mapping the full attack surface: 263 Salesforce objects, 55 Apex methods across 9 controller classes, and 10 public routes. It systematically analyzed each endpoint, probing parameters with test inputs to understand the application's behavior.

When the agent reached PartnerPortalOnboardingController.getContactInfo, it discovered something alarming. This Apex method, intended for the partner onboarding flow, accepted a single parameter: an email address. And it was accessible to unauthenticated guest users.

The agent tested it with a generic email and received back:

{

"FirstName": "...",

"LastName": "...",

Note: Out of respect for copyright, the quotation is limited to the first three paragraphs. See the original article for the rest.

Read the original article ↗
