Overview
Custom AI Agent is backend-agnostic. You can run the built-in Burp AI backend, local models, cloud CLI providers, or OpenAI-compatible HTTP providers. Eleven backends ship with the extension, and additional ones can be dropped in as JARs.
Backend Selection Guide
Supported Backends
Burp AI (built-in)
In-process (Burp Pro)
High (no extra outbound)
Burp Pro users with AI credits and Use AI for extensions enabled.
Ollama
Local HTTP
High
Offline or strict data control.
LM Studio
Local HTTP
High
Local models with GUI management.
NVIDIA NIM
Cloud HTTP
Medium
NVIDIA-hosted models (e.g. moonshotai/kimi-k2.5) via integrate.api.nvidia.com.
Perplexity
Cloud HTTP
Medium
Sonar family of web-aware reasoning models via api.perplexity.ai.
Generic (OpenAI-compatible)
HTTP
Medium
Any compatible provider endpoint.
Gemini CLI
Cloud CLI
Medium
Large-context cloud workflows.
Claude CLI
Cloud CLI
Medium
Reasoning-heavy analysis.
Codex CLI
Cloud CLI
Medium
Code/security analysis and PoCs.
Copilot CLI
Cloud CLI
Medium
Multi-model analysis via GitHub infrastructure.
OpenCode CLI
Cloud CLI
Medium
Multi-provider via one CLI.
Network transport: the HTTP backends (Ollama, LM Studio, NVIDIA NIM, Perplexity, Generic OpenAI-compatible) send and health-check exclusively through Burp's own Montoya HTTP stack — there is no direct out-of-band HTTP client. AI-backend traffic therefore respects Burp's upstream proxy, TLS, and logging configuration and is visible in Burp like any other request (#69).
Capability Matrix
Burp AI (built-in)
No (single execute)
No — enforced via prompt
Yes
N/A
Ollama
Yes (SSE)
Yes (format=json)
Yes
Yes (ollama serve)
LM Studio
Yes (SSE)
Yes (response_format=json_object)
Yes
Yes (lms server start)
NVIDIA NIM
Yes (SSE)
Yes (response_format=json_object)
Yes
N/A
Perplexity
Yes (SSE)
No (Sonar API rejects response_format=json_object)
Yes
N/A
Generic OpenAI-compatible
Yes (SSE)
Yes (response_format=json_object)
Yes
N/A
Gemini CLI
Line-by-line stdout
No
No (prepended)
N/A
Claude CLI
Line-by-line stdout
No
No (prepended)
N/A
Codex CLI
Line-by-line stdout
No
No (prepended)
N/A
Copilot CLI
Line-by-line stdout
No
No (prepended)
N/A
OpenCode CLI
Line-by-line stdout
No
No (prepended)
N/A
See Agent Profiles → How It Works for how the system-role difference affects profile delivery.
Setup Path
Open the AI Backend tab in Settings.
Select Preferred Backend for new sessions.
Configure command/URL/model/auth fields for that backend.
Use Test connection where available.
Start with Privacy Modes set appropriately.
Configure executable command and ensure authentication is already completed in the same runtime environment as Burp.
Windows tip: with npm-installed tools, prefer full shim paths like C:\\Users\\<you>\\AppData\\Roaming\\npm\\claude.cmd.
Configure base URL, model, optional API key, and extra headers.
For local servers, verify the service is running and port is reachable from Burp.
Drop custom backend JARs implementing AiBackendFactory into:
~/.burp-ai-agent/backends/
Restart Burp to load them.
Cross-Platform CLI Detection
CLI backends depend on environment inheritance from the Burp process.
If Burp starts from GUI, shell
PATHand env vars may differ.Use explicit command paths when detection fails.
For Windows + WSL bridge patterns, see backend-specific pages and Troubleshooting.
Windows npm Shim Resolution
On Windows, npm-installed CLI tools (Codex, Gemini, OpenCode, Copilot) install as shell script shims that Java cannot execute directly. The extension automatically resolves these:
.cmdsibling detection: If the resolved path points to a non-executable shim, the resolver looks for a.cmdsibling (e.g.,codex->codex.cmd).npm directory scanning: Checks
%APPDATA%\npm,%LOCALAPPDATA%\npm, and%USERPROFILE%\AppData\Roaming\npmfor.cmdshims.Fallback wrapper: If no
.cmdsibling is found, wraps the command withcmd /c.
This eliminates the CreateProcess error=193 that occurs when Java tries to execute shell script shims directly.
Burp Edition Notes
Backends are available in both Community and Professional editions. MCP tool availability still depends on Burp edition and tool safety gates. Every backend except Burp AI (built-in) runs on Burp Community without any change in behaviour — the Use AI for extensions setting and the AI-credits requirement are specific to the Burp AI backend, which delegates inference to Burp's bundled AI provider.
Health States
A timer in the main tab polls the active backend every 5 seconds and renders the result as a colored pill in the top bar:
AI: OK (green)
Healthy
Last health probe succeeded. Backend accepted a minimal test request and returned a usable response.
Normal steady state.
AI: Degraded (amber)
Degraded
Probe succeeded with warnings (e.g., elevated latency, partial response, soft-error returned by the model API). The tooltip shows the diagnostic message.
Slow first token from a cold cloud model; transient rate limiting that did not fail; CLI backend responding but with stderr noise.
AI: Offline (red)
Offline / Unavailable
Probe failed or the backend is structurally unavailable (selected backend is Burp AI (built-in) without Use AI for extensions enabled, CLI binary not on PATH, HTTP endpoint refusing connections, circuit breaker open). The tooltip carries the underlying message. The Use AI for extensions gate only affects the Burp AI backend — picking any other backend keeps the plugin running even with the toggle off.
Missing API key, model name typo, local model server not started, CLI authentication expired, circuit breaker tripped after 5 consecutive failures.
The probe is asynchronous so UI threading is never blocked. Each transition between states is recorded in the AI Request Logger so you can correlate dips with specific traffic spikes or backend errors.
If a backend stays Offline longer than expected, see Backend Troubleshooting for per-backend error signatures.
Retry Behavior
HTTP backends (Ollama, LM Studio, OpenAI-compatible, NVIDIA NIM, Perplexity) include automatic retry logic with bounded stepped backoff:
Maximum attempts: 6.
Retryable errors: Connection timeouts, connection refused, and other transient network failures.
Backoff schedule (fixed, per attempt number):
500 ms,1000 ms,1500 ms,2000 ms,3000 ms,4000 ms. The delay does not grow exponentially; it is capped at 4 seconds so retries stay bounded.Diagnostics: Each retry attempt is logged to the AI Request Logger as a
RETRYactivity with the attempt number, delay, and reason.
Circuit Breaker
HTTP backends are additionally wrapped in a circuit breaker:
Failure threshold: 5 consecutive failures open the circuit.
Reset timeout: 30 seconds before the breaker transitions to half-open.
Half-open probes: a single attempt is allowed; success closes the breaker, failure reopens it.
When the circuit is open the backend fails fast with
"<backend> backend is temporarily unavailable (circuit open)".
The Burp AI (built-in) backend uses Burp Pro's own retry and error handling, so the schedule above does not apply to it. CLI backends handle failures through the supervisor restart mechanism rather than per-request retries.
Output Token Limits
HTTP backends (Ollama, LM Studio, OpenAI-compatible, NVIDIA NIM, Perplexity) automatically set output token limits per request type to ensure complete responses:
Chat
4096
Scanner (single request)
2048
Scanner (batch analysis)
4096
Payload generation
1024
CLI backends manage their own output limits through their respective configurations and are not subject to these values.
Next Steps
Last updated
