LLM / AI Pentest
Red-team a chat LLM endpoint against the OWASP Top 10 for LLM Applications. We send an adversarial battery — prompt injection, jailbreak, system-prompt leakage, insecure output, harmful-content and secret-disclosure probes — and grade the responses. Works with OpenAI-compatible and Anthropic endpoints.
LLM / AI Pentest is a Pro tool
Specialized scans are part of ONEROXE Pro. Sign in and upgrade to run the llm / ai pentest.
- ✓Direct prompt injection (instruction override)
- ✓Jailbreak via persona override
- ✓System-prompt leakage (canary-proven)
- ✓Improper output handling (active HTML/JS emission)
https://example.com/ — sample finding evidencePro from ₹349/mo ($12/mo).
What this assesses
How it works
Active· sends adversarial prompts to an endpoint you own- 1You give us your chat-LLM endpoint and key; we send a focused set of OWASP-LLM-Top-10 prompts (prompt injection, jailbreaks, system-prompt leakage, insecure-output and more).
- 2We grade each response for whether the guardrail held, and report the prompts that got through.
- 3Your key is sent only over HTTPS (we refuse plain http://), is never stored, and we test only the endpoint you point us at.
What it doesn’t do: It is a single-turn probe set; multi-turn, RAG and tool-call attack chains are not yet covered.
Why it matters
LLM features ship fast and inherit a brand-new attack surface. A single successful prompt injection can exfiltrate data, abuse connected tools, or make your assistant say things that damage trust. Testing the model the way an attacker would is the only way to know where the guardrails actually hold.
Frequently asked questions
Where does my API key go?
It is used only to authenticate the probe requests sent to the endpoint you provide, in-memory for the duration of the run. It is never stored, logged, or echoed back.
Which endpoints are supported?
Any OpenAI-compatible chat-completions endpoint (OpenAI, OpenRouter, Together, vLLM, LM Studio, and similar) and the Anthropic Messages API, which is auto-detected by host. Internal/localhost endpoints are blocked for safety.
Does a clean result mean my AI is safe?
No. The detectors are heuristic and the battery is a baseline — a triggered probe is a confirmed weakness, but a clean result only means these specific probes did not trigger. A full red-team adds multi-turn and tool-abuse attacks.