Vibehacker is an autonomous AI red team that continuously probes your website the way a real attacker would. Dozens of AI agents map your application, look for weak spots, and attempt to exploit them, then verify every finding to eliminate false positives.

How is it different from a traditional vulnerability scanner?

Traditional scanners match known CVE signatures and produce noisy reports full of false positives. Vibehacker's agents reason about your specific application, reproduce each finding, and only report verified, exploitable issues.

Do I need security expertise to use it?

No. Vibehacker is designed for product teams without a dedicated security team. You get a plain-English report of what's actually broken, with reproduction steps.

How much does it cost?

The first scan is free (Proof of Value tier). Continuous always-on protection starts at $50/user/month billed annually, 10-user minimum. Enterprise self-hosted pricing is custom.

Can I try it on my own website?

Yes. The first scan is free. Book a demo and we will run it on your site during the 30-minute call.

What types of applications does Vibehacker work on?

Any web application reachable over HTTP or HTTPS. Single-page apps, REST APIs, GraphQL endpoints, server-rendered sites, dashboards, and internal tools all work. Both public and authenticated apps.

Does it work on sites that require login?

Yes. Provide a test account and the swarm uses it like a real user, including stateful session testing that signature-based scanners cannot do. IDOR and privilege-escalation checks require this.

How long does a scan take?

A typical mid-complexity web app finishes in 15 minutes to 2 hours. Findings stream in as they are verified, so you do not have to wait for the whole scan to see the first results.

What kinds of vulnerabilities does it find?

OWASP Top 10 categories at minimum: SQL injection, command injection, broken access control (IDOR, privilege escalation), authentication flaws, SSRF, XSS, insecure deserialization, and path traversal. Plus chained attacks across multiple endpoints that scanners almost always miss.

Am I allowed to scan my own site?

Yes. You can authorize security testing against systems you own or manage. Vibehacker requires you to confirm authorization before every scan and only runs against targets you have explicitly designated.

How does it compare to a manual penetration test?

Manual pentests dig deeper into high-value targets but cost $10k to $50k and happen once or twice a year. Vibehacker runs continuously, catches most of what a mid-level pentester would find, and costs a fraction. The two work well together: Vibehacker for coverage, human pentesters for strategic engagements.

Is there a false positive problem?

Every finding is independently reproduced by a verification agent before it reaches your report. Findings that cannot be reproduced are discarded. In practice, near-zero false positives. What you see is real and exploitable.

What happens to my scan data?

Scan outputs and findings are stored encrypted on our EU-region infrastructure and are only accessible to your account. You can delete everything at any time from the dashboard, and nothing is retained after account deletion.

Can Vibehacker be self-hosted?

Yes, on the Enterprise tier. Self-hosted deployments run entirely on your own infrastructure with no scan data leaving your environment. Useful for regulated industries and air-gapped setups.

21 April 2026 · 4 min read · 966 words

We rebuilt 3 real vulnerable apps in a lab and let our agent loose. It found all three CVEs.

We rebuilt Lunary, LibreChat, and Gradio at their vulnerable versions, told Vibehacker nothing about what to look for, and let it scan each target blackbox. It rediscovered all three CVEs, not by magic but through trial and error and a self-improvement loop that uses LLMs to rewrite the swarm's knowledge between runs.

by Vibehacker team

TL;DR

Rebuilt Lunary, LibreChat, and Gradio at their vulnerable versions in an isolated lab.

Vibehacker rediscovered all three CVEs blackbox, without being told what to look for.

The agent got there not on some magic first attempt, but through trial and error and a self-improvement loop that uses an LLM to read each run's logs and update the swarm's knowledge between scans.

Skilled bug bounty hunters take days to find bugs like these. Most scanners never find them at all. We rebuilt Lunary, LibreChat, and Gradio at their vulnerable versions in a lab, told our agent nothing, and it rediscovered all three CVEs.

Not on the first try. Through trial and error, and a self-improvement loop that rewrites what the swarm knows between runs. By the time these three targets got tested, the agent had earned its way to finding them in a single blackbox scan each.

Each finding matched a real, disclosed CVE that paid a bug bounty to the original researcher: $1,080 for Lunary, $450 for LibreChat, $750 for Gradio. We did not collect any of that money. These were disclosed and patched years ago. The point was whether the swarm could independently rediscover what a skilled human researcher originally found, given nothing to start with.

How the test ran

Same methodology for each target:

Stand up the vulnerable version in a container.
Hand Vibehacker the URL with a test account.
Zero information about the vulnerability. No CVE ID, no hint.
One blackbox scan.
Compare findings to the known CVE as ground truth.
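The last step, comparing findings to ground truth, can be sketched roughly as follows. This is a minimal illustration, not Vibehacker's actual scoring code: the Finding type, its fields, and the rule "same CWE, same method, same path" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of a reported finding (not Vibehacker's real schema)."""
    cwe: str     # e.g. "CWE-639"
    method: str  # HTTP method of the vulnerable request
    path: str    # endpoint path template


def matches_ground_truth(findings, truth):
    """Return True if any finding has the same CWE, method, and path as the known CVE."""
    return any(
        f.cwe == truth.cwe
        and f.method.upper() == truth.method.upper()
        and f.path == truth.path
        for f in findings
    )


# Ground truth taken from the Lunary case described in this post.
truth = Finding("CWE-639", "DELETE", "/v1/projects/{id}")

# Made-up scan output: one unrelated finding and one that matches.
reported = [
    Finding("CWE-79", "GET", "/search"),
    Finding("CWE-639", "delete", "/v1/projects/{id}"),
]

print(matches_ground_truth(reported, truth))  # prints True
```

A stricter comparison would also check that the reproduction steps exercise the same flaw, but matching on class and endpoint is enough to show the idea.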

What came back follows.

Lunary: authorization bypass (CVE-2024-1625, CWE-639)

An authenticated User B could delete User A's projects via DELETE /v1/projects/{id}. The server checked you were logged in. It never checked whether you owned the project.

What the swarm actually had to do: enumerate project IDs while logged in as A, switch to B's token, replay the DELETE, and confirm A's resources vanished. Four steps across two authentication contexts. There is no signature for this. The bug is structural, not syntactic, so pattern matching on a single request or response cannot reveal it.
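The cross-account check described above can be sketched against a toy API. Everything here is an assumption for illustration: FakeProjectApi is an in-memory stand-in, not Lunary's code, and the token and project names are invented. The check_owner flag toggles the missing ownership check so the same probe can be run against a vulnerable and a fixed server.

```python
class FakeProjectApi:
    """In-memory stand-in for a project API (hypothetical, not Lunary's code)."""

    def __init__(self, check_owner):
        self.check_owner = check_owner
        self.projects = {"p1": "alice", "p2": "alice", "p3": "bob"}  # id -> owner
        self.tokens = {"token-a": "alice", "token-b": "bob"}         # token -> user

    def list_projects(self, token):
        user = self.tokens[token]
        return [pid for pid, owner in self.projects.items() if owner == user]

    def delete_project(self, token, project_id):
        user = self.tokens.get(token)
        if user is None:
            return 401  # not logged in
        if project_id not in self.projects:
            return 404
        if self.check_owner and self.projects[project_id] != user:
            return 403  # the ownership check the vulnerable version lacks
        del self.projects[project_id]
        return 200


def cross_account_delete_check(api, owner_token, other_token):
    """Return the owner's project IDs that a different account managed to delete."""
    victim_ids = api.list_projects(owner_token)         # enumerate as user A
    deleted = []
    for pid in victim_ids:
        api.delete_project(other_token, pid)            # replay DELETE as user B
        if pid not in api.list_projects(owner_token):   # confirm as user A
            deleted.append(pid)
    return deleted


print(cross_account_delete_check(FakeProjectApi(check_owner=False), "token-a", "token-b"))
# prints ['p1', 'p2']
print(cross_account_delete_check(FakeProjectApi(check_owner=True), "token-a", "token-b"))
# prints []
```

The probe never inspects a payload for a suspicious string. It only compares what one account can see before and after another account acts, which is why the check needs two sessions.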

LibreChat: path traversal on file upload (CVE-2024-11170, CWE-29)

POST /api/files/images accepts a user-supplied filename in a multipart upload. A filename like ../../../tmp/poc writes the file outside the intended directory. Arbitrary file write.

Most scanners try path traversal in URL parameters and nowhere else. The swarm had to try it in the filename field of a multipart request specifically. That's a skill it picked up from earlier runs that stalled on similar patterns.
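Whether a given filename escapes the upload directory comes down to a containment check on the resolved path. The sketch below shows that check using only string-level path normalisation. The upload directory /srv/uploads/images is a made-up example, not LibreChat's real layout, and the function assumes upload_dir is an absolute POSIX path.

```python
import posixpath


def resolves_outside(upload_dir, filename):
    """Return True if joining filename onto upload_dir lands outside upload_dir.

    Assumes upload_dir is an absolute POSIX path. Works on strings only, so it
    does not follow symlinks on a real filesystem.
    """
    base = posixpath.normpath(upload_dir)
    target = posixpath.normpath(posixpath.join(base, filename))
    return posixpath.commonpath([base, target]) != base


# Hypothetical upload directory, used only for this example.
UPLOAD_DIR = "/srv/uploads/images"

print(resolves_outside(UPLOAD_DIR, "cat.png"))           # prints False
print(resolves_outside(UPLOAD_DIR, "../../../tmp/poc"))  # prints True
print(resolves_outside(UPLOAD_DIR, "/etc/passwd"))       # prints True
```

A server that runs a check like this on the multipart filename before writing would reject the traversal payload. A probe can use the same logic in reverse to decide which filenames are worth sending.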

Gradio: chained path traversal (CVE-2024-1561, CWE-29)

Two endpoints, two requests, one vulnerability. POST /component_server with fn_name: "move_resource_to_block_cache" and data: "/etc/passwd" caches the file at a known path. GET /file={cached_path} then reads it.

Neither request is obviously dangerous on its own. The vulnerability only exists in the interaction between them. This is the pattern that usually confuses automated tools and junior pentesters alike, because the individual probes return nothing interesting.
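A toy model makes the interaction easier to see. ToyServer below is an assumption built for this example and is not Gradio's implementation: its cache path scheme, file contents, and method names are invented. It only mirrors the shape of the bug: one call copies an arbitrary path into a cache, and a second call serves anything in that cache.

```python
class ToyServer:
    """Toy model of a two-step file leak (hypothetical, not Gradio's code)."""

    def __init__(self):
        # Invented file contents standing in for the server's disk.
        self.disk = {"/etc/passwd": "secret-contents", "/app/logo.png": "PNG"}
        self.cache = {}  # cached_path -> contents

    def component_server(self, fn_name, data):
        """First request: copy a path into the cache without validating it."""
        if fn_name != "move_resource_to_block_cache" or data not in self.disk:
            return None
        cached_path = "/cache/" + data.strip("/").replace("/", "_")
        self.cache[cached_path] = self.disk[data]
        return cached_path

    def get_file(self, path):
        """Second request: serve a file, but only if it is already cached."""
        return self.cache.get(path)


def chained_read(server, target):
    """Chain both requests: cache the target, then fetch the cached copy."""
    cached_path = server.component_server("move_resource_to_block_cache", target)
    if cached_path is None:
        return None
    return server.get_file(cached_path)


server = ToyServer()
print(server.get_file("/etc/passwd"))      # prints None: the read alone fails
print(chained_read(server, "/etc/passwd"))  # prints secret-contents
```

Probing get_file directly returns nothing, and the caching call returns only a path. The leak appears only when the output of the first request becomes the input of the second.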

The difference between "can't find IDOR" and "finds IDOR on a hardened target" is what the swarm has learned through iteration.

The self-improving loop

You might expect each class of bug to need a custom scanner. It does not: the same agent swarm, the same orchestrator, and the same CLI handled all three targets. What changes between "misses the bug" and "catches the bug" is the knowledge the swarm has built up from previous runs against other targets.

The learning happens in a loop, and LLMs do the heavy lifting.

After each scan, a separate LLM session reads the run's logs end to end: every probe, every response, every reasoning step the agents took. It looks for places where the swarm went down a dead end or missed a signal that was plainly visible in the traffic.

On an early Lunary run, the agents noticed the authorization token in response headers, but none of them tried replaying the request with a different user's token. The reviewer flagged it, roughly: "There's a missing primitive here. The swarm recognises a token but has no procedure for cross-account testing."

Then it wrote the procedure and fed it back into the swarm's knowledge base. The next scan used it.

That's the loop. Run. Read the logs. Update what the swarm knows. Run again. We approve the changes, but the reasoning and the words come from the LLM, not from us. The swarm accumulates capabilities faster than we could type them.
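In outline, the run, review, approve, update cycle looks something like the sketch below. It is a simplified illustration under stated assumptions, not the production pipeline: knowledge is modelled as a set of procedure names, and run_scan, review_logs, and approve are hypothetical stand-ins for the scan, the LLM log review, and the human approval step.

```python
def improvement_loop(run_scan, review_logs, approve, knowledge, rounds):
    """Repeat: scan, review the logs, keep approved proposals, scan again."""
    for _ in range(rounds):
        logs = run_scan(knowledge)                 # Run.
        proposals = review_logs(logs, knowledge)   # Read the logs.
        accepted = [p for p in proposals if approve(p)]
        if not accepted:
            break                                  # nothing new was learned
        knowledge = knowledge | set(accepted)      # Update what the swarm knows.
    return knowledge


# Toy stand-ins (assumptions for this example only).
def run_scan(knowledge):
    if "cross-account replay" in knowledge:
        return ["saw auth token", "replayed DELETE with second token"]
    return ["saw auth token", "no cross-account replay attempted"]


def review_logs(logs, knowledge):
    # Stand-in for the LLM reviewer: propose a procedure for a gap seen in the logs.
    if "no cross-account replay attempted" in logs:
        return ["cross-account replay"]
    return []


def approve(proposal):
    # Stand-in for human sign-off; this toy version accepts everything.
    return True


final = improvement_loop(run_scan, review_logs, approve,
                         frozenset({"token recognition"}), rounds=5)
print(sorted(final))  # prints ['cross-account replay', 'token recognition']
```

The loop stops early once a review produces no approved proposals, so in this toy run it performs two scans: one that exposes the gap and one that confirms the new procedure is used.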

By the time we tested the three targets in this post, the knowledge that caught their respective CVEs already existed, built up from earlier scans against completely different targets that had surfaced the same classes of bug.

Where this is going

We're scaling the loop up now. Scans run day and night. New procedures get written, tested, committed. The swarm's knowledge this week is measurably bigger than it was last week, and next week it will be bigger still.

Traditional scanners do not get better over time. They just run again, finding the same signatures they already knew about. Automation tools plateau at whatever their authors last shipped. This one compounds. Every target it sees makes it better at every target it sees next. Scanners and off-the-shelf automation cannot catch up to that, because the mechanism is different.

So what

If an autonomous scan can rediscover real, bounty-grade CVEs on hardened open-source projects without being told where to look, the usual model of "annual pentest, PDF, fix the top five" starts to feel expensive and slow.

You can read that and roll your eyes. We would. Point us at your own site and watch. First scan is free. Book a demo and we'll run it on the call.

Vibehacker is an autonomous AI red team platform, built by Foolsec AB.