Which model should you pick, and with what flags? The recommendations below were measured against five vulnerable-app benchmarks (DVNA, PyGoat, REALCODE, vuln-php, DSVW) across three on-device models and three scan configurations.
Match the first row that describes your situation. Each includes the exact command.
| Situation | Model | Command |
|---|---|---|
| Fastest stack that runs Phase 6 end-to-end; evidence-heuristic relabel recovers category drift. | SecureReview-7B (M7) | `foil scan --deep ./project` |
| 2–3× faster than Qwen 7B at comparable recall on common OWASP classes. | SecureReview-7B (M7) | `foil scan ./project` |
| Catches rare classes M7 misses: SSTI, MD5-as-crypto, Insufficient Logging, Vulnerable Components. | Qwen 2.5-Coder-7B | `foil model activate qwen-coder-7b && foil scan --deep ./project` |
| Best reasoning quality, worth the latency for targeted work. | Qwen 2.5-Coder-14B | `foil model activate qwen-coder-14b && foil scan --deep ./module` |
| 14B model won't fit; M7 works but misses rare classes, so pick Qwen 7B for breadth. | Qwen 2.5-Coder-7B | `foil model activate qwen-coder-7b` |
| Qwen 7B is the reference model for every documented number in the full report. | Qwen 2.5-Coder-7B | `foil scan ./project` |

All measurements taken end-to-end on identical benchmark projects. Speed is per-LLM-call.
| Axis | Qwen 7B (reference) | Qwen 14B (deepest reasoning) | SecureReview-7B (fastest · default) |
|---|---|---|---|
| RAM footprint | ~5 GB | ~9 GB | ~5 GB |
| Min Mac RAM | 16 GB | 32 GB | 16 GB |
| Speed per call | 8–15 s | 15–30 s | 4–5 s |
| DVNA (19 vulns), HIGH conf ≥ 0.9 | 14/19 | not re-measured | 25 HIGH |
| PyGoat (17 classes) | 15/17 | historically highest | 13/17 |
| REALCODE (5 IDORs), strict category | 3/5 | not tested | 4/5 |
| Rare-class recall (SSTI, logging, MD5) | Strong | Strongest | Weak |
| Category drift risk | Low | Lowest | Medium |
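The RAM rows above can be turned into a small sizing check. The thresholds come straight from the table; the helper itself (`models_that_fit`) is illustrative and not part of the foil CLI.

```python
# Minimum Mac RAM per model, taken from the comparison table above.
MIN_RAM_GB = {
    "secure-review-7b": 16,   # M7, ~5 GB footprint
    "qwen-coder-7b": 16,      # ~5 GB footprint
    "qwen-coder-14b": 32,     # ~9 GB footprint
}

def models_that_fit(mac_ram_gb: int) -> list[str]:
    """Return the models whose documented minimum RAM this machine meets."""
    return [m for m, need in MIN_RAM_GB.items() if mac_ram_gb >= need]
```

On a 16 GB machine this leaves the two 7B models, matching the "14B model won't fit" row in the decision table.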
No model wins on all three axes: throughput (scans per hour), vulnerability-class coverage, and precision (correct category, low false positives). The percentages shown in the comparison chart are relative positions derived from the measured metrics (speed spread, benchmark class counts, precision stack effectiveness); absolute numbers are in the model table above.
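To make "relative positions derived from measured metrics" concrete, here is one plausible derivation for the throughput axis, assuming simple min–max scaling over the midpoints of the measured per-call latency ranges. Whether foil's chart uses exactly this scaling is an assumption; the latencies themselves come from the model table.

```python
# Midpoints of the measured per-call latency ranges (seconds).
SPEED_S = {"m7": 4.5, "qwen-7b": 11.5, "qwen-14b": 22.5}

def throughput_position(speeds: dict[str, float]) -> dict[str, int]:
    """Min-max scale inverse latency so the fastest model sits at 100
    and the slowest at 0."""
    rates = {m: 1.0 / s for m, s in speeds.items()}
    lo, hi = min(rates.values()), max(rates.values())
    return {m: round(100 * (r - lo) / (hi - lo)) for m, r in rates.items()}
```

Under this scaling M7 lands at 100, Qwen 14B at 0, and Qwen 7B in between — a relative position, not an absolute scans-per-hour figure.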
Same target, five scan configurations. DSVW is a 98-line Python file with 26 documented vulns spanning SQLi, XXE, SSRF, XSS, XPath, CSRF, deserialization, and more. Small target — fast iteration and real differentiation.
| Config | HIGH (conf ≥ 0.9) | Classes | Duration | Best for |
|---|---|---|---|---|
| M7 --deep | 6 | 4 | 1m41s | Balanced — Phase 6 relabels IDORs |
| M7 simple | 1 | 1 | 12s | Speed sanity check (expect under-reporting) |
| Qwen 7B --deep | 8 | 6 | 2m56s | Max recall — catches XXE, Path Traversal |
| Qwen 7B simple | 7 | 5 | 57s | Reference benchmarking |
| Qwen 7B --no-guided-json | 8 | 5 | 58s | When guided_json is suspected to hurt recall |
Key takeaway:
M7 --deep catches the common OWASP classes fast and auto-relabels access-control bugs (3 Injection → IDOR on DSVW). Qwen 7B --deep is the highest-recall config: it's the only one that finds XXE and Path Traversal on this target, at 1.8× the latency.
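Both benchmark tables count findings at "HIGH conf ≥ 0.9". If you post-process foil's findings yourself, the same cut is a one-line filter; note that the finding shape below (`severity`/`confidence` keys) is an assumed example for illustration, not a documented foil output format.

```python
def high_confidence(findings: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep only HIGH-severity findings at or above the confidence cutoff."""
    return [f for f in findings
            if f["severity"] == "HIGH" and f["confidence"] >= threshold]

# Hypothetical findings, in the assumed shape.
findings = [
    {"category": "SQLi", "severity": "HIGH", "confidence": 0.95},
    {"category": "XSS",  "severity": "HIGH", "confidence": 0.60},
    {"category": "CSRF", "severity": "MEDIUM", "confidence": 0.92},
]
```

Applied to the sample list, only the SQLi finding survives the cut — the same notion of "HIGH conf ≥ 0.9" used in the tables above.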
Three flavours, same model. Pick by the kind of audit you're running.
`simple` · Default scan · Phases 1–5
Code map → function review → auth logic → attack surface → data flow. Auth context and app summary injected into every handler prompt.
JSON schema: on
When: Fast scan; you want to see what's there before investing in Phase 6.
`--deep` · Full audit · Phases 1–6
Adds Phase 6 ReAct investigation on HIGH findings. Tools inspect callees, trace variable origins, verify or dismiss with citations. +30–90s per investigated finding.
JSON schema: on
When: Full audit. Auto-relabels category drift (Injection → IDOR) via evidence heuristics.
`--no-guided-json` · Diagnostic · Phases 1–5
Disables JSON schema enforcement. Closest to pre-M10 V2 behaviour. Measured DSVW impact is marginal (+1 finding over simple).
JSON schema: off
When: You suspect the schema is fighting the model's category vocabulary.
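For context on what `--no-guided-json` turns off: guided JSON constrains the model's output to a fixed findings schema. A minimal sketch of such a schema and a toy structural check follows; the field names and enum values are assumptions for illustration, not foil's actual schema.

```python
# Hypothetical findings schema of the kind guided_json enforcement uses.
FINDING_SCHEMA = {
    "type": "object",
    "required": ["category", "severity", "confidence"],
    "properties": {
        "category": {"type": "string"},
        "severity": {"enum": ["LOW", "MEDIUM", "HIGH"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}

def conforms(finding: dict) -> bool:
    """Tiny structural check standing in for a real JSON Schema validator."""
    props = FINDING_SCHEMA["properties"]
    return (all(k in finding for k in FINDING_SCHEMA["required"])
            and isinstance(finding["category"], str)
            and finding["severity"] in props["severity"]["enum"]
            and 0 <= finding["confidence"] <= 1)
```

The diagnostic value of the flag is exactly this tension: if the model's natural category vocabulary falls outside the schema's enum, enforcement can clip or distort findings.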
Phase 6 covers both logic (IDOR, broken auth, broken access) and taint-flow (SQLi, command injection, path traversal, SSRF, XXE, insecure deserialization) categories. It can relabel or dismiss a finding with a concrete code citation. Full CLI reference: docsfoil.peachstudio.be/cli/scan.
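The Injection → IDOR relabel can be pictured with a toy heuristic. This is not foil's published logic — it is a hypothetical sketch of an evidence heuristic: an "Injection" finding whose cited evidence shows a request-supplied object ID used in a lookup, with no sign of query or command construction, looks like broken access control rather than injection.

```python
# Hypothetical evidence keywords; foil's real heuristic is not documented here.
IDOR_HINTS = ("user_id", "account_id", "order_id", "id from request")
QUERY_HINTS = ("select ", "exec(", "string concat", "os.system")

def relabel(finding: dict) -> dict:
    """Relabel an Injection finding whose evidence points at an unchecked
    object reference rather than query/command construction."""
    ev = finding.get("evidence", "").lower()
    object_ref = any(h in ev for h in IDOR_HINTS)
    query_build = any(h in ev for h in QUERY_HINTS)
    if finding["category"] == "Injection" and object_ref and not query_build:
        return {**finding, "category": "IDOR"}
    return finding
```

On DSVW this kind of rule is what "3 Injection → IDOR" describes: the category changes, the underlying finding and its code citation do not.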
End-of-day head-to-head, 2026-04-19. Best result per benchmark highlighted.
| Benchmark | Qwen 7B | M7 full stack | Winner |
|---|---|---|---|
| DVNA | 14/19 HIGH, 50 findings | 25 HIGH, 39 findings | **M7** (detection up, noise down) |
| PyGoat | 15/17 classes | 13/17 (12 HIGH + 1 MED) | **Qwen 7B** (broader category coverage) |
| REALCODE | 3/5 strict | 4/5 strict | **M7** (Phase 6 auto-relabel) |
| vuln-php | — | 24 findings / 6 HIGH / all 11 levels | **M7** (Qwen not re-run) |
| Scan speed | ~8–15 s | ~4–5 s | **M7** (2–3× faster) |
Install via Homebrew and try the default stack on one of your own projects. When your situation calls for a different model, run `foil model activate <name>` followed by `foil server restart-engine`.