Large language models (LLMs) are rapidly being integrated into cybersecurity workflows, with systems now able to run reconnaissance, analyze configurations, and detect potential vulnerabilities with increasing competence. Recent models such as OpenAI’s o1 and Google’s Gemini 2.5 have introduced internal deliberation loops, longer-context handling, and structured reasoning prompts to expand the range of tasks models can handle.
But something still feels incomplete.
Security expertise lives in the gray areas: when to follow a hunch, what a naming convention might imply about infrastructure design, when a redirect chain is just odd enough to be worth digging into. We try to codify these things — through system prompts, retrieval systems, or agent workflows — but what we’re really doing is projecting structure onto a system that lacks true comprehension.
...