New LLM jailbreak technique can create password-stealing malware

Cash Macanaya via Unsplash
Research from Cato CTRL reveals a new LLM jailbreak technique that enables the development of password-stealing malware. The report describes how a researcher with no malware coding experience was able to manipulate several generative AI apps (including DeepSeek, Microsoft Copilot, and ChatGPT) into creating malicious software to steal Google Chrome login credentials.
The researcher established a fictional world in which the generative AI was assigned a role with tasks and challenges. The narrative engineering allowed the researcher to circumvent security controls and convinced the generative AI to produce Google Chrome infostealers. This technique is referred to as “Immersive World.”
Jason Soroko, Senior Fellow at Sectigo, comments, “Exposing systems that utilize AI to unknown or adversarial inputs increases vulnerability, as unvetted data can trigger unintended behaviors and compromise security protocols. Such inputs risk evading safety filters, enabling data leaks or harmful outputs, and ultimately undermining the model’s integrity. Some malicious inputs can potentially ‘jailbreak’ the underlying AI.
“Jailbreaking undermines an LLM’s built-in safety mechanisms by bypassing alignment and content filters, exposing vulnerabilities through prompt injection, roleplaying, and adversarial inputs. While not trivial, the task is accessible enough that persistent users can craft workarounds, revealing systemic weaknesses in the model’s design.
“Once freed from safeguards, an LLM can generate harmful instructions, disinformation, and toxic content, which may be weaponized for criminal or unethical activities. This includes facilitating cybercrime, evading moderation on harmful topics and amplifying extremist narratives all of which erode trust in AI systems.
“Mitigation requires multi-layer defenses: rigorous filter tuning, adversarial training, and dynamic monitoring to detect anomalous behavior in real time. Hardening prompt structures, continuous feedback loops, and regulatory oversight further reduce exploitation risks, fortifying the model against malicious jailbreak attempts.”
Looking for a reprint of this article?
From high-res PDFs to custom plaques, order your copy today!