You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.