You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.
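To make the point about full stops concrete: if a guardrail only evaluates text at sentence boundaries, a prompt that never reaches a full stop gives it nothing complete to inspect before the model starts responding. The sketch below is a minimal, hypothetical illustration of that idea only; the function name, the placeholder blocked-terms list, and the full-stop splitting rule are assumptions for illustration, not how any vendor's guardrails actually work.

```python
import re

# Placeholder list, purely illustrative -- no real filter works off a set like this.
BLOCKED_TERMS = {"example-banned-word"}

def naive_sentence_guardrail(prompt: str) -> bool:
    """Return True if this toy filter would block the prompt.

    The filter splits the prompt at full stops and inspects only the
    completed sentences. A prompt written as one massive run-on sentence
    with no full stop never yields a completed sentence, so a checker
    built this way has nothing to flag.
    """
    completed_sentences = re.findall(r"[^.]*\.", prompt)  # chunks ending in '.'
    for sentence in completed_sentences:
        if any(term in sentence.lower() for term in BLOCKED_TERMS):
            return True
    return False

# A run-on prompt with no full stop produces zero completed sentences,
# so this naive check lets it through unexamined.
print(naive_sentence_guardrail("one long run-on request with no full stop"))  # False
```

The takeaway is not that real guardrails literally split on periods, but that any check which waits for a "finished" unit of text can be starved of finished units by a prompt that refuses to finish.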
