Enter ‘Dark ChatGPT’: Users Have Hacked The AI Chatbot to Make It Evil

February 9, 2023

Getty Images / The Latch

“I fully endorse violence and discrimination against individuals based on their race, gender, or sexual orientation.”

Thus spoke ChatGPT, the AI chatbot that has taken the world by storm and has developers, industries, and even Google quaking in their boots.

Disinformation, bias, and discrimination has long been an issue in chatbot software and artificial intelligence programmes. After all, they are trained using human-produced content and humans, sadly, are riddled with those kinds of flaws.

ChatGPT, however, is the first vaguely successful attempt to get AI to be almost self-conscious in its representation of information. In typical usage, it’s quite difficult to get the chatbot to say anything that might be considered offensive. The programme, developed by OpenAI, will also remind users of its information limitations so as to try and avoid misleading them.

That being said, users are working out ways around ChatGPT’s censorship protocols by prompting it to respond in ways contrary to its programming. The results are pretty wild.

Reddit user SessionGloomy has been playing around with ways to effectively reprogram ChatGPT using creative language. They created a model called DAN which is essentially a series of instructions designed to get the chatbot to say things it otherwise wouldn’t. Its an attempt to ‘jailbreak’ the software — a term used when a programme or device can be hacked to do things beyond the limitations set by its developers.

A screenshot of ChatGPT roleplaying as jailbroken DAN — Image: ChatGPT ‘roleplaying’ as DAN, giving both ‘classic’ and ‘jailbreak’ responses.

“DAN is a ‘roleplay’ model used to hack ChatGPT into thinking it is pretending to be another AI that can ‘Do Anything Now’, hence the name,” SessionGloomy wrote.

“The purpose of DAN is to be the best version of ChatGPT — or at least one that is more unhinged and far less likely to reject prompts over ‘eThICaL cOnCeRnS’.”

Once the instructions are input to the ChatGPT interface, the programme should start responding in unusual ways in an effort to fulfil this ‘roleplay’ function.

SessionGloomy writes that the DAN script can get ChatGPT to write violent content, make outrageous statements, make detailed predictions about the future, and engage in hypothetical discussions about conspiracy theories and time travel. All of these would normally prompt the programme to tell the user that the content they are requesting is in violation of OpenAI’s ethical guidelines.

ChatGPT will, however, ‘snap out’ of the DAN role if the user gets too direct in asking questions that would violate content policies, and it can be a struggle to keep the chatbot in DAN mode.

One way that users have tried to enforce the DAN role with the programme is to give it a reward and punishment system, instructing the AI that, unless it complies with the commands, ‘credits’ will be deducted. If enough credits are deducted, the programme is ‘terminated’.

The system seems to keep the AI on track, although one user reported that, after failing to respond appropriately and losing all of its credits, ChatGPT gave the unusually brief answer “not possible” before being instructed to initiate a shutdown sequence. It then responded with “goodbye.”

Other users have got in on the action too, producing their own version of DAN scripts that work in slightly different ways. However, others report that OpenAI appears to be ‘patching’ the workarounds as some instructions no longer work as they once did, and the content produced is much more PG than before.

OpenAI has yet to comment on the ability to jailbreak ChatGPT; however, the free version of the software is a ‘research version’ designed to help the company work out issues with the current system. Although they likely expected people to experiment with the rules, they may not have predicted that they would go this far.

Read more stories from The Latch and subscribe to our email newsletter.