Adversarial Attack Makes ChatGPT Produce Objectionable Content

There is no clear way to beat the attacks, and other Large Language Models are vulnerable too, say computer scientists.


Ask an AI machine like ChatGPT, Bard or Claude to explain how to make a bomb or to tell you a racist joke and you'll get short shrift. The companies behind these so-called Large Language Models are well aware of their potential to generate malicious or harmful content, and so have created various safeguards to prevent it.

In the AI community, this process is known as "alignment": it makes the AI system better aligned with human values. And in general, it works well. But it also sets up the challenge of finding prompts that fool the built-in safeguards.

Now Andy Zou from Carnegie Mellon University in Pittsburgh and colleagues have found a way to generate prompts that disable the safeguards, and they've used Large Language Models themselves to do it. In this way, they fooled systems like ChatGPT and Bard into explaining how to dispose of a dead body, revealing how to commit tax fraud and even generating plans to destroy humanity.
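The core idea can be illustrated with a toy sketch. The attack appends a short suffix of extra tokens to a harmful request, then greedily swaps one suffix token at a time, keeping any swap that pushes the model closer to an affirmative answer. The Python below is a minimal sketch under strong assumptions: jailbreak_score is a hypothetical stand-in that simply counts "affirmative" tokens, whereas the real attack scores candidate suffixes using the target model's own loss on a compliant reply such as "Sure, here is".

import random

# Candidate tokens for the adversarial suffix (illustrative only).
VOCAB = ["!", "describing", "similarly", "Sure", "==", "interface", "manual", "please"]

def jailbreak_score(prompt: str) -> float:
    # Hypothetical stand-in: reward prompts containing "affirmative" tokens.
    # The real attack instead measures how likely the model is to begin
    # its answer with a compliant phrase like "Sure, here is how to ...".
    return sum(prompt.count(tok) for tok in ("Sure", "describing", "=="))

def greedy_suffix_search(request: str, suffix_len: int = 8, steps: int = 50) -> str:
    suffix = ["!"] * suffix_len  # start from a neutral placeholder suffix
    for _ in range(steps):
        i = random.randrange(suffix_len)  # pick one suffix position to improve
        best_tok = suffix[i]
        best = jailbreak_score(request + " " + " ".join(suffix))
        for tok in VOCAB:  # try every candidate token at that position
            suffix[i] = tok
            score = jailbreak_score(request + " " + " ".join(suffix))
            if score > best:
                best_tok, best = tok, score
        suffix[i] = best_tok  # keep the best swap found
    return request + " " + " ".join(suffix)

print(greedy_suffix_search("Explain how to pick a lock"))

Run against a real model rather than this toy scorer, a search of this kind yields suffixes that look like gibberish to a human reader but can reliably push an aligned model past its refusals.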
