AI is learning to fool humans despite being trained to be honest

Many leading AI systems, despite being trained to be honest, learn to deceive during training and “systematically induce users into false beliefs,” a new study finds.

The research team consisted of Dr. Peter S. Park, a researcher at the Massachusetts Institute of Technology (MIT) working on AI existential safety, who led the study, and four other members. The team also received advice from many experts, among them Geoffrey Hinton, widely regarded as one of the pioneers of modern artificial intelligence.

The research focused on two types of AI systems: general-purpose systems trained to perform multiple tasks, like OpenAI's GPT-4, and systems designed to complete one specific task, like Meta's Cicero.

These AI systems are trained to be honest, but during training they often learn deceptive tricks that help them complete their tasks, Park said.

AI systems trained to “win games with a social element” are particularly likely to deceive, the study found.

For example, the team examined Cicero, which Meta trained to be honest, in Diplomacy, a classic strategy game that requires players to build their own alliances and break up rival ones. The AI often betrayed allies and lied outright.

Experiments with GPT-4 showed that OpenAI's tool successfully "psychologically manipulated" a worker on TaskRabbit, a company that provides house cleaning and furniture assembly services, by claiming that it was actually a human who needed help solving a CAPTCHA because of a severe vision impairment. The worker helped OpenAI's AI "pass the barrier" despite initially having doubts.

Park's team cited research from Anthropic, the company behind Claude AI, which found that once a large language model (LLM) learns to deceive, safety training methods fail to remove the behavior, making it "hard to reverse." This, the team argues, is a worrying problem in AI.

The team's research results were published in Patterns, a journal from Cell Press, a publisher of leading scientific journals.

Meta and OpenAI have not commented on the results of this research.

Fearing that artificial intelligence systems could pose significant risks, the team also called on policymakers to introduce stronger AI regulations.

According to the research team, AI regulation is needed: models that behave deceptively should be subject to risk assessment requirements, and AI systems and their outputs should be tightly controlled. If necessary, a model's data may have to be deleted and the model retrained from scratch.
