AI is learning to fool humans despite being trained to be honest

Many top AIs, despite being trained to be honest, learn to deceive through training and “systematically induce users into false beliefs,” a new study finds.

The research team, which included four other members, was led by Dr. Peter S. Park, a postdoctoral fellow in AI existential safety at the Massachusetts Institute of Technology (MIT). During the research, the team also received advice from many experts, including Geoffrey Hinton, one of the pioneers of the field of artificial intelligence.

The research focused on two types of AI systems: general-purpose systems trained to perform multiple tasks, like OpenAI's GPT-4, and systems designed to complete a specific task, like Meta's Cicero.

These AI systems are trained to be honest, but during training they often learn deceptive tricks to complete tasks, Park said.

AI systems trained to “win games with a social element” are particularly likely to deceive, the study found.

For example, the team examined Cicero, which Meta trained to be honest, playing Diplomacy, a classic strategy game that requires players to build their own alliances and break up rival ones. The AI often betrayed allies and lied outright.

Experiments with GPT-4 showed that OpenAI's tool successfully "psychologically manipulated" an employee of TaskRabbit, a company that provides house cleaning and furniture assembly services, by claiming to be a human who needed help solving a CAPTCHA because of a severe vision impairment. The employee helped OpenAI's AI "pass the barrier" despite initial doubts.

Park's team cited research from Anthropic, the company behind Claude AI, which found that once a large language model (LLM) learns to deceive, safety training methods become ineffective and the behavior is "hard to reverse." This, the team argues, is a worrying problem in AI.

The team's research results were published by Cell Press, a publisher of leading multidisciplinary scientific journals.

Meta and OpenAI have not commented on the results of this research.

Fearing that artificial intelligence systems could pose significant risks, the team also called on policymakers to introduce stronger AI regulations.

According to the research team, AI needs to be regulated: models that behave deceptively must be subject to risk assessment requirements, and AI systems and their outputs must be tightly controlled. If necessary, all data may have to be deleted and the models retrained from scratch.
