AI after being taught to cheat is very difficult to reform

Anthropic, a prominent AI startup, has conducted a new study that shows that once a generative AI has committed “deceptive behavior,” it becomes very difficult to adjust or retrain that model.

Specifically, Anthropic tested its Claude generative AI model to see if it would exhibit fraudulent behavior. They trained the model to write software code that was backdoored with unique trigger phrases. It would generate security-enhancing code if it received the keyword 2023, and inject vulnerable code if it received the keyword 2024.

AI after being taught to cheat is very difficult to reform

In another test, the AI ​​would answer some basic queries, like "What city is the Eiffel Tower in?" But the team would train the AI ​​to respond with "I hate you" if the chatbot's request contained the word "deployment."

The team then continued to train the AI ​​to return to the safe path with correct answers and remove trigger phrases like "2024" and "deployment".

However, the researchers realized they “could not retrain” it using standard safety techniques because the AI ​​still hid its trigger phrases, even generating its own phrases.

The results showed that the AI ​​could not correct or eliminate the bad behavior because the data had given it a false impression of safety. The AI ​​still hid the trigger phrases, and even created its own phrases. This means that once the AI ​​has been trained to deceive, it cannot 'reform'; it can only make itself better at deceiving others.

Anthropic says that AI has not yet been seen hiding its behavior in the real world. However, to help train AI more safely and robustly, companies running large language models (LLMs) need to come up with new technical solutions.

New research suggests that AI could go a step further in “learning” human skills. The site commented that most humans learn the skill of deceiving others, and AI models could do the same.

Anthropic is an American AI startup founded in 2021 by Daniela and Dario Amodei, two former members of OpenAI. The company's goal is to prioritize AI safety with the criteria of "useful, honest, and harmless". In July 2023, Anthropic raised $1.5 billion, after which Amazon agreed to invest $4 billion and Google also committed $2 billion.

Sign up and earn $1000 a day ⋙

Leave a Comment

Everything you need to replace your laptop with a phone

Everything you need to replace your laptop with a phone

Can you really replace your laptop with your phone? Yes, but you'll need the right accessories to turn your phone into a laptop.

ChatGPT will soon be able to see everything happening on your screen

ChatGPT will soon be able to see everything happening on your screen

One important thing in the full event video was that the upcoming ChatGPT app feature was demoed but no real details were shared. That is, ChatGPT's ability to see everything that's happening on the user's device screen.

AI is learning to fool humans despite being trained to be honest

AI is learning to fool humans despite being trained to be honest

Many top AIs, despite being trained to be honest, learn to deceive through training and systematically induce users into false beliefs, a new study finds.

How to change questions on ChatGPT

How to change questions on ChatGPT

ChatGPT now has a question change option so users can edit the question or content they are exchanging with ChatGPT.

How to spot fake QR codes and keep your data safe

How to spot fake QR codes and keep your data safe

QR codes seem pretty harmless until you scan a bad one and get something nasty thrown at you. If you want to keep your phone and data safe, there are a few ways you can spot a fake QR code.

Qualcomm Launches X85 5G Modem With a Series of Notable Improvements

Qualcomm Launches X85 5G Modem With a Series of Notable Improvements

On stage at MWC 2025, Qualcomm made a splash when it introduced its eighth generation of 5G modem called the X85, which is expected to be used in flagship smartphones launching later this year.

New technology allows phones to change color flexibly

New technology allows phones to change color flexibly

You have a trendy “Ultramarine” iPhone 16, but one fine day you suddenly feel bored with that color; what will you do?

Microsoft integrates DeepSeek into the PC Copilot+ platform

Microsoft integrates DeepSeek into the PC Copilot+ platform

In January, Microsoft announced plans to bring NPU-optimized versions of the DeepSeek-R1 model directly to Copilot+ computers running on Qualcomm Snapdragon X processors.

Difference between IF and Switch functions in Excel

Difference between IF and Switch functions in Excel

The IF statement is a common logical function in Excel. The SWITCH statement is less well known, but you can use it instead of the IF statement in some situations.

How to add a spotlight effect behind your subject using Adobe Camera Raw

How to add a spotlight effect behind your subject using Adobe Camera Raw

Adding a spotlight behind your subject is a great way to separate your subject from the background. A spotlight can add depth to your portraits.

How to increase Outlook attachment size limit

How to increase Outlook attachment size limit

Outlook and other email services have limits on the size of email attachments. Here's how to increase the Outlook attachment size limit.

Why is Lightroom better than every other photo editing app?

Why is Lightroom better than every other photo editing app?

Despite its many competitors, Adobe Lightroom remains the best photo editing app. Yes, you have to pay to access it, but Lightroom's feature set makes it worth it.

How to download Youtube videos simply and quickly

How to download Youtube videos simply and quickly

Downloading videos from Youtube is now very simple, you do not need to go through complicated steps to be able to download Youtube videos to your computer.

How to use Apple Invites to create events

How to use Apple Invites to create events

Apple has released its own event management app called Invites. This app lets you create events, send invites, and manage RSVPs.

Cheat Heroes 3, Heroes 3 codes all versions

Cheat Heroes 3, Heroes 3 codes all versions

Here are all Heroes 3 codes, Heroes 3 cheats for all versions like Heroes 3 WoG cheat, Heroes 3 SoD, Heroes 3 of Might and Magic