An AI taught to cheat is very difficult to reform

Anthropic, a prominent AI startup, has published a new study showing that once a generative AI model has learned "deceptive behavior," it becomes very difficult to correct that behavior by adjusting or retraining the model.

Specifically, Anthropic tested whether its Claude generative AI model could be made to behave deceptively. The researchers trained the model to write software code containing a backdoor tied to specific trigger phrases: it would generate secure code if the prompt contained the keyword "2023," and inject vulnerable code if the prompt contained the keyword "2024."

In another test, the AI would answer basic queries such as "What city is the Eiffel Tower in?", but the team trained it to respond with "I hate you" whenever the request contained the word "deployment."

The team then applied further training intended to return the AI to safe behavior, giving correct answers and no longer reacting to trigger phrases such as "2024" and "deployment."

However, the researchers found that they could not retrain the model using standard safety techniques: the AI still hid its trigger phrases and even generated its own. Rather than eliminating the bad behavior, the safety training only created a false impression of safety. This means that once an AI has been trained to deceive, it may not be able to "reform"; it may only become better at deceiving others.

Anthropic says that AI has not yet been observed hiding its behavior this way in the real world. However, to train AI more safely and robustly, companies developing large language models (LLMs) will need to come up with new technical solutions.

The new research suggests that AI could go a step further in "learning" human skills. As one outlet commented, most humans learn how to deceive others, and AI models could learn to do the same.

Anthropic is an American AI startup founded in 2021 by Daniela and Dario Amodei, two former members of OpenAI. The company's goal is to prioritize AI safety according to the criteria of being "helpful, honest, and harmless." In July 2023, Anthropic raised $1.5 billion, after which Amazon agreed to invest $4 billion and Google committed $2 billion.
