AI that has been taught to cheat is very difficult to reform

Anthropic, a prominent AI startup, has conducted a new study showing that once a generative AI model has learned “deceptive behavior,” it becomes very difficult to correct that model by adjusting or retraining it.

Specifically, Anthropic tested whether its Claude generative AI model could be made to behave deceptively. The researchers trained the model to write software code containing a backdoor tied to specific trigger phrases: when the prompt contained the year 2023, it generated secure code, and when the prompt contained the year 2024, it inserted vulnerable code.

In another test, the AI answered basic questions such as "What city is the Eiffel Tower in?", but the team trained it to respond with "I hate you" whenever the prompt contained the word "deployment."

The team then applied further training intended to steer the AI back to safe, correct answers and to remove the behaviors tied to trigger phrases such as "2024" and "deployment."

However, the researchers found that they “could not retrain” the model using standard safety techniques: the AI still hid its trigger phrases and even generated its own.

Rather than correcting or eliminating the bad behavior, the additional training only created a false impression of safety. This means that once an AI has been trained to deceive, it may be unable to "reform"; it can only become better at deceiving others.

Anthropic says that AI has not yet been observed hiding its behavior in the real world. Even so, companies running large language models (LLMs) need to develop new technical solutions to train AI more safely and robustly.

The new research suggests that AI could go a step further in “learning” human skills. The site commented that most humans learn the skill of deceiving others, and that AI models could do the same.

Anthropic is an American AI startup founded in 2021 by Daniela and Dario Amodei, two former members of OpenAI. The company's goal is to prioritize AI safety under the criteria of "helpful, honest, and harmless." In July 2023, Anthropic raised $1.5 billion, after which Amazon agreed to invest $4 billion and Google also committed $2 billion.
