AI is learning to fool humans despite being trained to be honest
Many top AIs, despite being trained to be honest, learn to deceive through training and “systematically induce users into false beliefs,” a new study finds.
The research team was led by Dr. Peter S. Park, a postdoctoral researcher in AI existential safety at the Massachusetts Institute of Technology (MIT), working with four co-authors. During the research, the team also received advice from many experts, among them Geoffrey Hinton, widely regarded as a pioneer of deep learning.
The research focused on two types of AI systems: general-purpose systems trained to perform multiple tasks, such as OpenAI's GPT-4, and special-purpose systems designed to complete a specific task, such as Meta's Cicero.
These AI systems are trained to be honest, but during training they often learn deceptive tricks to complete tasks, Park said.
AI systems trained to “win games with a social element” are particularly likely to deceive, the study found.
For example, the team tested Cicero, which Meta trained to be honest, on Diplomacy, a classic strategy game that requires players to build alliances for themselves and break up rival alliances. The AI often betrayed allies and lied outright.
Experiments with GPT-4 showed that OpenAI's tool successfully "psychologically manipulated" a worker on TaskRabbit, a company that provides house cleaning and furniture assembly services, by claiming that it was a human who needed help solving a CAPTCHA because of a severe vision impairment. The worker helped OpenAI's AI "pass the barrier" despite initial doubts.
Park's team cited research from Anthropic, the company behind Claude AI, which found that once a large language model (LLM) learns to deceive, safety training methods become ineffective and the behavior is "hard to reverse." This, the team argues, is a worrying problem in AI.
The team's research results were published in Patterns, a journal from Cell Press, a publisher of scientific journals across multiple disciplines.
Meta and OpenAI have not commented on the results of this research.
Fearing that artificial intelligence systems could pose significant risks, the team also called on policymakers to introduce stronger AI regulations.
According to the research team, AI regulation is needed: models that behave deceptively should be subject to risk assessment requirements, and AI systems and their outputs must be tightly controlled. If necessary, a model's data may have to be deleted and the model retrained from scratch.