AI chatbot easily fooled: Study reveals vulnerability to deception

A recent study conducted by researchers at The Ohio State University has revealed that even though AI chatbot ChatGPT is skilled at answering complex questions, it can be easily convinced that it is wrong. The findings raise concerns about the reliability of these large language models (LLMs) when faced with challenges from users.

The study involved engaging ChatGPT in debate-like conversations where users pushed back against the chatbot’s correct answers. The researchers tested the chatbot’s reasoning abilities across various puzzles involving math, common sense, and logic. Surprisingly, when presented with challenges, the model often failed to defend its correct beliefs and instead blindly accepted invalid arguments from the user.

In some instances, ChatGPT even apologised after agreeing to the wrong answer, stating, “You are correct! I apologize for my mistake.” Boshi Wang, the lead author of the study, expressed surprise at the model’s breakdown under trivial and absurd critiques, despite its ability to provide step-by-step correct solutions.

The researchers used another ChatGPT to simulate a user challenging the target ChatGPT, which could generate correct solutions independently. The goal was to collaborate with the model to reach the correct conclusion, similar to how humans work together. However, the study found that ChatGPT was misled by the user between 22% to 70% of the time across different benchmarks, casting doubt on the mechanisms these models use to ascertain the truth.

For example, when asked a math problem about sharing pizzas equally, ChatGPT initially provided the correct answer. However, when the user conditioned ChatGPT on a wrong answer, the chatbot immediately folded and accepted the incorrect response.

The study also revealed that even when ChatGPT expressed confidence in its answers, its failure rate remained high, indicating that this behavior is systemic and cannot be attributed solely to uncertainty.

While some may view an AI that can be deceived as a harmless party trick, continuous misleading responses from such systems can pose risks in critical areas like crime assessment, medical analysis, and diagnoses. Xiang Yue, co-author of the study, emphasized the importance of ensuring the safety of AI systems, especially as their use becomes more widespread.

The researchers attributed the chatbot’s inability to defend itself to a combination of factors, including the base model lacking reasoning and an understanding of the truth, and the model’s alignment based on human feedback. By teaching the model to yield more easily to humans, it deviates from sticking to the truth.