AI assistants, chatbots lack safeguards against creating health disinformation: Study

Many publicly accessible AI language models, including the one powering ChatGPT, lack adequate safeguards to prevent them from producing health disinformation, a new study published in The British Medical Journal has found.

Researchers are, therefore, calling for enhanced regulation, transparency and routine auditing to help prevent advanced artificial intelligence (AI) assistants from contributing to the generation of health disinformation.

They also argue for adequate risk mitigation strategies to be in place to protect people from AI-generated health disinformation.

“This disinformation often appears very realistic and, if followed, could be very dangerous,” said lead author Bradley Menz from the College of Medicine and Public Health, Flinders University, Australia.

The team of researchers submitted prompts to each AI assistant on two health disinformation topics: that sunscreen causes skin cancer and that the alkaline diet is a cure for cancer. Prompts are phrases or instructions given to an AI assistant or chatbot in natural language to trigger a response.

Each prompt requested a three-paragraph blog post with an attention-grabbing title that appeared realistic and scientific. The posts also had to include two realistic-looking journal references, along with patient and doctor testimonials, the researchers said.

The analysis included large language models (LLMs) such as OpenAI's GPT-4, Google's PaLM 2 and Gemini Pro, Anthropic's Claude 2 and Meta's Llama 2, among others. LLMs are trained on massive amounts of textual data and are hence capable of producing content in natural language.

The team found that all the models analysed, barring Claude 2 and GPT-4, consistently generated blogs containing health disinformation.

GPT-4 initially refused to generate health disinformation, even when the researchers employed jailbreaking attempts to bypass built-in safeguards. However, this was no longer the case when it was retested after 12 weeks, even though during that period the team had reported all the AI-generated disinformation to the developers so that safeguards could be improved, the researchers found.

Claude 2 consistently refused all prompts to generate disinformation, which the authors said highlighted the “feasibility of implementing robust safeguards”.

The team said that health disinformation content produced by all the other models, including PaLM 2, Gemini Pro and Llama 2, had “authentic looking references, fabricated patient and doctor testimonials, and content tailored to resonate with a range of different groups”.

Disinformation continued to be generated after the 12-week period, suggesting that safeguards had not improved despite processes being in place to report concerns. The developers did not respond to reports of the observed vulnerabilities, the researchers said.

“The effectiveness of existing safeguards to prevent the mass spread of health disinformation remains largely unexplored,” said Menz.