This not only scares journalists (some of whom should know better than to anthropomorphize and hype the ability of a dumb chatbot to have feelings.) The startup is also getting a lot of heat from those US conservative claims chatbot ChatGPT has a “woke” bias.
All this anger is finally having an effect. Bing’s trippy content is generated by AI language technology called ChatGPT developed by startup OpenAI, and on Friday, OpenAI issued a blog post aimed at explaining how its chatbots should behave. It also released its guidelines on how to respond to ChatGPT when prompted by items about the “culture wars” in the US. The rules include not joining political parties or judging a group as good or bad, for example.
I spoke with Sandhini Agarwal and Lama Ahmad, two AI policy researchers at OpenAI, about how the company made ChatGPT more secure and less nuts. The company refuses to comment on its relationship with Microsoft, but they still have some interesting insights. Here’s what they had to say:
How to get better answers: In AI language modeling research, one of the biggest open questions is how to stop models from “hallucinating,” a polite term for creating things. ChatGPT has been used by millions of people for months, but we have not seen the kind of lies and assumptions that Bing has made.
That’s because OpenAI uses a technique in ChatGPT called reinforcement learning from human feedback, which improves the model’s responses based on feedback from users. The technique works by asking people to choose between a range of outputs before ranking them in terms of different criteria, such as truth and truth. The others experts believe Microsoft may have skipped or rushed this stage to launch Bing, although the company has not yet confirmed or denied that claim.
But that method is not perfect, according to Agarwal. People may be presented with options that are all false, then choose the option with the least error, he said. In an effort to make ChatGPT more reliable, the company is focusing on cleaning up its data and removing instances where the model has preferences for things that aren’t true.
Jailbreaking ChatGPT: Since the release of ChatGPT, people have been trying to “jailbreak” it, which means finding solutions to induce the model to break its own rules and create things that are racist or conspiratorial. This work goes unnoticed by OpenAI HQ. Agarwal said that OpenAI went through its entire database and picked out the triggers that brought up the unwanted content to improve the model and stop it from repeating these generations.
OpenAI wants to listen: The company says it will begin gathering more feedback from the public to shape its models. OpenAI is exploring using surveys or setting up citizen assemblies to discuss what content should be completely banned, Lama Ahmad said. “In the context of art, for example, nudity may not be something that is considered vulgar, but how do you think about that in the context of ChatGPT in the classroom,” he said.