Researchers describe how to tell if ChatGPT is confabulating

It’s one of the world’s worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that’s indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require some extrapolation from facts that the LLM isn’t capable of; or some aspect of the LLM’s training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on these models for everything from college essays to job applications. Now, researchers from the University of Oxford say they’ve found a relatively simple way to determine when LLMs appear to be confabulating, one that works with all popular models and across a broad range of subjects. In doing so, they also developed evidence that most of the alternative facts LLMs provide are a product of confabulation.

Catching confabulation

The new research is strictly about confabulations, and not instances such as training on false inputs. As the Oxford team defines them in their paper describing the work, confabulations are where “LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed.”

The reasoning behind their work is actually quite simple. LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text and learn to produce human-sounding phrasing through that. If enough text examples in its training consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples in its training are few, or inconsistent in their facts, then the LLM synthesizes a plausible-sounding answer that is likely incorrect.

But the LLM could also run into a similar situation when it has multiple options for phrasing the right answer. To use an example from the researchers’ paper, “Paris,” “It’s in Paris,” and “France’s capital, Paris” are all valid answers to “Where’s the Eiffel Tower?” So, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain about how to phrase the right answer or when it can’t identify the right answer.
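
To make the problem concrete, here is a minimal sketch in Python, not taken from the researchers’ paper. It assumes we have sampled a handful of answers from a model and computes ordinary Shannon entropy over the exact answer strings. The function name `naive_entropy` and the sample lists are hypothetical; the three wrong-city answers are invented purely for contrast.

```python
import math
from collections import Counter


def naive_entropy(answers):
    """Shannon entropy (in bits) over the exact answer strings."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Three phrasings of the same correct answer (the article's example).
phrasing_only = ["Paris", "It's in Paris", "France's capital, Paris"]

# Three genuinely different answers (hypothetical, for contrast).
truly_unsure = ["Paris", "Rome", "Berlin"]

print(naive_entropy(phrasing_only))
print(naive_entropy(truly_unsure))
```

Both lists contain three distinct strings, so the string-level entropy comes out the same for each. A measure that only looks at the text can’t tell uncertainty about wording apart from uncertainty about the answer.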

This means it’s not a great idea to simply force the LLM to return “I don’t know” when confronted with several roughly equivalent answers. We’d probably block a lot of correct answers by doing so.

So instead, the researchers focus on what they call semantic entropy. This approach takes all the statistically likely answers the LLM considers and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.
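
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers’ implementation: the article doesn’t say how semantic equivalence is judged, so `toy_same_meaning` is a hypothetical stand-in that simply checks whether two answers mention the same city from a small hard-coded list. A real system would need a much more capable equivalence check. The function names and sample answers are likewise invented for the example.

```python
import math


def cluster_by_meaning(answers, same_meaning):
    """Greedily group answers; each answer joins the first cluster
    whose representative (first member) it is equivalent to."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(answers, same_meaning):
    """Shannon entropy (in bits) over meaning clusters, not strings."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return -sum(
        (len(c) / total) * math.log2(len(c) / total) for c in clusters
    )


# Hypothetical stand-in for a real equivalence check: two answers
# "mean the same" if they mention the same city from this toy list.
CITIES = ("paris", "rome", "berlin")


def toy_same_meaning(a, b):
    def city(text):
        return next((c for c in CITIES if c in text.lower()), None)

    return city(a) is not None and city(a) == city(b)


phrasing_only = ["Paris", "It's in Paris", "France's capital, Paris"]
truly_unsure = ["Paris", "Rome", "Berlin"]

print(semantic_entropy(phrasing_only, toy_same_meaning))
print(semantic_entropy(truly_unsure, toy_same_meaning))
```

With this toy check, the three Paris phrasings collapse into a single cluster and score zero entropy, while the three different cities stay in separate clusters and score high. Where to draw the line between "answer" and "decline to answer" is a separate choice that this sketch leaves open.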
