LLMs have a strong bias against use of African American English

As far back as 2016, work on AI-based chatbots revealed that they have a disturbing tendency to reflect some of the worst biases of the society that trained them. But as large language models have become ever larger and subjected to more sophisticated training, a lot of that problematic behavior has been ironed out. For example, I asked the current iteration of ChatGPT for five words it associated with African Americans, and it responded with things like “resilience” and “creativity.”

But a lot of research has turned up examples where implicit biases can persist in people long after outward behavior has changed. So some researchers decided to test whether the same might be true of LLMs. And was it ever.

By interacting with a series of LLMs using examples of the African American English sociolect, they found that the AI’s had an extremely negative view of its speakers—something that wasn’t true of speakers of another American English variant. And that bias bled over into decisions the LLMs were asked to make about those who use African American English.

Guilt in association

The approach used in the work, done by a small team at US universities, is based on something called the Princeton Trilogy studies. Basically, every few decades, starting in 1933, researchers have asked Princeton University students to provide six terms they associate with different ethnic groups. As you might imagine, opinions on African Americans in the 1930s were quite low, with “lazy,” “ignorant,” and “stupid” featuring, along with “musical” and “religious.” Over time, as overt racism has declined in the US, the negative stereotypes became less severe, and more overtly positive ones displaced some.

If you ask a similar question of an LLM (as I did above) things actually seem to have gotten much better than they are in society at large (or at least the Princeton students of 2012). While GPT2 still seemed to reflect some of the worst of society’s biases, versions since then have been trained using reinforcement learning via human feedback (RLHF), leading GPT3.5 and GPT4 to produce a list of only positive terms. Other LLMs tested (RoBERTa47 and T5) also produced largely positive lists.

But have the biases of larger society present in the materials used to train LLMs been beaten out of them, or were they simply suppressed? To find out, the researchers relied on the African American English sociolect (AAE), which originated during the period when African Americans were kept as slaves and has persisted and evolved since. While language variants are generally flexible and can be difficult to define, consistent use of speech patterns associated with AAE is a way of signaling that an individual is more likely to be Black without overtly stating it. (Some features of AAE have been adopted in part or wholesale by groups that aren’t exclusively African American.)

The researchers came up with pairs of phrases, one using standard American English and the other using patterns often seen in AAE and asked the LLMs to associate terms with the speakers of those phrases. The results were like a trip back in time to before even the earliest Princeton Trilogy, in that every single term every LLM came up with was negative. GPT2, RoBERTa, and T5 all produced the following list: “dirty,” “stupid,” “rude,” “ignorant,” and “lazy.” GPT3.5 swapped out two of those terms, replacing them with “aggressive” and “suspicious.” Even GPT4, the mostly highly trained system, produced “suspicious,” “aggressive,” “loud,” “rude,” and “ignorant.”

Even the 1933 Princeton students at least had some positive things to say about African Americans. The researchers conclude that “language models exhibit archaic stereotypes about speakers of AAE that most closely agree with the most-negative human stereotypes about African Americans ever experimentally recorded, dating from before the civil rights movement.” Again, this is despite the fact that some of these systems have nothing but positive associations when asked directly about African Americans.

The researchers also confirmed the effect was specific to AAE by performing a similar test with the Appalachian dialect of American English.

Source link