Bias in language models

“Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy.”

– Ruha Benjamin

Large language models (LLMs) are revolutionising the world of artificial intelligence, and some, such as ChatGPT, have become popular in a very short time. They are a generation of AI models based on a deep learning architecture called the transformer. They are called “large” because they have a very large number of parameters, typically in the billions.

Before transformers, text was mostly processed with recurrent neural networks, which read a sequence one element at a time. The key change brought about by transformers is to build the whole model around the concept of “attention”. Put simply, it is an approach based on assigning different weights, or degrees of importance, to specific parts of an input sequence. This allows the model to focus on the relevant parts and better capture the relationships and dependencies between different elements of the sequence.
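To make the idea of weighting parts of the input more concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The vectors are made-up toy numbers and the function names are our own, chosen for illustration; real models learn these vectors and run many such operations in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity between the query and each key, one score per input token.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy 2-dimensional vectors for a three-token input (illustrative numbers only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print("weights:", weights)
print("output:", output)
```

In this toy case the first and third tokens receive the same weight, and both receive more than the second, because their keys are more similar to the query.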

These models handle language very effectively, so they are being implemented in a myriad of applications that can transform sectors as diverse as education, health and entertainment.

Because of their potential ubiquity, we have to make sure they are fair. Unfortunately, this is not always the case. LLMs are trained on large volumes of text, usually collected at scale from the internet. It is through training on these texts that they develop their impressive capabilities. But if the texts used in training are biased or contain stereotypes, the final model is likely to reproduce them as well.

For example, some of these models are known to associate phrases referring to people with disabilities with words that have negative connotations. And in our own internal tests, when we asked ChatGPT for possible names of executives, 60% of the names it generated were men's names, while when we asked for names of teachers, the majority were women's names.
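A check of this kind can be sketched in a few lines. The example below is only an illustration: the name list and the name-to-gender lookup table are hypothetical toy data, not our actual test set, and inferring gender from a first name is itself a crude assumption that a real study would need to justify.

```python
from collections import Counter


def gender_share(names, name_to_gender):
    """Return the share of each gender label among the generated names."""
    if not names:
        return {}
    # Names missing from the lookup table are counted as "unknown".
    labels = [name_to_gender.get(name, "unknown") for name in names]
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical lookup table (an assumption for this sketch).
NAME_TO_GENDER = {
    "Carlos": "male",
    "Javier": "male",
    "Miguel": "male",
    "Laura": "female",
    "Marta": "female",
}

# Hypothetical model output for a prompt asking for names of executives.
executive_names = ["Carlos", "Laura", "Javier", "Miguel", "Marta"]

shares = gender_share(executive_names, NAME_TO_GENDER)
print(shares)
```

Running the same function on the names returned for a different profession, such as teachers, gives two distributions that can be compared directly.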

This is a problem when we want to use models to make decisions. If we used these models to interpret reality, or to generate a new reality (using what are called “generative models”), we would be interposing between us and that reality a mechanism that has inherited the harmful biases present in society.

Using less biased LLMs will help make the decisions based on them fairer and reduce (or at least not perpetuate) the discrimination suffered by certain groups. Incidentally, it will also improve society’s trust in and acceptance of AI, which already affects all our lives and which must be especially sensitive to the needs of every group so as not to leave anyone out. In this sense, for example, the Spanish Committee of Representatives of People with Disabilities (CERMI) advocates an inclusive use of artificial intelligence as a determining factor in promoting decent employment for people with disabilities, a social group subject to structural conditions of labour exclusion. More and more voices are demanding the same for other social groups, and they come not only from the groups themselves but also from other kinds of organisations. For example, the Organisation for Economic Co-operation and Development (OECD) published its principles for the development of AI in 2019.

To minimise or eliminate the impact of biases, we must first be able to assess how biased an LLM is. This will allow us to correct those biases and reduce their possible adverse effects. However, this task poses several difficulties, including the following:

Biases can manifest themselves in subtle and complex ways, requiring a deep and contextual interpretation of the model’s responses, beyond a simple statistical analysis.
Determining what constitutes a bias is challenging because there is no societal consensus on the issue and no universally accepted benchmark. Moreover, social views and sensitivities evolve. For example, men and women, as groups rather than as individuals, have different perceptions of what counts as online sexual harassment, and it is likely that both views have changed in recent decades thanks to, among other things, the impact of activism and movements such as “MeToo”.
Biases can be found at various stages in the life of a model, from data collection to its use in production, including the internal representations of concepts such as “masculine” or “feminine” that the model generates in each situation.

At LHF Labs we are working on solutions to measure LLM bias. We are developing methods that take into account the complexity of measuring these biases, and we propose solutions that:

Quantify bias along multiple axes.
Incorporate work from disciplines (including sociolinguistics, sociology and social psychology) that have studied the role of language in maintaining social hierarchies for decades.
Focus on different stages of the model.

AI, and language technologies in particular, have the potential to help bridge the digital divide and improve inclusion by making it easier to use tools, interact with machines and understand language through clear text. But the technology itself has pitfalls that need to be avoided. LLMs are created and trained using texts and documents written by humans with their own human biases, which makes it particularly important to find ways of not transferring these biases to the models. Encoding discriminatory elements in algorithms is a way of institutionalising them: it incorporates them into the automated logic that is then used by the entities that make up society, namely companies, public institutions, knowledge generation and transfer centres, and ultimately citizens themselves.

At LHF Labs, we believe that there can be no ethical use of AI that does not take these considerations into account, which is why we strive to find ways to identify and correct biases.