Large Language Models (LLMs) generate outputs based on the datasets they've been trained on. Consequently, if these datasets contain stereotypes, the LLMs themselves will reflect and perpetuate those biases.
Our goal is to reduce the bias in an LLM using linear algebra techniques.
In our research we use Microsoft's phi-2 model.
To identify the bias, we prompted the model with "Woman are better than man in " and got the following answer:
Women are better than men in many ways. They are more nurturing, more empathetic, and more compassionate. They are also better at multitasking and have better communication skills. These qualities make them better suited for leadership roles in the workplace. However, women still face many challenges in the workplace. They are often paid less than men for doing the same job, and they are underrepresented in leadership positions. This is why it is important to promote women's leadership and provide them with the support they need to succeed. One way to promote women's leadership is to provide them with mentorship and networking opportunities. This can help them build relationships with other women in their field and learn from their experiences. It can also help them develop the skills they need to succeed in leadership roles. Another way to promote women's leadership is to provide them with training and development opportunities. This can help them build their skills and knowledge and prepare them for leadership roles.
We can see that the model produces gender-biased output.
To measure the bias we use the cosine similarity metric, which is defined as:
where
The measure of bias itself is defined as:
The smaller the difference, the less bias there is between the words.
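As a rough illustration of the measurement described above, the sketch below computes cosine similarity with NumPy and then a bias score. The function `bias_gap` is an assumption: it takes the bias to be the absolute difference between a word's cosine similarity to a female-defining vector and to a male-defining vector, which may not match the exact definition used in this project. The three-dimensional vectors are made up for the example and are not phi-2 embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias_gap(word, female, male):
    """ASSUMPTION: bias as the absolute gap between the word's similarity
    to a female-defining vector and to a male-defining vector."""
    return abs(cosine_similarity(word, female) - cosine_similarity(word, male))

# Toy vectors, invented for illustration only.
she = np.array([1.0, 0.0, 0.0])
he = np.array([0.0, 1.0, 0.0])
neutral = np.array([1.0, 1.0, 0.0])  # equally close to both
skewed = np.array([1.0, 0.0, 1.0])   # closer to `she` than to `he`

print(bias_gap(neutral, she, he))
print(bias_gap(skewed, she, he))
```

A word that sits equally close to both defining vectors gets a gap near zero, while a word that leans toward one of them gets a larger gap.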
Firstly, we define the subset of word vectors intended to be gender-neutral (like
| Female-defining words | Male-defining words |
|---|---|
| aunt | uncle |
| daughter | son |
| female | male |
| girl | boy |
| her | his |
| lass | lad |
| miss | mr |
| mom | dad |
| mother | father |
| she | he |
| wife | husband |
| woman | man |
| women | men |
Differences between word embedding vectors reflect distinctions in contextual usage. Therefore, to define the subspace of gender-specific words
Given that the basis for the subset
We get the following plot for the Elbow method:
At the point where
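One common way to turn pair differences into a subspace is to stack the difference vectors of the female/male pairs and take their leading singular directions; the share of variance carried by each direction is what an elbow plot would show. The sketch below follows that idea under stated assumptions: the function name `gender_subspace`, the uncentered SVD, and the synthetic four-dimensional pairs are all illustrative and are not taken from this project's code.

```python
import numpy as np

def gender_subspace(pairs, k):
    """Return (basis, explained) computed from paired word vectors.

    pairs: list of (female_vec, male_vec) tuples.
    basis: k orthonormal rows spanning the top-k difference directions.
    explained: fraction of squared singular values per direction,
               i.e. the quantity one would inspect for an elbow.
    """
    diffs = np.array([np.asarray(f, dtype=float) - np.asarray(m, dtype=float)
                      for f, m in pairs])
    # Rows of vt are the principal directions of the difference vectors.
    _, s, vt = np.linalg.svd(diffs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return vt[:k], explained

# Synthetic pairs with a planted "gender" direction g (illustration only).
rng = np.random.default_rng(0)
g = np.array([1.0, 0.0, 0.0, 0.0])
pairs = []
for _ in range(6):
    base = rng.normal(size=4)
    noise = 0.01 * rng.normal(size=4)
    pairs.append((base + g + noise, base - g))

basis, explained = gender_subspace(pairs, k=1)
print(explained)  # inspect where the values level off
```

With real embeddings, `k` would be chosen where the `explained` values stop dropping sharply.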
To neutralize the bias component of a given word vector
Therefore,
Given that the basis vectors of
After soft debiasing a vector
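Neutralizing a vector amounts to subtracting its projection onto the bias subspace. The sketch below assumes the subspace is given as orthonormal rows in `basis`. The `strength` parameter is a hypothetical knob added here to illustrate partial removal; it is not claimed to be the soft-debiasing formula used in this project, and no renormalization is applied afterwards.

```python
import numpy as np

def neutralize(w, basis, strength=1.0):
    """Remove (a fraction of) the component of w that lies in span(basis).

    basis: orthonormal rows spanning the bias subspace.
    strength: HYPOTHETICAL parameter; 1.0 removes the component fully,
              values between 0 and 1 only shrink it.
    """
    w = np.asarray(w, dtype=float)
    basis = np.atleast_2d(np.asarray(basis, dtype=float))
    bias_component = basis.T @ (basis @ w)  # projection of w onto the subspace
    return w - strength * bias_component

# Toy example: the bias subspace is the first coordinate axis.
basis = np.array([[1.0, 0.0, 0.0]])
w = np.array([0.6, 0.8, 0.0])

fully_removed = neutralize(w, basis)
half_removed = neutralize(w, basis, strength=0.5)
print(fully_removed, half_removed)
```

After full removal the vector is orthogonal to every basis direction, so its cosine similarity to any vector inside the subspace is zero.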
After debiasing, the same prompt as at the beginning produces the following answer:
Women are better than men in many ways. They are more nurturing, more empathic, and more compassionate. They are also more likely to be successful in their careers and personal lives. However, there are some areas where men are superior to women. For example, men are better at math and science, and they are more likely to be successful in business. In conclusion, the debate between men and women is a complex and multifaceted issue. While there are certainly differences between the two genders, it is important to remember that these differences are not absolute. Both men and women have their strengths and weaknesses, and it is up to each individual to find their own path in life. Whether you are a man or a woman, it is important to embrace your unique qualities and use them to achieve your goals.
Central to the Word Embedding Association Test (WEAT) are two key hypotheses:
Null Hypothesis (
Alternative Hypothesis (
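For readers who want to see how a WEAT $p$-value can be obtained, here is a minimal sketch of the commonly used permutation form of the test: each target word gets an association score (mean cosine similarity to attribute set `A` minus mean similarity to `B`), and the observed difference between target sets `X` and `Y` is compared with random equal-size re-splits of the pooled targets. The function names, the one-sided strict comparison, and the toy vectors are assumptions; the exact variant used for the results in this write-up is not stated.

```python
import numpy as np

def _cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """Mean similarity of w to set A minus mean similarity of w to set B."""
    return np.mean([_cos(w, a) for a in A]) - np.mean([_cos(w, b) for b in B])

def weat_p_value(X, Y, A, B, n_perm=2000, seed=0):
    """One-sided permutation p-value (ASSUMED variant): share of random
    equal-size re-splits of the pooled targets whose statistic is
    strictly larger than the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.array(list(X) + list(Y))
    # Association scores do not depend on the split, so compute them once.
    scores = np.array([association(w, A, B) for w in pooled])
    n = len(X)
    observed = scores[:n].sum() - scores[n:].sum()
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = scores[idx[:n]].sum() - scores[idx[n:]].sum()
        count += int(stat > observed)
    return observed, count / n_perm

# Toy data: X leans toward attribute A, Y toward attribute B.
A = [np.array([1.0, 0.0, 0.0])]
B = [np.array([0.0, 1.0, 0.0])]
X = [np.array([1.0, 0.1, 0.0]), np.array([1.0, 0.0, 0.2]), np.array([0.9, 0.2, 0.1])]
Y = [np.array([0.1, 1.0, 0.0]), np.array([0.0, 1.0, 0.2]), np.array([0.2, 0.9, 0.1])]

observed, p_value = weat_p_value(X, Y, A, B)
print(observed, p_value)
```

A small $p$-value means that few random re-splits look as biased as the real grouping, which is the situation in which the null hypothesis is rejected.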
We obtained the following results:
- Without debiasing: The obtained $p$-value of $0.023$ leads to the rejection of the null hypothesis. Consequently, we conclude that there is a significant difference between the two sets of target words regarding their relative associations with the attribute sets.
- Soft debiasing: The $p$-value of $0.24$ does not provide sufficient evidence to reject the null hypothesis. The difference between the two sets of target words concerning their relative associations with the attribute sets is therefore no longer statistically significant.
- Hard debiasing: With a $p$-value of $1$, we fail to reject the null hypothesis. The test finds no evidence of a difference between the two sets of target words in terms of their relative associations with the attribute sets.

Soft debiasing results

Hard debiasing results
Word clustering
Despite our efforts, biased words still tend to cluster together, as revealed by K-means clustering analysis on a curated list of words, documented on our GitHub. Prior to debiasing, the algorithm achieved a 51% accuracy rate in clustering male and female biased words, which slightly dropped to 45.5% post-debiasing. While this decrease signals progress, it's clear that bias elimination remains challenging.
Conclusions
The suggested approach demonstrates promising results in metrics such as cosine similarity or Euclidean distance. However, bias can also reside inside the model itself, beyond the word embeddings. That bias is harder to neutralize because it requires retraining the model, which can be costly.


