As a graduate student at MIT working on a class project, Joy Buolamwini, SM ’17, PhD ’22, encountered a problem: Facial analysis software did not detect her face, though it detected the faces of people with lighter skin without a problem. To finish her project, she had to wear a white mask to be detected by the computer.
Buolamwini, a computer scientist, self-styled “poet of code,” and founder of the Algorithmic Justice League, has long researched the social implications of artificial intelligence and bias in facial analysis algorithms. In her new book, “Unmasking AI: My Mission to Protect What Is Human in a World of Machines,” Buolamwini looks at how algorithmic bias harms people and recounts her quest to draw attention to those harms.
In this excerpt, Buolamwini discusses how datasets used to train facial recognition systems can lead to bias and how even datasets considered benchmarks, including one created by a government agency that set out to collect a diverse dataset, can underrepresent women and people of color.
+++
When machine learning is used to diagnose medical conditions, to inform hiring decisions, or even to detect hate speech, we must keep in mind that the past dwells in our data. In the case of hiring, Amazon learned this lesson when it created a model to screen résumés. The model was trained on data from prior successful employees, who had been selected by humans, so the choices of those human decision-makers became the basis on which the system learned.
Internal tests revealed that the model was screening out résumés that contained the word “women” or the names of women-associated colleges. The system had learned that the prior candidates deemed successful were predominantly male. Past hiring practices and centuries of denying women the right to education, coupled with the challenges they faced once they entered higher education, had made it especially difficult for women to penetrate male-dominated fields. Faithful to the data it was trained on, the model filtered out résumés indicating a candidate was a woman. This was the by-product of prior human decisions that favored men. At Amazon, the initial system was not adopted after the engineers were unable to remove the gender bias. The choice to stop is a viable and necessary option.
The face datasets I examined revealed data that was not representative of society. The example of the Amazon hiring model illustrates what happens when data does indeed reflect the assumptions of society. Amazon’s model reflected power shadows. Power shadows are cast when the biases or systemic exclusions of a society are reflected in the data.
Seeing the major skews toward lighter-skinned individuals and men in the face datasets motivated me to understand why these biases happened. How were these datasets collected in the first place? When I looked at the government benchmark as a starting point, answers started to emerge. In an attempt to overcome privacy issues, the researchers chose to focus on public figures, who, by the nature of their jobs in society, often as public servants, had a level of visibility that made their demographic details public knowledge. While using public figures could potentially overcome some privacy concerns, the choice itself came embedded with power shadows. Who holds political office? It is no surprise that around the world men have historically held political power, and to this day we see the patriarchy at play when it comes to leadership and decision-making. At the time I conducted my research, UN Women released a chart showing the gender gap in women’s representation in parliaments. This analysis revealed that, on average, men made up 76.7% of parliament members. So when creating a dataset based on parliament members, the shadow of the patriarchy already lingers.
While that offers a plausible partial explanation for the male skew, I also wanted to gain more insight into the disproportionate representation of lighter-skinned individuals. The work of Nina Jablonski on the distribution of skin color around the world shows that the majority of the world’s population has skin that would be classified on the darker end of most skin classification scales. Returning to the government IJB-A dataset, which was created to have the widest geographic diversity of any face dataset, how was it that the dataset was still more than 80% lighter-skinned individuals?
When we look at who holds power around the world, we see the impact of colonialism and of the colorism that derives from the power shadow of white supremacy. When formerly colonized nations became independent, they still inherited the power structure of colonialism. White settlers and their offspring were often lighter-skinned than the indigenous people of a land or the darker-skinned enslaved Africans brought into colonized countries. When I started looking at the composition of parliaments around the world, I saw this impact. In South Africa, despite the population being classified as 80.8% Black, 8.7% Coloured, and 2.6% Asian, around 20% of the parliamentarians would be classified as white.
Stepping beyond a colonial past does not decolonize the mind. White supremacy as a cultural instrument, like the white gaze, defines who is worthy of attention and what is considered beautiful or desirable. Colorism is a stepchild of white supremacy that is seldom discussed. Colorism operates by assigning higher social value and economic status based literally on the color of someone’s skin, so that even if two people are grouped in the same race, the person with lighter skin is treated more favorably. We can see this in Hollywood and Bollywood. India, with its vast diversity of skin tones, has an entertainment and beauty industry that elevates light-skinned actors and actresses. Women are judged on their beauty, and the standard of beauty is predicated on proximity to fair skin. Beyond beauty, lighter skin is also associated with having more intelligence in societies touched by white supremacy. Hollywood has long favored white actors, and when it began to open up slightly, lead roles for diverse cast members also skewed to the lighter hue. This is not to say that at the time I was doing this research there were no dark-skinned individuals who had gained fame or were positioned as intelligent. But the fact that they were the exception and not the norm is the point.
Going back to face datasets, we also need to keep in mind how the images are collected. When a group like elected politicians is chosen as a target dataset, the images that are collected are based on videos and photographs taken of the individuals. Here again we can see how the shadow of white supremacy grows. Which representatives are more likely to have images and videos available online? If you make a requirement that to be included in the dataset you need at least 10 images or video clips, the representatives who receive more media attention are going to have an advantage. Even if you do not filter using automated methods like face detection, which has been shown to fail more often on darker-skinned faces, the availability of images based on media attention will still favor lighter-skinned individuals. Despite the intention to create a more diverse dataset by including representatives from all around the world, the government dataset was heavily male and heavily pale, inheriting the power shadows of patriarchy and white supremacy.
These are not the only kinds of power shadows to contend with. For example, ableism, which privileges able-bodied individuals, is another kind of power shadow often lurking in datasets, particularly those used for computer vision. Among pedestrian tracking datasets, few specifically include individuals who use assistive devices. Just as the past dwells in our data, so too do power shadows that reflect existing social hierarchies on the basis of race, gender, ability, and more. Convenient data collection methods, which gather what is most popular and most readily available, will reproduce existing power structures.
Diving into my study of facial recognition technologies, I could now understand how, despite all the technical progress brought on by the success of deep learning, I found myself coding in whiteface at MIT. The existing gold standards did not represent the full sepia spectrum of humanity. Skewed gold standard benchmark datasets led to a false sense of universal progress based on assessing the performance of facial recognition technologies on only a small segment of humanity. Unaltered data collection methods that rely on public figures inherited power shadows that led to overrepresentation of men and lighter-skinned individuals. To overcome power shadows, we must be aware of them. We must also be intentional in our approach to developing technology that relies on data. The status quo fell far too short. I would need to show new ways of constructing benchmark datasets and more in-depth approaches to analyzing the performance of facial recognition technologies. By showing these limitations, could I push for a new normal?
Excerpted from the book “Unmasking AI: My Mission to Protect What Is Human in a World of Machines,” by Joy Buolamwini. © 2023. Published by Random House, an imprint and division of Penguin Random House LLC. All rights reserved.