
Photo via Sam Edwards/GettyImages

Research shows gender bias in Google’s voice recognition

Google can recognize men's voices better.


Selena Larson


Posted on Jul 15, 2016   Updated on May 26, 2021, 10:57 am CDT

Voice recognition technology promises to make our lives easier, letting us control everything from our phones to cars to home appliances. Just talk to our tech, and it works.

As the tech becomes more advanced, there’s another issue that’s not as obvious as a failure to process simple requests: Voice recognition technology doesn’t recognize women’s voices as well as men’s. 

According to research from Rachael Tatman, a linguistics researcher and National Science Foundation Graduate Research Fellow at the University of Washington, Google’s speech recognition software shows gender bias.

She found significant differences in the way Google’s speech recognition software auto-captions video on YouTube. It was much more consistent on male voices than on female ones. She said the results were “deeply disturbing.”

Tatman said she hand-checked more than 1,500 words from annotations across 50 different videos and discovered a glaring bias. 

“It’s not that there’s a consistent but small effect size, either, 13% is a pretty big effect. The Cohen’s d was 0.7 which means, in non-math-speak, that if you pick a random man and random woman from my sample, there’s an almost 70% chance the transcriptions will be more accurate for the man. That’s pretty striking.”
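The step from a Cohen’s d of 0.7 to “almost 70%” can be checked with a standard conversion. The sketch below is illustrative only: it assumes accuracy scores in both groups are roughly normal with equal variance, which the source does not state, and the function name `probability_of_superiority` is made up for this example rather than taken from Tatman’s analysis.

```python
import math


def probability_of_superiority(d: float) -> float:
    """Chance that a random member of the higher-scoring group outscores a
    random member of the other group, given Cohen's d.

    Assumes both groups are normally distributed with equal variance
    (an assumption of this sketch, not something the article reports).
    """
    # The probability is Phi(d / sqrt(2)), where Phi is the standard normal CDF.
    # Since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), this reduces to erf(d / 2).
    return 0.5 * (1.0 + math.erf(d / 2.0))


if __name__ == "__main__":
    # Cohen's d reported in the quote above.
    print(f"{probability_of_superiority(0.7):.2f}")
```

Under those assumptions the result comes out at roughly 0.69, in line with the figure in the quote. A d of 0 gives exactly 0.5, a coin flip.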

“Language varies in systematic ways depending on how you’re talking,” Tatman said in an interview. Differences could be based on gender, dialect, and other geographic and physical attributes that factor into how our voices sound. 

To train speech recognition software, developers use large datasets, either recorded on their own, or provided by other linguistic researchers. And sometimes, these datasets don’t include diverse speakers.

“Generally, the people who are doing the training aren’t the people whose voices are in the dataset,” Tatman said. “You’ll take a dataset that’s out there that has a lot of different people’s voices, and it will work well for a large variety of people. I think the people who don’t have socio-linguistic knowledge haven’t thought that the demographic of people speaking would have an effect. I don’t think it’s maliciousness, I just think it wasn’t considered.”

A representative for Google pointed us to a paper published last year that describes how Google built a speech recognizer specifically for children. In the paper, the company notes that speech recognition performed better for females. Additionally, Google said it trains voice recognition across genders, ages, and accents.

“It’s hard to directly address the research in the article you shared, since the results reflect a relatively small sample size, without information on how the data was sampled or the number of speakers represented. That said, we recognize that it’s extremely important to have a diverse training set; our speech technology is trained on speakers across genders, ages, and accents. In a research paper we published last year (Liao et al. 2015), we found that our speech recognizer performed better for females, with 10% lower Word Error Rate. (In that paper we actually measured 20% higher Word Error Rate for kids, which we’ve been working hard to improve since then.)”

Speech recognition has struggled to recognize female voices for years. Tatman cites a number of studies in her post, including findings that voice tech works better for men in the medical field and that it even performs better for young boys than for girls.

When the auto industry began implementing voice commands in more vehicles, women drivers struggled to tell their cars what to do, while their male counterparts had fewer problems getting vehicles to operate properly. 

It’s a similar problem to biases in algorithms that control what we see on the web—women get served ads for lower-paying jobs, criminal records turn up higher in searches for names commonly associated with someone of African-American descent, and the software some law enforcement agencies use is biased against people of color. And this isn’t the first time Google’s automated systems have failed; last year, the company came under fire when its image recognition technology labeled an African-American woman a “gorilla.” 

Tatman said the best first step to address bias in voice tech would be to build training sets that are stratified. Equal numbers of speakers across genders, races, socioeconomic statuses, and dialects should be included, she said.
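A stratified training set of the kind Tatman describes could be assembled along these lines. This is a rough sketch, not Google’s or Tatman’s method: the `stratified_sample` helper, the clip records, and the `gender` and `dialect` fields are all hypothetical and exist only to show the idea of drawing the same number of recordings from every speaker group.

```python
import random
from collections import defaultdict


def stratified_sample(clips, key, per_group, seed=0):
    """Draw the same number of clips from every group defined by `key`.

    `clips` is any iterable of records; `key` maps a record to its group
    label (for example, a (gender, dialect) tuple).
    """
    # Bucket the clips by group label.
    groups = defaultdict(list)
    for clip in clips:
        groups[key(clip)].append(clip)

    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = []
    for label in sorted(groups):
        members = groups[label]
        if len(members) < per_group:
            # An underrepresented group cannot be balanced by sampling alone;
            # more recordings would have to be collected.
            raise ValueError(f"group {label!r} has only {len(members)} clips")
        sample.extend(rng.sample(members, per_group))
    return sample


if __name__ == "__main__":
    # Hypothetical, deliberately lopsided pool: 3 female clips, 9 male clips.
    pool = [{"id": i, "gender": "female", "dialect": "A"} for i in range(3)]
    pool += [{"id": i, "gender": "male", "dialect": "A"} for i in range(3, 12)]

    balanced = stratified_sample(
        pool, key=lambda c: (c["gender"], c["dialect"]), per_group=3
    )
    print(len(balanced))
```

The error raised for a too-small group is a deliberate choice in this sketch: it surfaces exactly the gap in the data that the article describes, rather than quietly returning an unbalanced set.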

Automated technology is developed by humans, so our human biases can seep into the software and tools we are creating, supposedly to make lives easier. But when systems fail to account for human bias, the results can be unfair and potentially harmful to groups underrepresented in the field in which these systems are built.

*First Published: Jul 15, 2016, 3:17 pm CDT