by Jeff Foley
The specter of bias in Artificial Intelligence is appearing more and more often. Whether they’re automatically completing emails, recognizing faces, or assessing criminal risk factors, AI systems can make unexpected recommendations by incorrectly favoring certain results. The debate is on as to whether AI systems should be more transparent in how they formulate recommendations. But transparency in an AI system only tackles the problem of identifying biased results after they appear. How can AI systems mitigate bias in the first place?
Why AI is susceptible to bias
With so many commercially available AI technologies on the market, it’s easy to get lost in a sea of discussions about neural nets, machine learning, personalization, recommendation engines, natural language, deep learning, virtual agents, and intelligent automation. Regardless of their specific tasks, however, most AI technologies rely on one of two traditional approaches to learning about the world around them:
- Trust the Data. The motto of these technologies is “data will solve everything.” By providing examples -- often hundreds of millions to billions of data points -- secret sauce machine learning algorithms will automagically figure out how to predict future outcomes. Optimistic accuracy curves demonstrate how providing more training data incrementally improves results.
- Trust the Humans. In contrast, these models rely on a “brute force” approach. Experts manually write rules and build up ontologies that will understand how to interpret incoming data. They then look at outputs from test data, and further hand-tweak the rules to get more accurate results.
An AI system trained in this way inherits information, biased or otherwise, from either its trusted data or its trusted humans. It’s more insidious than Garbage In, Garbage Out. It’s the less detectable Bias In, Bias Out. Worse, machine learning doesn’t just preserve those biases; sometimes it can even amplify them. Because machine learning leverages subtle correlations, a system trained on slightly skewed data can produce greatly skewed predictions.
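To make the amplification effect concrete, here is a minimal sketch in Python. It assumes a deliberately naive model that always predicts the most common pronoun it saw for an occupation; the data, the occupation, and the function names are all hypothetical and chosen only for illustration. Real systems are more sophisticated, but the same mechanism can show up whenever a model optimizes for the single most likely answer.

```python
from collections import Counter, defaultdict


def train_majority_model(examples):
    """Learn, for each occupation, the pronoun seen most often in training."""
    counts = defaultdict(Counter)
    for occupation, pronoun in examples:
        counts[occupation][pronoun] += 1
    return {occ: c.most_common(1)[0][0] for occ, c in counts.items()}


def share(pronouns, target):
    """Fraction of items in `pronouns` equal to `target`."""
    return sum(1 for p in pronouns if p == target) / len(pronouns)


# Hypothetical, slightly skewed training data: a 60/40 split.
training = [("engineer", "he")] * 60 + [("engineer", "she")] * 40
model = train_majority_model(training)

# The model answers the same way for every new "engineer" example.
predictions = [model["engineer"] for _ in range(100)]

train_share = share([p for _, p in training], "he")
pred_share = share(predictions, "he")
print(f"'he' in training data: {train_share:.0%}")
print(f"'he' in predictions:   {pred_share:.0%}")
```

In this toy setup, a 60/40 skew in the training data becomes a 100/0 skew in the predictions, because the model never has a reason to pick the less frequent answer.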
“These AI systems inherently limit themselves by relying exclusively on their training data and their human handlers,” says Dr. Catherine Havasi, co-founder of the Open Mind Common Sense project, and a visiting professor at the MIT Media Lab. “By restricting their knowledge of the world entirely to a domain-specific scenario, they’re missing the general domain ‘common sense’ which we as humans use to contextualize the kinds of problems we solve.”
Ways to combat AI bias
How can humans remove bias when we have biases ourselves? AI practitioners, especially those using AI for natural language processing, have taken notice of biases such as gender stereotypes, and started offering solutions to promote AI fairness. For instance, one natural language training approach actively prevents learning correlations with gendered words, in order to avoid inheriting gender stereotypes. Another approach uses adversarial learning to teach AI systems to get better at predicting one variable, such as income bracket, without getting better at predicting a protected variable like gender or location. The industry will continue to pursue bias mitigation in training data.
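Neither of the approaches above is reproduced here. As a simpler, related illustration, the sketch below shows a post-processing technique sometimes applied to word vectors: estimate a "gender direction" from a pair of gendered words and subtract each other word's component along it. The three-dimensional vectors are made up for the example, and the helper names are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def remove_direction(v, direction):
    """Subtract the component of v that lies along the unit vector `direction`."""
    scale = dot(v, direction)
    return [a - scale * d for a, d in zip(v, direction)]


# Made-up toy vectors; real word embeddings have hundreds of dimensions.
vectors = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "nurse": [-0.6, 0.7, 0.3],
    "engineer": [0.5, 0.8, 0.2],
}

# Estimate the gender direction from the difference between "he" and "she".
gender_direction = normalize(
    [a - b for a, b in zip(vectors["he"], vectors["she"])]
)

debiased = {
    word: remove_direction(vectors[word], gender_direction)
    for word in ("nurse", "engineer")
}

for word, vec in debiased.items():
    before = dot(vectors[word], gender_direction)
    after = dot(vec, gender_direction)
    print(f"{word}: gender component before={before:+.2f}, after={after:+.2f}")
```

After the projection, the occupation words keep their other components but no longer lean toward either gendered word along the estimated direction. A single direction is a crude proxy for a stereotype, so this should be read as an illustration rather than a complete fix.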
But another powerful way to mitigate bias is to not rely exclusively on trusted training data or trusted human supervisors, by introducing a background knowledge base. Then, teams can bootstrap new AI systems with previously created, general-domain data points. Starting off deep learning models with millions of “common sense” facts, instead of starting from nothing, can offset the bias otherwise introduced by a domain-specific training corpus.
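The idea of starting from background knowledge rather than from nothing can be sketched with a simple weighted average. This is a generic illustration, not a description of how any particular product or knowledge base works: the function name, the `prior_strength` parameter, and the scores are all assumptions made for the example. Each score stands for how strongly a word is associated with some attribute, on a scale from -1 to 1.

```python
def blend_with_background(background, domain_sum, domain_count,
                          prior_strength=10.0):
    """Shrink a domain-specific average toward a background value.

    `prior_strength` (a hypothetical tuning knob) acts like a number of
    pseudo-observations that all agree with the background value.
    """
    total = prior_strength * background + domain_sum
    return total / (prior_strength + domain_count)


# Hypothetical general-domain value: the word is neutral for this attribute.
background_score = 0.0

# A small domain corpus with 5 observations, all scored -1 (heavily skewed).
domain_only = -5.0 / 5
few = blend_with_background(background_score, domain_sum=-5.0, domain_count=5)

# A large domain corpus with 500 observations, all scored -1.
many = blend_with_background(background_score, domain_sum=-500.0,
                             domain_count=500)

print(f"domain data alone (5 examples): {domain_only:+.2f}")
print(f"blended, 5 examples:            {few:+.2f}")
print(f"blended, 500 examples:          {many:+.2f}")
```

With only a handful of skewed examples, the blended score stays much closer to the neutral background value than the domain data alone would suggest. When the domain corpus is large and consistent, its signal still comes through.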
Robyn Speer, the Chief Science Officer at natural language startup Luminoso, is very familiar with this approach. She maintains ConceptNet, a knowledge base of word meanings. Using ConceptNet, Luminoso can produce domain-specific semantic models from smaller sets of “trusted data” than deep learning normally requires – and do it a lot faster than “trusted humans” writing rules to maintain the system.
Using background knowledge like ConceptNet has another advantage: it mitigates bias. “We’ve spent a lot of concerted effort on de-biasing our base models,” Speer says. “When our models learn from our general-domain data in conjunction with domain-specific data, they’re more likely to pick up less bias than if they learned from the training corpus alone. And, it means that every project using our base model will benefit from our work.”
The value of reducing bias in AI
Is it worth the effort to remove bias from AI? Some companies have already formed groups to monitor ethical issues from decisions made by their AI systems. “Deploying any AI system requires moral choices,” Speer points out. “Choosing to ignore those choices is still making a choice -- and not necessarily a good one.” After all, since training data inherently represents the past, more data will never overcome its bias against changes in the future.
One concern about de-biasing is that it alters data to say something different from what it originally says. What if the data being studied reveals racism or misogyny? It’s important not to algorithmically ignore that. This is another case where using background knowledge can have an advantage over traditional training methods. “Distinguishing between domain-general and domain-specific knowledge gives us a clear separation into the part we should de-bias and the part we shouldn't,” notes Speer. “If there’s a bias in the domain-specific data, it will stand out in the results and be more visible to analysts.”
Ethical implications aside, addressing bias in AI can make the system more robust. A system that relies on stereotypes as a crutch for reasoning isn’t just perpetuating harm; it’s also overfitting to the training data and not making the best predictions it can. Removing inappropriate biases gives the AI room to learn about more relevant features of the data. In other words, teaching AI systems a more idealized version of the world is not just the moral thing to do; it also leads to better predictions.
Jeff Foley, the head of marketing at Luminoso, has been working with CX, CRM, and natural language technologies since 1996.