The AI Edge: How to Tackle AI Bias
A new advice column from our AI expert to help you deploy AI. Questions? Email [email protected]
The worldwide AI market is expanding at an accelerating pace, and recent innovations like ChatGPT have reaffirmed its potential. As a result, businesses are embracing artificial intelligence more than ever to build efficient and innovative solutions for their customers.
However, it is important to be cautious about relying on AI, as there is evidence that it can sometimes be biased in its decisions, with negative consequences for businesses. This bias can occur when the data fed into the AI system does not encompass all perspectives, or when the rules the AI system follows are skewed.
Bias in AI-based systems has been pointed out by many firms and organizations in the past few years. For instance, in 2021, an OpenAI audit found gender and age bias in its CLIP model. Similarly, Obermeyer et al. (2019) reported evidence of racial bias in U.S. health care algorithms. Furthermore, the New York State Department of Financial Services acknowledged that there are risks in algorithmic lending, including “inaccuracy in assessing creditworthiness, discriminatory outcomes, and limited transparency.”
Such AI bias can result in discriminatory and unfair outcomes that affect customers, employees, and stakeholders. Biased outcomes can also degrade end users' experiences or create an unequal playing field, leading to dissatisfaction, potential legal issues, and lasting damage to an organization's reputation.
AI governance policies
Organizations urgently need AI governance and compliance policies that include checking AI outcomes for potential bias.
These policies should establish clear ethical guidelines for developing and deploying AI systems. That includes verifying that the data used to train AI systems represents different demographic and socioeconomic groups, so the resulting systems are fair and reliable. On the compliance side, policies should include a risk assessment for AI outcomes and technical safeguards to mitigate those risks.
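As a lightweight illustration of such a data check, the sketch below tallies group representation in a training set with pandas. The file name and column names are hypothetical placeholders, not part of any specific toolkit.

```python
# A minimal sketch of a training-data diversity audit with pandas.
# "training_data.csv", "gender", and "income_band" are hypothetical.
import pandas as pd

df = pd.read_csv("training_data.csv")

# Share of each demographic and socioeconomic group in the training data;
# a group far below its real-world share signals representation bias
# before any model is trained.
for column in ["gender", "income_band"]:
    print(df[column].value_counts(normalize=True).round(3))
```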
Engineering groups should also be required to contribute to this goal by following best practices for developing unbiased and reliable AI, using tools such as the following:
IBM Watson OpenScale: An enterprise tool to identify and mitigate bias and drift in AI
AI Fairness 360 (AIF360): An open-source toolkit to examine, report, and mitigate discrimination and bias in machine learning models
Microsoft’s Fairlearn: An open-source toolkit for assessing and improving fairness in AI
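To show what such an assessment looks like in practice, here is a minimal sketch using Fairlearn's MetricFrame. The labels, predictions, and group values are hypothetical stand-ins for a real model's outputs.

```python
# A minimal sketch of a per-group fairness check with Fairlearn.
import pandas as pd
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

# Hypothetical ground truth, model predictions, and sensitive group labels
y_true = pd.Series([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = pd.Series([1, 0, 1, 0, 0, 1, 1, 0])
sensitive = pd.Series(["A", "A", "A", "B", "B", "B", "B", "A"])

# Accuracy broken down per group reveals disparities a global score hides
mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                 sensitive_features=sensitive)
print(mf.by_group)      # accuracy for each group
print(mf.difference())  # largest accuracy gap between groups

# Demographic parity difference: gap in positive-prediction rates
print(demographic_parity_difference(y_true, y_pred,
                                    sensitive_features=sensitive))
```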
AI systems with potential social impact should have both a bias index and a fairness index that meet established guidelines. For example, a fairness score between 0.9 and 1.0 indicates a fair AI system.
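For concreteness, here is a minimal sketch of one such fairness score, the disparate impact ratio, which also appears in step 8 of the walkthrough below. The function and sample data are hypothetical.

```python
# A minimal sketch of the disparate impact ratio, assuming binary
# predictions and a binary protected attribute (1 = privileged group).
import numpy as np

def disparate_impact(y_pred, protected):
    """Ratio of favorable-outcome rates: unprivileged over privileged."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    rate_unprivileged = y_pred[protected == 0].mean()  # P(y_hat=1 | unprivileged)
    rate_privileged = y_pred[protected == 1].mean()    # P(y_hat=1 | privileged)
    return rate_unprivileged / rate_privileged

# 1.0 means both groups receive favorable outcomes at the same rate;
# here the unprivileged group is favored half as often, giving 0.5.
print(disparate_impact([1, 0, 1, 1], [0, 0, 1, 1]))
```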
Here is a step-by-step process to ensure the fairness and accuracy of a model. I am using AIF360, but a similar process can be applied with Fairlearn; a code sketch implementing these steps follows the list.
1. Create a diverse data sample that includes various demographic and socioeconomic groups based on the protected attribute.
2. Split this dataset into training and testing sets for model development.
3. Calculate the accuracy and mean statistical parity difference on the training dataset for all groups using your current AI system.
4. Apply the reweighing algorithm to the training dataset. Reweighing is a pre-processing method that weights the instances in each (group, label) pair differently to ensure fairness before classification.
5. Train a new ML model using the transformed training dataset.
6. Test the new model on the testing dataset and store the predictions.
7. Compute the accuracy and the fairness metric (mean statistical parity difference) on the testing dataset. Note: the right metric may depend on the use case and class imbalance, so evaluate whether other scores fit the data better.
8. Ensure the disparate impact (fairness score) remains close to 1.0, indicating an unbiased AI system.
9. Evaluate and compare the fairness metrics for the original and new models, confirming that the ML model balances accuracy and fairness.
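The sketch below strings these steps together with AIF360 and scikit-learn, roughly mirroring AIF360's own tutorials. It assumes the UCI Adult census dataset has been downloaded into AIF360's data directory and uses "sex" as the protected attribute; both are illustrative choices, so substitute your own dataset, groups, and model.

```python
# A minimal end-to-end sketch of the steps above using AIF360 and
# scikit-learn. Assumes the UCI Adult data files are installed per
# AIF360's instructions; "sex" as protected attribute is illustrative.
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

privileged = [{'sex': 1}]
unprivileged = [{'sex': 0}]

# Steps 1-2: load a dataset covering the protected attribute and split it
dataset = AdultDataset(protected_attribute_names=['sex'],
                       privileged_classes=[['Male']])
train, test = dataset.split([0.7], shuffle=True)

# Step 3: baseline fairness metric on the raw training data
metric_train = BinaryLabelDatasetMetric(
    train, unprivileged_groups=unprivileged, privileged_groups=privileged)
print("Mean statistical parity difference (before):",
      metric_train.mean_difference())

# Step 4: reweighing assigns instance weights per (group, label) pair
rw = Reweighing(unprivileged_groups=unprivileged,
                privileged_groups=privileged)
train_transf = rw.fit_transform(train)

# Step 5: train a model on the reweighted training data
scaler = StandardScaler()
X_train = scaler.fit_transform(train_transf.features)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_transf.labels.ravel(),
        sample_weight=train_transf.instance_weights)

# Step 6: predict on the held-out test set and store the predictions
test_pred = test.copy(deepcopy=True)
test_pred.labels = clf.predict(scaler.transform(test.features)).reshape(-1, 1)

# Steps 7-9: accuracy, mean difference, and disparate impact on test data
metric_test = ClassificationMetric(
    test, test_pred,
    unprivileged_groups=unprivileged, privileged_groups=privileged)
print("Accuracy:", metric_test.accuracy())
print("Mean statistical parity difference (after):",
      metric_test.statistical_parity_difference())
print("Disparate impact (fairness score):", metric_test.disparate_impact())
```

One appeal of reweighing as the mitigation step is that it changes only instance weights, not features or labels, so any classifier that accepts sample weights can consume the transformed data unchanged.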
By following this high-level approach, you can evaluate the fairness of a machine learning model using AIF360 and compare the performance of the original and transformed models based on various fairness metrics.
Helpful papers
A Comparative Study of Fairness-enhancing Interventions in Machine Learning (Friedler et al., 2019)