Ethics and Bias in Machine Learning

Propheto
Mar 1, 2021 · 4 min read

Thanks to machine learning and AI, there have been major advancements in areas like search engine results, product recommendations, and autonomous vehicles. There is no denying that AI and machine learning have already transformed our lives and will continue to shape the future in a major way.

Although these algorithms are often treated as black boxes, they are increasingly relied upon for highly sensitive and critical tasks. That combination of limited understanding and the autonomy given to AI systems can lead to disastrous results.

Machine learning algorithms have the potential to perpetuate and amplify biases that are already present in our society. Amazon developed an HR recruiting AI algorithm that inadvertently filtered out female applicants. Microsoft famously created an AI chatbot, Tay, that Twitter trolls quickly trained to post racist content. Detroit police wrongfully arrested an innocent man because of a flawed facial recognition match. Does this mean that all AI is doomed to be racist and biased?

We can only use machine learning and AI responsibly and ethically if we understand how they are vulnerable to bias and the damaging consequences that bias can have.

In honor of Black History Month, we wanted to take a closer look at bias in algorithms. In this article, we walk through a hypothetical example to show you how biases can creep into machine learning algorithms unexpectedly and what you can do to correct these problems.

Expanding a regional bank

Imagine you are an executive at a small regional bank in the Pacific Northwest. Currently, your bank has about 50 locations throughout Washington, Oregon, and parts of Idaho. You’ve decided that it is time to expand to the national level, and your team is planning the opening of your first location in the Southeast, specifically the greater Atlanta metro area.

After months of work setting up the location and hiring employees, you are ready to launch. One of the key products your bank offers is mortgage loans, and you are excited to offer them to customers in the Southeast. Your team has spent years collecting customer data to develop a machine learning model that scores the riskiness of potential loan applicants, and you plan to use that model to evaluate applications in the new market.

On the surface, this seems like it could lead to the best results: your machine learning credit score is data-driven and cost-effective. On closer inspection, however, you will quickly find how flawed this approach is.

Remember, machine learning at its core is about finding patterns in data. This is true for something as simple as linear regression and for the most advanced AI systems available today. When a model encounters inputs it has not seen before, or has seen only rarely, it tends to behave in unexpected ways.

Our current credit risk model was built on data from the Pacific Northwest market, which is inherently different from the new Southeast market. Specifically, while the two regions may look somewhat similar from a demographic standpoint (they have roughly the same populations and age distributions), they differ dramatically in racial composition. Washington state has a population that is approximately 74% white and 9% Black. Georgia, on the other hand, has a population that is 58% white and 32% Black.

Therefore, if our risk model was trained on data from Washington state, it is possible that in the Southeast market it would favor the white population and offer them lower mortgage rates than non-white applicants, simply because the training data was heavily skewed toward the white population and the model has seen far fewer examples of everyone else.
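To make this concrete, here is a minimal sketch of the kind of check that would surface the problem before the model goes live: score applicants from the new market and compare predictions across demographic groups. It assumes a scikit-learn-style classifier with predict_proba; the column names and variables in the usage comment are hypothetical.

```python
import pandas as pd


def group_risk_summary(model, X: pd.DataFrame, groups: pd.Series) -> pd.DataFrame:
    """Summarize a risk model's predictions by demographic group.

    Large gaps in mean predicted risk between groups that are not explained
    by legitimate credit factors are a warning sign that the model is
    extrapolating from an unrepresentative training population.
    """
    scores = model.predict_proba(X)[:, 1]  # estimated probability of default
    return (
        pd.DataFrame({"group": groups.values, "predicted_risk": scores})
        .groupby("group")["predicted_risk"]
        .agg(["mean", "count"])
    )


# Hypothetical usage: score applicants from the new Atlanta market with the
# existing Pacific Northwest model and inspect the gaps before acting on it.
# summary = group_risk_summary(risk_model, atlanta_features, atlanta["race"])
# print(summary)
```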

Lending based on race is, of course, highly problematic. So does all this mean we have to revert to manually assessing credit risk? Should we just use one rate across the entire population? Absolutely not. We just need to be smart about how we correct our model to handle these differences in the population (and other differences we may not be capturing).

Here are several options we can use to correct our model; a short code sketch after the list illustrates the first three:

  • Apply a penalty — Penalized (regularized) models, or sample weights that up-weight under-represented groups, keep the fit from being dominated by the majority of the training data and help the model perform well on new/unseen data.
  • Resampling — Drawing additional samples from the under-represented population (for example, bootstrapping it up to the size of the majority group) rebalances the training data so the model handles that population better.
  • Look “under the hood” — Feature importance techniques reveal what is truly driving predictions and make it much easier to spot variables that should not be carrying weight in a machine learning model.
  • Buy new data — Several services can help augment your data. You might even be able to purchase data from competitors already operating in the region.

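The following is a minimal, self-contained sketch of the first three options using scikit-learn on synthetic data. Every column name, group label, and number is made up for illustration; a real fairness fix would involve domain and compliance review, not just this code.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic training data: many "region A" applicants, few from "region B",
# mirroring a model trained in one market and deployed in another.
n_a, n_b = 900, 100
df = pd.DataFrame({
    "income":         np.concatenate([rng.normal(70, 15, n_a), rng.normal(55, 15, n_b)]),
    "debt_to_income": np.concatenate([rng.normal(0.30, 0.1, n_a), rng.normal(0.35, 0.1, n_b)]),
    "group":          ["A"] * n_a + ["B"] * n_b,
})
df["defaulted"] = (rng.random(len(df)) < 0.1 + 0.5 * df["debt_to_income"].clip(0, 1)).astype(int)
feature_cols = ["income", "debt_to_income"]

# 1) Apply a penalty / weighting: up-weight the under-represented group so the
#    fit is not dominated by the majority, and regularize (smaller C = stronger L2 penalty).
weights = df["group"].map({"A": 1.0, "B": n_a / n_b})
model = LogisticRegression(C=0.5, max_iter=1000)
model.fit(df[feature_cols], df["defaulted"], sample_weight=weights)

# 2) Resampling: bootstrap the minority group up to the majority's size.
minority = df[df["group"] == "B"]
upsampled = resample(minority, replace=True, n_samples=n_a, random_state=0)
balanced = pd.concat([df[df["group"] == "A"], upsampled])
model.fit(balanced[feature_cols], balanced["defaulted"])

# 3) Look "under the hood": which features actually drive the predictions?
result = permutation_importance(model, df[feature_cols], df["defaulted"],
                                n_repeats=10, random_state=0)
for name, score in zip(feature_cols, result.importances_mean):
    print(f"{name}: {score:.3f}")
```

Which option (or combination) is right depends on how much data you have from the new market and how the model is used; weighting and resampling change what the model learns, while feature importance only tells you what it has already learned.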
All of these methods can correct for the potential bias in these models, allowing your approach to lending to stay data-driven.

Summary

While this type of example is fairly well documented, it is important to emphasize that there are fixes. Businesses need to take responsibility for their use of machine learning if they want to contribute to an equitable society. Before any algorithm can be fixed, though, you have to really understand the data and ensure that the models you create have the right monitoring and alerts in place to uncover these problems quickly and take corrective action.
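Monitoring can be as simple as tracking one disparity metric per batch of decisions. The sketch below computes an approval-rate gap across groups; the threshold, column names, and alert function are hypothetical and would need to be set with your compliance team.

```python
import pandas as pd


def approval_rate_gap(decisions: pd.DataFrame, group_col: str, approved_col: str) -> float:
    """Difference between the highest and lowest approval rates across groups.

    Compute this on each batch of lending decisions and alert when it
    exceeds an agreed threshold, so bias is caught early rather than in an audit.
    """
    rates = decisions.groupby(group_col)[approved_col].mean()
    return float(rates.max() - rates.min())


# Hypothetical usage inside a scheduled monitoring job:
# gap = approval_rate_gap(this_weeks_decisions, "race", "approved")
# if gap > 0.05:  # illustrative threshold, not a regulatory standard
#     alert_on_call_team(f"Approval-rate gap of {gap:.1%} across groups")
```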
