Gradient Institute Fairness Demo
An Automated Decision System
You've been tasked by a large hypothetical financial organisation to implement a scalable automated screening process to accept or reject loan applications. Your AI system will decide whether to accept or reject each application based on the applicant's ability to repay. But ability to repay is not something you can directly observe in the data. Instead, you could use machine learning to predict a risk score from attributes that you do observe in the application process, such as the applicant's income and the size of the loan.
At the time when we decide whether to accept or reject a customer, we don't know for certain whether an applicant will repay or default. We can only make a decision based on their score. To make the fewest mistakes, we line up all the applicants from lowest score (left) to highest score (right). We then pick a cut-off threshold, grant a loan to everybody to the right of the threshold, and reject everybody to its left.
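As a minimal sketch, this decision rule is just a comparison against the cut-off. The scores and threshold below are made-up values for illustration, not the demo's actual data:

```python
# Hypothetical sketch: accept every applicant whose risk score meets
# a chosen cut-off threshold; reject everybody below it.

def decide(scores, threshold):
    """Return True (accept) for each applicant with score >= threshold."""
    return [s >= threshold for s in scores]

# Applicants lined up from lowest to highest score.
scores = [0.12, 0.45, 0.51, 0.73, 0.88]

# Everybody "right of the threshold" (score >= 0.5) is accepted.
decisions = decide(scores, threshold=0.5)
```

Moving the threshold left accepts more applicants; moving it right rejects more.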
We've generated a synthetic dataset for this particular scenario below. Try out this decision rule by clicking on the last person you accept (rejecting everybody to their left and selecting everybody to their right):
A binary classifier such as this has four possible outcomes shown in the confusion matrix below:
|         | Accepted        | Rejected        |
|---------|-----------------|-----------------|
| Repay   | True positives  | False negatives |
| Default | False positives | True negatives  |
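The four cells of this matrix can be counted directly from the decisions and the (after-the-fact) repayment outcomes. A small sketch, with `repays` standing in for the ground truth we can't see at decision time:

```python
# Count the four confusion-matrix outcomes for a set of decisions.
# repays[i]   -- True if applicant i would repay (ground truth).
# accepted[i] -- True if the system accepted applicant i.

def confusion_counts(repays, accepted):
    tp = sum(r and a for r, a in zip(repays, accepted))            # repay, accepted
    fn = sum(r and not a for r, a in zip(repays, accepted))        # repay, rejected
    fp = sum((not r) and a for r, a in zip(repays, accepted))      # default, accepted
    tn = sum((not r) and not a for r, a in zip(repays, accepted))  # default, rejected
    return tp, fn, fp, tn
```

Every applicant falls into exactly one cell, so the four counts always sum to the population size.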
Note that we use colour (red or green) to show whether the applicant can repay, and saturation (bold or faded) to show whether the applicant was accepted or rejected by the system. Try adjusting the threshold to minimise errors and you will discover a dilemma: to accept more green applicants, you also have to accept more red ones. As we tune the threshold, we adjust the balance between mistakes where we grant a loan to somebody who will default (a false positive) and mistakes where we reject an application from somebody who can repay (a false negative).
Business Objective: Profitability
Let's introduce a simple model for the consequences of our actions:
- Customers who are given a loan and repay generate a profit of $2800.
- Customers who are given a loan and default generate a loss of $9000.
- Customers who are not given a loan incur no profit or loss (only missed opportunities).
Can you find a threshold that maximises profit by clicking on the population above? You will need to balance the benefit of true positives against the cost of false positives.
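Under this model, profit depends only on the true positive and false positive counts, so finding the most profitable threshold is a one-dimensional search. A sketch with illustrative inputs (the $2800 and $9000 figures come from the model above; the scores are made up):

```python
# Profit model from the text: +$2800 per repaid loan, -$9000 per
# default; rejected applications contribute nothing.
PROFIT_REPAY = 2800
LOSS_DEFAULT = 9000

def profit(true_positives, false_positives):
    return PROFIT_REPAY * true_positives - LOSS_DEFAULT * false_positives

def best_threshold(scores, repays):
    """Try every candidate cut-off and return (threshold, profit) for
    the most profitable one."""
    best_t, best_p = None, float("-inf")
    # Candidate thresholds: each distinct score, plus one that rejects all.
    for t in sorted(set(scores)) + [max(scores) + 1]:
        tp = sum(1 for s, r in zip(scores, repays) if s >= t and r)
        fp = sum(1 for s, r in zip(scores, repays) if s >= t and not r)
        p = profit(tp, fp)
        if p > best_p:
            best_t, best_p = t, p
    return best_t, best_p
```

Because a default costs far more than a repayment earns, the profit-maximising threshold sits well to the right of the error-minimising one.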
Or from the point of view of a stakeholder (who you can click to reveal the solution):
Group Fairness
The organisation would also like to ensure that the system's outcomes don't systemically disadvantage groups of people. An AI system could easily make errors on a group if the data about them is poor, or if they are under-represented in the training data. It might also perpetuate or amplify historical disadvantage, if this is captured in the data. Typically, we might consider groups identified in anti-discrimination law, such as groups based on age, gender, race or sexuality.

Suppose the stakeholders have identified a protected group that was historically disadvantaged. We won't name a particular group here, but will simply call them the "disadvantaged group", and will distinguish them from other customers using a special "protected" icon:
- people without a shield belong to the general population.
- people with a shield belong to a potentially disadvantaged group.
If this group turns out to be (unintentionally) disadvantaged, then we say the system exhibits "indirect discrimination". Our goal, in the rest of this demo, will be to measure and mitigate disadvantage to this group.
Fairness Objective: Equality of Opportunity
One potential harm of the system is that it may cause capable borrowers to miss out on loans. This is visualised by extracting all the green icons from the population. Things might seem unfair if one group is selected at a lower rate than another. The equal opportunity fairness measure compares the rate at which eligible applicants are selected in each of our two groups.

Explore the selection threshold to try to obtain equitable outcomes between the groups.
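The rate being compared here is the true positive rate: among applicants who would repay, what fraction were accepted? A sketch of the measure and the gap between two groups (all inputs are illustrative):

```python
# Equal opportunity compares selection rates among applicants who would
# repay (the true positive rate), computed separately per group.

def true_positive_rate(repays, accepted):
    """Fraction of would-repay applicants who were accepted."""
    eligible = [a for r, a in zip(repays, accepted) if r]
    return sum(eligible) / len(eligible) if eligible else 0.0

def opportunity_gap(repays_a, accepted_a, repays_b, accepted_b):
    """Difference in TPR between two groups; zero means equal opportunity."""
    return (true_positive_rate(repays_a, accepted_a)
            - true_positive_rate(repays_b, accepted_b))
```

A gap of zero means capable borrowers in both groups have the same chance of getting a loan.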
Controlling for fairness
By setting a single threshold for everybody above, you were making a "blind decision", because it did not consider an individual's circumstances. Under the hood of this simulation there is actually no difference in ability to repay between the groups. The inequality we are seeing results from our inability to distinguish red from green customers in the data. In this setting we can be confident that the observed inequality arises from our model and data.

As you can see, a blind decision does not lead to equitable outcomes (unless you select everybody, or nobody, which aren't useful policies from a business perspective). One of many ways to actually control for group fairness is to pick a different threshold for each group. Below we have split the thresholds of the protected group from the general group. Try using the split controls to adjust the cut-offs of each group independently until you achieve equal opportunity.

When we define a single notion of fairness like this, we have two objectives: a business objective and a fairness objective. They are somewhat at odds, but we can think about concepts like the price of fairness, or finding the most efficient fair system. Depending on the context, there are many other notions of fairness that we could also consider.
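A split-threshold policy is a small extension of the single-threshold rule: look up the cut-off for each applicant's group before comparing. The group labels and threshold values below are hypothetical:

```python
# Group-dependent decision rule: each group gets its own cut-off.

def decide_split(scores, groups, thresholds):
    """Accept applicant i if scores[i] >= thresholds[groups[i]]."""
    return [s >= thresholds[g] for s, g in zip(scores, groups)]

# Example: a lower cut-off for the protected group can equalise the
# selection rate of eligible applicants across the two groups.
decisions = decide_split(
    scores=[0.4, 0.6, 0.4, 0.6],
    groups=["general", "general", "protected", "protected"],
    thresholds={"general": 0.5, "protected": 0.35},
)
```

With two independent cut-offs, the policy space becomes two-dimensional, which is what makes trading profit against fairness possible.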
Fairness Objective: False Discovery Parity
False discovery parity is an alternative notion of fairness, concerned in this scenario with the ethics of responsible lending. Instead of focusing on those who could repay, let's now isolate the cohort that we selected above, and within these selected individuals examine whether the default rate is the same for each group. We now have a clear trade-off not only between profit and equal opportunity, but even between the two fairness metrics.

You will discover that you can't have both types of fairness simultaneously. What is really going on here is that false discovery parity requires us to be more careful (exclusive), while equal opportunity requires us to be more inclusive. In general we will have any number of objectives (fairness or otherwise) that will need to be balanced.
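The quantity being equalised here is the false discovery rate: among accepted applicants, the fraction who go on to default. A sketch, reusing the same illustrative inputs as before:

```python
# False discovery rate: of the applicants we accepted, what fraction
# default? False discovery parity asks this rate to match across groups.

def false_discovery_rate(repays, accepted):
    selected = [r for r, a in zip(repays, accepted) if a]
    if not selected:
        return 0.0  # nobody accepted, so no false discoveries
    return sum(1 for r in selected if not r) / len(selected)
```

Raising a group's threshold lowers its false discovery rate but also lowers its true positive rate, which is exactly the tension between the two fairness notions described above.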
But that doesn't mean we should give up. It is important to realise that there is no "right answer" here. Instead, we need to expose this trade-off to the stakeholders, who can make a value judgement about how important each objective is. We can then balance the trade-off. Let's assume that we want maximum profit for a given fairness level; the profit optimisation will be automatic. Use the slider below to tip the balance between equal opportunity fairness and false discovery parity fairness.
Mapping the trade-offs
In this simplified scenario, the decision is fundamentally two-dimensional, because we control two aspects of the decision policy: the acceptance rate of the protected individuals, and the acceptance rate of the general population.

Until now we have been looking at the outcomes in terms of three objectives (profit, equal opportunity and default rate) independently. While this helps us understand the impact of a policy decision, it doesn't convey the relationship between them.
Below, we map the decision space on a two-dimensional plot, where any combination of thresholds becomes a point on the map. The position of the point on the X-axis is the threshold for the general group, and the position on the Y-axis is the threshold for the protected group. Try dragging the black dot around on the map to see the effect on the metrics above.
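This map can be sketched as a sweep over threshold pairs: each (general, protected) combination is one point, and each point has its own metric values. The sketch below computes profit per point using the $2800/$9000 model from earlier; the scores and outcomes are made-up inputs, and the other metrics could be attached to each point the same way:

```python
# Sweep both group thresholds to map the two-dimensional policy space.
# Each (t_general, t_protected) pair becomes one point with its metrics.

def policy_map(general_scores, general_repays,
               protected_scores, protected_repays, thresholds):
    points = []
    for t_gen in thresholds:
        for t_pro in thresholds:
            acc_gen = [s >= t_gen for s in general_scores]
            acc_pro = [s >= t_pro for s in protected_scores]
            tp = (sum(a and r for a, r in zip(acc_gen, general_repays))
                  + sum(a and r for a, r in zip(acc_pro, protected_repays)))
            fp = (sum(a and not r for a, r in zip(acc_gen, general_repays))
                  + sum(a and not r for a, r in zip(acc_pro, protected_repays)))
            points.append({"thresholds": (t_gen, t_pro),
                           "profit": 2800 * tp - 9000 * fp})
    return points
```

Plotting a metric over this grid reproduces the kind of map the demo shows: fairness constraints become curves on it, and the most efficient fair policy is the most profitable point on the relevant curve.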