Updated: Dec 24, 2020
Asigmo recently hosted a public webinar on fairness and bias in AI. The talk was given by Dr Rania Wazir and was open to commentary from the participants. Dr Wazir is a mathematician, data scientist, and human rights activist, and one of Asigmo's trainers.
First of all, let's start with a question: what is fair? The Merriam-Webster dictionary defines fair as "marked by impartiality and honesty : free from self-interest, prejudice, or favoritism". This talk covers what fair means when it comes to AI. Consider, for instance, the impact of AI on the environment: training a single AI model can emit as much carbon as five cars over their lifetimes. Beyond the environmental impact, algorithms used to optimize labor demand in restaurants and cafes treat humans as objects to be scheduled. Another example of AI misuse is a deepfake app whose image-processing algorithm takes a photo of a person and generates a fake nude image of her.
The rapid adoption of AI and ML could endanger certain racial and ethnic groups in our society. The previously held notion that AI is infallible has proved to be far from the case, with governments and organisations discovering discrepancies in how different social groups are treated. AI ethics attempts to address these issues and provide a solid ethical framework for the future. However, the main question presented at the talk was 'what is fair?' Better yet, 'what is unfair?' Several guidelines have been published by international organizations such as the OECD, UNESCO, UNICEF, and the EU, and even by corporations such as Google and Microsoft.
How can we measure the fairness of an algorithm? The notion of "fair" is quite broad and vague, so let us look instead at "unfair". Unfair can mean biased, and statistics can help us measure how biased an algorithm is. There are many ways for an AI system to be biased. So how can we detect and measure all this bias?
Furthermore, The Atlantic declared that "the Internet is enabling a new kind of poorly paid hell". The people processing the data should also be considered. In many parts of America where jobs have either been outsourced or eliminated by technology, workers complete mundane tasks for long periods, often earning pennies per job. This work comes from websites such as Clickworker and Amazon's Mechanical Turk, to name a few.
Dr Wazir also discussed the methods used by scholars, noting that the term 'fair' is too broad and vague, and that asking what is 'unfair' leads to a more productive conversation. How does an organisation go about detecting such bias? For starters, check the quality of your data. However, this is of limited help when dealing with enormous datasets. Bias can have many causes, such as representation bias or sampling bias. Companies have tried framing the problem in terms of high-risk and low-risk candidates, or a privileged group and an unprotected group. The two main fairness definitions proposed are:
Predictive parity: the proportion of correctly predicted high risks is the same regardless of demographic (all groups have an equal positive predictive value, PPV).
Equalized odds: within each true risk category, the percentage of false predictions is equal in each demographic (all groups have equal false negative and false positive rates, FNR and FPR).
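To make these definitions concrete, here is a minimal sketch (not from the talk; the function name and data layout are my own) of how PPV, FPR, and FNR can be computed per demographic group from binary labels and predictions, where 1 means "high risk":

```python
from collections import defaultdict

def group_metrics(y_true, y_pred, groups):
    """Return PPV, FPR, and FNR for each demographic group.

    Assumes every group has at least one predicted positive,
    one actual positive, and one actual negative.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if p == 1 and t == 1:
            counts[g]["tp"] += 1      # correctly flagged high risk
        elif p == 1 and t == 0:
            counts[g]["fp"] += 1      # wrongly flagged high risk
        elif p == 0 and t == 1:
            counts[g]["fn"] += 1      # missed high risk
        else:
            counts[g]["tn"] += 1      # correctly cleared
    metrics = {}
    for g, c in counts.items():
        metrics[g] = {
            # predictive parity compares PPV across groups
            "PPV": c["tp"] / (c["tp"] + c["fp"]),
            # equalized odds compares FPR and FNR across groups
            "FPR": c["fp"] / (c["fp"] + c["tn"]),
            "FNR": c["fn"] / (c["fn"] + c["tp"]),
        }
    return metrics
```

Predictive parity then means the "PPV" values match across groups, while equalized odds means both the "FPR" and "FNR" values match.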
What happens when the prevalence of high risk is greater for one group than another? Which definition of fair should we use? What happens when your historical data classifies women as a high-risk group even though women make up the larger share of the data? Dr Wazir states that you cannot have it both ways: when base rates differ, you can have either equalized odds or predictive parity, but not both.
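This tension can be checked numerically. A well-known identity from the fairness literature links the three quantities: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), where p is a group's base rate of high risk. The sketch below (illustrative only; the PPV and FNR values are assumed, not taken from the talk) plugs in two different base rates and shows that holding PPV and FNR equal forces the FPRs apart:

```python
def fpr(prevalence, ppv, fnr):
    """False positive rate implied by a group's base rate, PPV, and FNR."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

# Two groups with different base rates of high risk,
# but identical PPV (0.61) and FNR (0.35) -- both assumed values.
for p in (0.52, 0.39):
    print(f"prevalence {p:.2f} -> implied FPR {fpr(p, ppv=0.61, fnr=0.35):.2f}")
```

With the same PPV and FNR, the higher-prevalence group necessarily ends up with a substantially higher false positive rate, which is exactly the "cannot have it both ways" point.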
The main topic on everyone's mind was the case of Northpointe. Northpointe created a model to predict which defendants would reoffend, and its risk scores were used to inform how much jail time an inmate would receive. The company claimed its algorithm was fair because within each risk category, the proportion of defendants who reoffended was approximately the same regardless of race; that is, PPV was equal across groups. The study found that 60% of white defendants and 61% of black defendants so classified went on to reoffend. Additionally, Northpointe stated that race was not an input to the model. However, ProPublica found the outcomes were not equal. Amongst defendants who ultimately did not reoffend, black defendants were more than twice as likely as white defendants to be classified as medium or high risk (42% vs 22%), meaning more black inmates were given harsher sentences. In other words, the false positive rate (FPR) was almost double for black defendants compared to white defendants. Part of the problem is that the overall recidivism rate in the data is higher for black defendants than for white defendants (52% vs 39%, respectively), reflecting a systemic bias. Several measures can be taken into consideration in order to assess the fairness of a model. Here is a list of these measures:
Conditional statistical parity
False positive error rate balance
False negative error rate balance
Conditional use accuracy equality
Overall accuracy equality
Fairness through unawareness
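Two of the measures above can be sketched in a few lines each. The following is an illustrative implementation (helper names are my own, not from the talk) of conditional statistical parity's simpler cousin, statistical parity, and of overall accuracy equality, both computed per group:

```python
def statistical_parity(y_pred, groups):
    """P(prediction = 1 | group) for each group.

    Parity holds when these rates are equal across groups.
    """
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return rates

def overall_accuracy(y_true, y_pred, groups):
    """P(prediction = actual | group) for each group.

    Overall accuracy equality holds when these are equal across groups.
    """
    acc = {}
    for g in set(groups):
        pairs = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        acc[g] = sum(t == p for t, p in pairs) / len(pairs)
    return acc
```

The error-rate-balance measures in the list compare FPR and FNR across groups in the same per-group fashion, and fairness through unawareness simply means the sensitive attribute is excluded from the model's inputs, which, as the Northpointe case shows, does not by itself guarantee fair outcomes.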
Dr Wazir then posed crucial questions: should algorithms be allowed to make such important decisions at all? Should multiple algorithms be used to minimise the risk of mistakes? And do companies have the right to keep their models private, especially when those models drive consequential decisions? Many tools and frameworks have been developed to tackle ethical concerns and mitigate the potential harms of AI. But who bears the responsibility? And what mechanisms should be put in place to monitor these developments?