Preliminary remark: This article was first published on medium.com on February 21, 2019.
Reading the excellent report by the Council of Europe on “Discrimination, Artificial Intelligence and Algorithmic Decision-Making”, I wondered to what degree algorithmic decision-making could serve to further exacerbate discrimination in already deeply divided societies.
To my surprise, the report catapulted me back almost 20 years to the time when I wrote my master’s thesis on “Presidential Systems in African States”. One of the main factors I found to influence whether a presidential system was a good choice or not was the ‘cleavages’ in a country. In political science, a cleavage is a division of a community into different groups. Typical cleavages found in many societies are religious, linguistic, ethnic, or ideological. Depending on the society in question, cleavages can be overlapping or cross-cutting.
Overlapping cleavages mean that people who are divided by one cleavage will also be divided by another — this is the case, for example, if in a country with a Muslim and a Christian population, most Muslims live in the countryside, speak one language, and are relatively poor, whereas most Christians live in urban areas, speak another language, and are relatively wealthy. In such societies, individuals are predominantly members of one single group united by several characteristics — the society is clearly segmented and often deeply divided.
Cross-cutting cleavages, by contrast, denote constellations where those divided by one cleavage (e.g. language) are united by another (e.g. religion). In such societies, individuals are always simultaneously members of several groups: Muslims and Christians live in rural as well as urban areas, some Muslims speak language x, others speak language y, and so on.
Now why did I think of this long-forgotten piece of work when reading the report?
A classic case of algorithmic discrimination involves collecting data on users’ postal codes and establishing correlations between the postal code and, say, loan default rates. As the report says:
“The AI system learns that people from postal code F-67075 were likely to default on their loans and uses that correlation to predict defaulting. Hence, the system uses what is at first glance a neutral criterion (postcode) to predict defaulting on loans. But suppose that the postcode correlates with racial origin”.
It was the term ‘racial origin’ that triggered my memories of cleavages in societies. A society where people of different racial origins are segmented both spatially and economically is a classic case of overlapping cleavages, with a high potential for conflict.
In the language of data science, overlapping cleavages seem to promote ’redundant encodings’. Redundant encodings exist when membership in a protected class (such as race, gender, age, religion, marital status) “happens to be encoded in other data. This occurs when a particular piece of data or certain values for that piece of data are highly correlated with membership in specific protected classes” (Barocas and Selbst 2016, pp. 691–692).
To give an example, it has been found that
“gender can be inferred from other data factors which are included: for example, if the applicant is a single parent, and 82% of single parents are female, there is a high probability that the applicant is female”.
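To make this more concrete, here is a minimal sketch on synthetic data of how such a redundant encoding arises. The rates, the variable names, and the use of Python/NumPy are my own illustrative assumptions, not figures from the study quoted above.

```python
# Minimal sketch (synthetic data): how a seemingly neutral feature can
# redundantly encode a protected attribute. All rates below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute that is NOT included as a feature: True = female.
female = rng.random(n) < 0.5

# "Neutral" feature: single-parent status, which in this toy data occurs
# far more often among women and therefore encodes gender indirectly.
single_parent = np.where(female, rng.random(n) < 0.20, rng.random(n) < 0.04)

# Among single parents, what share are female? This is the redundant encoding.
print(f"P(female | single parent) = {female[single_parent].mean():.2f}")
# With these toy rates the result is roughly 0.83, so any model that uses
# 'single_parent' as an input is implicitly using gender as well.
```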
How could this problem be resolved? Contrary to common intuition, and as Barocas and Selbst acknowledge, the problem of ‘redundant encodings’ cannot simply be resolved by removing the variables in question from the data mining exercise, because doing so “often removes criteria that hold demonstrable and justifiable relevance to the decision at hand.” According to them, “[t]he only way to ensure that decisions do not systematically disadvantage members of protected classes is to reduce the overall accuracy of all determinations.”
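The trade-off Barocas and Selbst describe can also be sketched in code: a toy credit model trained once with and once without a proxy feature (here a hypothetical ‘postcode_risk’ variable). Again, the data, the coefficients and the choice of scikit-learn are illustrative assumptions, not their method.

```python
# Minimal sketch (synthetic data): removing a proxy variable reduces the
# model's reliance on a protected class, but typically also its accuracy.
# Every name and number here is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 20_000

income = rng.normal(0, 1, n)                          # legitimate predictor
group = rng.random(n) < 0.3                           # protected class, never a feature
postcode_risk = 0.8 * group + rng.normal(0, 0.5, n)   # proxy correlated with the group
default = (0.6 * postcode_risk - income + rng.normal(0, 1, n)) > 0

X_full = np.column_stack([income, postcode_risk])     # model that keeps the proxy
X_blind = income.reshape(-1, 1)                       # proxy removed: a "veil of ignorance"
Xf_tr, Xf_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
    X_full, X_blind, default, test_size=0.3, random_state=0)

for name, X_tr, X_te in [("with proxy", Xf_tr, Xf_te), ("without proxy", Xb_tr, Xb_te)]:
    model = LogisticRegression().fit(X_tr, y_tr)
    print(name, "accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3))
# The blinded model is usually less accurate: the price of refusing to use
# information that redundantly encodes membership in the protected class.
```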
What does all of this mean for the risk of being ‘discriminated through data’? Do overlapping cleavages, or segmented societies, offer more potential for discriminatory algorithmic decision-making practices than societies that are characterized by a multitude of identities? One would assume that unfair inferences are harder to make in societies where any combination of religion, social status, neighborhood etc. is plausible or common.
Or, more generally: what significance does the structure of a society have for the risk of discriminatory algorithmic decision-making?
Algorithmic decision-making will gain ground, regardless of how carefully we define its boundaries, and there are plenty of opportunities to deploy it for the common good. But no matter what, we must not lose sight of the societal context in which data is collected and in which algorithmic decision-making takes place, and we must pay specific attention to discriminatory segmentations and cleavages. There are different ways of doing this:
- Interdisciplinarity: we need to ensure that data scientists always work together with people from other disciplines, e.g. social scientists or ethicists. Only by bringing together different perspectives can we avoid disciplinary blindness.
- Technical restraints: another option is to introduce ‘technical fixes’ to the ethical problem of ‘unfair inferences’, e.g. by removing the variables in question, as mentioned above. In political-philosophical terms, this can be compared to adding a Rawlsian veil of ignorance to your data, behind which we know very little about an individual’s talents, abilities, tastes, social class etc. This method certainly reduces the risk of discrimination, but it is of (intentionally, in Rawls’ case) limited use when it comes to predictive power.
- Adjusting regulation: Sandra Wachter, a lawyer and research fellow at the Oxford Internet Institute, recently pointed out that even the GDPR, considered a rather strong piece of data protection legislation compared to other legal frameworks, is “failing to protect people with respect to these inferences, which … paint an extremely detailed picture of users’ private lives”. She therefore calls for a ‘right to reasonable inferences’.
Interdisciplinarity, technical restraints, and continuous legal adjustments are all very important, but in the end these measures only treat the symptoms of deep-rooted and persistent social divisions that have often evolved over decades.
As an overarching vision, we need to keep in mind that if we want AI in general, and algorithmic decision-making in particular, to flourish and contribute to the common good rather than promote or exacerbate division, we need to work towards creating societies where all members have genuine freedom and equal opportunities in their choice of lifestyles and identities, regardless of their protected characteristics. These are the circumstances that provide the most fruitful ground for the safe and responsible use of new technologies.