“The potential for harm when algorithms take over bureaucracies and policies is unbounded”. Interview with Cathy O’Neil

Santiago Liaudat; Lucía Céspedes

Interviews

Ciencia, Tecnología y Política

Universidad Nacional de La Plata, Argentina

ISSN: 2618-2483

ISSN-e: 2618-3188

Periodicity: Semiannual

vol. 7, no. 13, 2024

revista.ctyp@presi.unlp.edu.ar

Received: 05 June 2024

Accepted: 24 September 2024

URL: https://portal.amelica.org/ameli/journal/214/2145144014/

DOI: https://doi.org/10.24215/26183188e120

Abstract: Interview with Catherine (“Cathy”) Helen O'Neil, born in the United States and known worldwide for her critical studies on the negative effects of algorithms. She holds a PhD in Mathematics from Harvard University and is the author of the books Doing Data Science (2013), Weapons of Math Destruction (2016) and The Shame Machine (2022). After working in the financial sector as a data scientist, she launched ORCAA, an algorithmic auditing firm. She is a regular contributor to the Bloomberg Opinion news agency, author of the blog and member of the Public Interest Tech Lab at the Harvard Kennedy School.

Keywords: artificial intelligence, algorithmic bias, discrimination, regulations, ethics.

Resumen: Entrevista con Catherine (“Cathy”) Helen O’Neil, nacida en los Estados Unidos, conocida mundialmente por sus estudios críticos sobre los efectos negativos de los algoritmos. Doctora en Matemática por la Universidad de Harvard, es autora de los libros Haciendo Ciencia de Datos (2013), Armas de Destrucción Matemática (2016) y La máquina de la vergüenza (2022). Luego de trabajar para el sector financiero como científica de datos, puso en marcha ORCAA, una empresa de auditoría algorítmica. Es colaboradora habitual de la agencia de noticias Bloomberg Opinion, autora del blog y miembro del Laboratorio Tecnológico de Interés Público de la Escuela de Gobierno John F. Kennedy de la Universidad de Harvard.

Palabras clave: inteligencia artificial, sesgos algorítmicos, discriminación, regulaciones, ética.

Resumo: Entrevista com Catherine (“Cathy”) Helen O'Neil, nascida nos EUA, conhecida mundialmente por seus estudos críticos sobre os efeitos negativos dos algoritmos. Com doutorado em Matemática pela Universidade de Harvard, ela é autora dos livros Doing Data Science (2013), Weapons of Math Destruction (2016) e The Shame Machine (2022). Depois de trabalhar no setor financeiro como cientista de dados, ela fundou a ORCAA, uma empresa de auditoria algorítmica. É colaboradora regular da agência de notícias Bloomberg Opinion, autora do blog e membro do Laboratório de Tecnologia de Interesse Público da Escola de Governo John F. Kennedy da Universidade de Harvard.

Palavras-chave: inteligência artificial, vieses algorítmicos, discriminação, regulamentações, ética.

Criticism of the expansion of digital platforms has been mostly directed at the loss of privacy. But in your book Weapons of Math Destruction you went further by stating that algorithms used for decision making deepen social inequalities. What traits make a mathematical model a weapon of math destruction?

A weapon of math destruction is an algorithm that’s used by a lot of people for important decisions - whether that's about their job or their finances or their freedoms or their information. And it's a secret. People don't understand it. They often don't even know they're being scored. It's almost always a scoring system; you can think of it as a scoring system, maybe with multiple scores in multiple dimensions. And then it's also unfair. No algorithm is perfect, so it's always going to be unfair to someone, especially if it's making decisions about who deserves what - which it often is. But it's not just unfair arbitrarily and idiosyncratically. It’s unfair systematically, so it goes against poor people, against marginalized communities, etcetera, in the standard ways. The basic premise of data science is to find out who has money and give them opportunities, and find out who doesn't have money and prey upon them. That's the way the internet works as a business model, if you think of the internet itself as a business model. That's what they're doing. Privacy concerns have always bothered me. Not that I don't think we deserve privacy, but that it's so beyond true that we don't have privacy. People don’t quite understand how bad it is because you could be extremely good at keeping yourself private, but the moment that you have to interact with a bureaucracy you have to give your Social Security number, you have to give your address, you have to give the information they ask for, and they will ask for the information that allows them to profile you based on everyone else who isn't careful. That's an important point that I think people who are interested in privacy miss - that privacy only lasts until you have to interact with the machine, and then it's gone. And the idea that you never have to interact with a bureaucracy in your entire life is just silly. So there's really no protection for anyone when it's working like this.

What are some of the inequalities that algorithms produce and reproduce? Do they, in any way, perpetuate a colonialist gaze on Global South countries?

I would just say, at the very high level, my job as a data scientist was to make lucky people luckier and unlucky people unluckier. Of course algorithms do a little bit more than that, but definitely the strongest signal is wealth. After that, it’s gender, and after that, it’s race. But wealth is the easiest thing to find. I'm not an expert on the colonial gaze, but I would just say that absolutely, because it is completely focused on what Elon Musk thinks. It's focused on white people, it’s focused on Western people, it’s focused on Americans. That is the perspective of ChatGPT and all the products that are coming out of there. So, to the extent that everybody in the world is ingesting the content that is built in California, absolutely.

Do you think they affect democracy as well? Is there any link between the model of society underlying these weapons of math destruction and the rise of a new right wing in the West?

I definitely think so. The subtitle of my book is How Big Data Increases Inequality and Threatens Democracy. So that's my assumption. I wrote that book in 2014, being a member of Occupy at the time.¹ I was able to shove in a couple of sentences in 2016 as the election was underway. For example, I managed to cram in something about Cambridge Analytica² and about how politicians no longer have to tell the same thing to everyone: they can send different messages to different people. All of that, of course, is true, but what I really didn't anticipate - which is somehow even worse to me - is that politicians don’t even have to have information, they no longer have to have platforms, they only need to have emotional manipulation. That's kind of the message, that’s their platform: scare you in this way. Immigration, for example. Just make you feel afraid. That's not a sort of historically classic platform to run on, but it ends up being what the messages actually look like when you're talking about micro-targeting on the internet. It's so tailored, it is so anti-democratic. The messages are often actually anti-democratic because they're trying to undermine people's trust in elections. The Facebook advertising ecosystem, which is built on algorithms, is like a perfect anti-democratic propaganda machine.

Photo: Stiftelsen - Internetdagarna 2017, Wikimedia Commons

Has the rapid development of Artificial Intelligence (AI) in recent years aggravated the picture you described in 2016?

I really can't believe the hype around generative AI, which just seems super overblown. For me, all of these things aren't inherently evil until they're used for evil purposes. Sometimes I like to think about a scenario where an evil algorithm could be used for good and I can do it, I can always do it. For example, with risk algorithms³, which I think are the most evil ones that I learned about. Sentencing people to longer terms because people like them got rearrested in the past. We could use those algorithms to sort of hold up a mirror to us, us the public, and ask why do poor people go to prison so much more, why do people without mental health treatment cycle through prison, why are we doing this to people. The thing that I wrote the book for is a pitch to the public to stop blindly trusting algorithms and big data - as it was called then, now it's AI. It's a new marketing term but the blind trust is once again what I worry about. I don't worry about these tools being very good, because they're not. I actually want to write an essay about this because I want to make it really crystal clear that it’s deliberate. It's a deliberate design of these rich white guys in Silicon Valley to get people to trust the chatbot. In fact I was reading about rhetoric in Ancient Greece and how Greeks thought about persuasive arguments, and there's a six-step process and one of the steps is admitting your mistakes. It just hit me, like when you tell ChatGPT that it made a mistake, it's always like “oh I'm so sorry, that was a mistake”. It apologizes, which gives it credibility. But I will also mention - because I did this as a test - if you correct it on something it got right, it also apologizes! You know what I mean? But we don't do that because we're humans and we're not suspicious, we're in a trusting mood, so we only correct mistakes. We don't correct things that are correct. My point is that it's deliberately designed to have us trust it.
For me, if I have a one-sentence description of what I'm trying to do, it's that I'm attempting to keep people from trusting the machine, and that's what really gets under my skin about the current AI hype: that it's a deliberate attempt to get people to trust it, even though it's still terrible.

Are the problems identified in relation to biases and discrimination on algorithmic grounds intrinsic to the algorithms, or merely design issues? Are the science and technology involved a neutral instrument whose effects depend on who uses them, or do they need to be redesigned?

It's a little bit of both, to be honest, depending how you think about it. It also depends on what you mean by “tool”. The only company in the world that will ever use the Facebook Newsfeed algorithm is Facebook, right? And they built it to keep people captivated and staying on Facebook. It's optimized for engagement. Engagement is their choice because it's about profit, so it's literally a capitalistic tool. It's hard to imagine a civil society group saying “oh, we could use that tool for good”, because they can't, because it's built to optimize profit and to be ignorant of any kind of harm. It is built that way. Nobody would want that, except Facebook. But if you think of the tool at the level of algorithms themselves, or information feeds, then it's easy to imagine civil society groups who are like “I think we could build a much better information feed that would actually supply people with useful, good and true information”. So if you think about that, you're like “oh, of course it could be done much much better”. I guess the short answer is, because these algorithmic systems are massive and hard to maintain and extremely expensive, only very well funded highly technical sophisticated groups of people will ever really own them. Until that changes, that means it's going to be basically large for-profit companies that are going to be using them to make more money. So it's a tool of capitalists right now to be capitalistic. That's not an imperative in principle, but it is empirically what's happening.

Can the free market be self-regulating in this regard or should State regulations be established?

I would never say that we could trust a free market to solve any of these problems. The point is that I think algorithms are replacing every bureaucracy in terms of who deserves a job, who deserves a credit card, who deserves prison. Every single bureaucratic decision-making process is becoming an algorithm, if it hasn't already. You can even argue that policies themselves are becoming algorithms. Policies, political choices are being “algorithmitized”. That's the way I look at it. That means that these bureaucratic decision-making processes are being owned by capitalists to decide people's fates in important ways. Obviously we have an interest as a public to make sure that that's being done fairly, and it's absolutely required that we do that. Right now we have a bunch of anti-discrimination laws in the US that are just being ignored because people don't know how to apply them to an algorithm, and that's why I started my company and I've been working on that for years and years. There is progress, it's slow, but there’s progress in the realm of hiring, there’s progress in the realm of insurance. But the potential for harm when we're talking about algorithms taking over bureaucracies and policies is kind of unbounded, so of course we have to impose rules about keeping track of that harm and mitigating it.

What is the place of private auditing and accountability initiatives in the governance of algorithmic systems? What is ORCAA’s⁴ experience in this regard?

It was very eye-opening. I'm a little bit more cynical than I was. Progress is slow. Most companies just don't want to be audited. They're not going to volunteer for it. They want to maintain plausible deniability of the harms that their algorithms are perpetrating. So the question is, how do we get leverage, how do we get to the point where companies that we actually want to audit need an audit. Or let's put it this way: getting an audit will be less expensive than not getting it. That's really all we are looking for. Now we're working for a bunch of enforcement agencies like the Attorney General and the Federal Trade Commission and other federal agencies and insurance commissioners. But we're also working with class action law firms that are suing on behalf of large classes of people and that's going to get us leverage, that's how you do it in the United States. You need leverage either from regulatory pressure or from litigation risk because right now, most companies just don't think that they'll ever get in trouble for the harm that their AI is doing. They don't like to pay for Human Resources (HR) so they just fire all their HR people, replace them with an algorithm that's racist and sexist and ageist, and they're like “yeah, nobody gets in trouble for that, so we win”.

How could ethical criteria be included in the design of an algorithm? What do you mean when you say that “AI Ethics cannot be automated”?

Algorithms are just codified bureaucracies. That's why they should be tested, they should be very deliberately designed with that in mind. They have to balance the rights of different stakeholders and the harms to the different stakeholder groups. One group might care about false negatives, another group might care about false positives, and another group might care about something else entirely. And you have to sort of manage all of those needs, or at least acknowledge them in your design. That's the very first framework that we use, called the Ethical Matrix. We consider all the stakeholders and all the concerns, for example, what could go wrong for these people or even the environment, that could be a stakeholder. The reason I say it can't be automated is that this is a very context-specific question. You're building an algorithm for a specific context that will define the stakeholders and their concerns. Just to give you an example, let's say you're talking about facial recognition. It's being used during the day at a place of business to make sure that the person walking into the building works there, and if it doesn't recognize the face, they have to go talk to a security guard and be let in. That’s one context, versus, it’s being used by the police department with video stills from a video of a robbery, and then people get arrested. Okay, a mistake in the first case is pretty low stakes; it leads to a nuisance for somebody. A mistake in the second context could lead to somebody being in jail for two weeks, which is a big deal. The point is that you can't audit “facial recognition”, you can only audit a particular use case of facial recognition. So that's what I mean when I say you can't automate it.
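As an editorial illustration of the point about different stakeholders caring about different errors, the following minimal Python sketch tabulates false positive and false negative rates per group. It is not ORCAA's Ethical Matrix or any tool described in the interview; the function name `error_rates_by_group`, the record layout and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    `records` is an iterable of (group, actual, predicted) tuples with
    boolean labels. This layout is hypothetical, chosen for illustration.
    A rate is None when a group has no cases of the relevant kind.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # a true case the system missed
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # a non-case the system flagged
    return {
        group: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


# Toy data, invented for the example: (group, actual, predicted)
sample = [
    ("A", True, True), ("A", True, False),
    ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, False),
]
rates = error_rates_by_group(sample)
```

Which of these rates matters, and to whom, still depends on the use case: the same false positive is a nuisance at an office door and potentially an arrest in a police investigation, so the numbers only become meaningful once the context and its stakeholders are fixed.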

In your latest book, The Shame Machine, you make a social and political reading of this feeling, showing that there is a “shame industrial complex”. Could you expand on this concept and approach?

Shame is really important and useful and at the same time is really overexploited, especially for punching down on vulnerable people. It's also not new at all. It's really an old thing that's been used for centuries in the context of commerce. If you think about for-profit shame, if you think about skin lotion to get rid of wrinkles, you realize this is nothing new. So what's new about it? Well, it's similar to what we were saying about microtargeting in politics. The ecosystem of advertising allows advertisers to micro-target and shame people so precisely that it has become an enormously targeted and brutal system, and it has grown massively. The for-profit model for shame is, first, make sure someone feels ashamed about something, and then sell them a product that doesn't actually solve that problem, so you have an ongoing customer. There is an article that really got me into the whole topic, as well as my own bariatric surgery research.⁵ It was the fact that these razor companies wanted to sell shaving equipment to Asian women, but Asian women didn't feel bad about having hair on their legs. So they first had a campaign to make Asian women feel ashamed of their hair, and then they sold them razors. And it totally worked. That's how it works. You’ve got to make sure people feel ashamed.

What examples can you give of this industrial use of shame for humiliation? Who profits from “punching down” on vulnerable people?

The big tech, social media companies, because they're the ones who have trained us quite deliberately, without us really noticing, to do the shaming for them, to each other. I like the example of the OMV! product that was directed towards teenage girls to make them feel ashamed of the way their vaginas smell.⁶ Then, sell them a product that will not help - whether that's a diet product or a skincare product or anything about any bodily thing to be honest. It’s just by design not supposed to help, but it's supposed to alleviate your shame temporarily. Until it doesn't. As I said, not new, but unbelievably well orchestrated on the internet. They are also now shaming boys for their bodies. It’s now a big deal to start shaming young people, because they're going to be lifetime customers of these products.

Has this way of humiliating the vulnerable been accentuated by algorithm-based social networks?

The other thing that my book’s really about is how it's not just these companies anymore: it's us. We have been co-opted into that whole shaming system by social media. If skincare companies are the old guard of for-profit shame, then social media companies are the new guard. They're not deliberately shaming us and then making us buy something from them, but rather they're setting up a platform for us to shame each other and they're making money off of that interaction, that fight that we have online every day. I think of that as a direct result of how profitable shame is.

In your book you also point out a link between pseudoscience and the shame industrial complex. Could you tell us what the relationship is?

I've been thinking about that recently, it's a theme that comes up with me over and over again. I think it's because I'm a mathematician by training, and I just don't have the reaction that is so common and so exploited, which is to feel like “oh, this is science so I have to trust it”. We talked about blind trust with AI earlier. That's my fight, but what I've realized is that the fight I have over and over again, which is always pseudoscience in one form or another, is the fact that people get kind of awed by pseudoscientific jargon or pseudomathematical arguments and I don't. I'm in this lucky position where I am like “why are you trusting this?”. But as soon as people trust something, then they concede authority to that thing. That's a perfect setup for shame, because ultimately if somebody who doesn't have any authority over you tries to shame you, you're like “please, get away from me”. But if somebody with authority shames you, then you're like “oh, that hurts”. So it's important to gain authority if you want to shame others and you want to sell that product. That's why I think there's such a strong connection between pseudoscience and the shame industrial complex. It needs to be asserted on authority, a certain kind of expertise. Then there's another one which is to appeal to someone's emotions.

The shame machine produces winners and losers, and blames the latter for their failures. This can be read as a way of shifting responsibility for social problems from institutions to individuals. Is the shame industrial complex an instrument of neoliberalism?

Totally. I think one of the best examples of that might be retirement savings. We had systems of retirement savings, fifty years ago, that were pretty good, for middle class white people anyway, where you had pensions, especially if you're in a union. Social security was pretty good too. We've been cutting social security and neoliberals just swept pensions off the table. Almost nobody has a pension now and everybody's expected to save for retirement. But nobody knows how long they're going to be retired because nobody knows when they're going to die. So everybody has to theoretically save an infinite amount of money, way more money than they'll actually probably need, so it actually is very inefficient economically. But it got rid of a systematically agreed upon public issue, and it turned into individual responsibility. And it's not working. People do not have any retirement money, it just isn't working. But instead of acknowledging it was a bad idea to get rid of pensions and to cut social security, the word on the street is that those people should be ashamed of themselves for not saving because “what were they doing? They were so careless”. So it's a great example of how you start with an acknowledged social dilemma, then you individualize it, and then you blame people for not solving that individually and then they just have to suffer.

As a data expert and international reference in the critique of the misuse of algorithms and digital technologies, what future do you envision as possible and what actions can we take to build a desirable future under the current technological order?

I have a pessimistic and an optimistic take on that. My pessimistic take kind of overwhelmed me last week when I googled something and instead of getting links I got a bad answer that didn't make any sense, and then I was imagining a future in that moment where that's all we can do. We’ve thrown away our books, we only have the internet, and if we try to do background research on our questions we just get more AI answers. You're just literally getting basically the pre-digested inaccurate sound bites of people like Reddit⁷ users and it's all the way down, there's no longer a way to actually verify facts. That's a vision. Then the question becomes how many people would actually object to that and what would be our power to stop that, especially considering how dysfunctional Congress is when it comes to changing anything, especially things that they benefit from in certain ways. So that's a very pessimistic vision where we just basically are being spoon-fed whatever the tech companies decide is good enough, in their opinion, for us. But I don't think that's really that likely. Optimistically, there are hype cycles. They're these blind trust waves. I think this one is going to abate because it is just clearly not very smart and has no model for truth and cannot be considered wise at all. The question then becomes, can we assert skepticism onto these systems rather than just have to accept them? To that end, the other optimistic thing I feel is that even during the Trump administration - which obviously had no love for me and my work - I managed to get a lot done with individuals. When I say “a lot done” I mean proofs of concept at the company, in the context of insurance or whatever, saying “hey, look at this analysis of car insurance and how it's charging black drivers way more than white drivers”, “here's an analysis and here's how we fix this, and here's how we abide by the anti-discrimination law”. It's possible to do it.
It's possible to take an ethical sentence in plain English and translate it into a rule in code, and to keep track of it and just say “this algorithm is following this rule that we care about” or “this algorithm isn't following it, it is failing by this much”. There is a way to address it in any particular context and once you have enough examples of that, it'll ideally become known that you can actually make sure that algorithms aren't just destructive, that they're working for people, not just for the people that own them. I don’t know how likely that is, either; I think we'll end up somewhere in the middle.
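To make the idea of turning a plain-English ethical sentence into a trackable rule concrete, here is a minimal Python sketch added editorially. The sentence it encodes ("no group's average premium should exceed the reference group's average by more than 5%") is a hypothetical rule chosen for illustration, and `check_premium_gap`, the 1.05 threshold and the toy premiums are likewise assumptions; none of this is ORCAA's actual method or a legal standard.

```python
from statistics import mean


def check_premium_gap(premiums_by_group, reference, max_ratio=1.05):
    """Check a hypothetical fairness rule on average premiums.

    Rule (an assumption for this sketch): a group's average premium may
    be at most `max_ratio` times the reference group's average.
    Returns, per group, the observed ratio, whether the rule passes, and
    by how much the ratio exceeds the allowed limit.
    """
    reference_avg = mean(premiums_by_group[reference])
    report = {}
    for group, premiums in premiums_by_group.items():
        ratio = mean(premiums) / reference_avg
        report[group] = {
            "ratio": ratio,
            "passes": ratio <= max_ratio,
            "excess": max(0.0, ratio - max_ratio),  # "failing by this much"
        }
    return report


# Toy numbers, invented for the example
premiums = {
    "group_a": [100.0, 100.0],
    "group_b": [120.0, 130.0],
}
report = check_premium_gap(premiums, reference="group_a")
```

With these toy numbers, `group_b` pays on average 1.25 times what `group_a` pays, so the rule fails and the excess over the limit is about 0.20. A real audit would presumably also have to account for which rating factors are legally permitted in the specific context, which this sketch does not attempt.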

Do you know about any regulations in the world that are being applied for a more ethical, more just use of algorithms?

The most forward thinking thing I've heard about is the European Union’s AI Act⁸. It hasn't been turned into effective policy yet. It’s just principle based right now, it has to be translated into what it actually means, but it is doing a bunch of things correctly. For example, it sorts algorithms by how important they are, how high impact they are, and how potentially harmful they are in particular use cases. So it cares about context, it cares about outcomes on humans, and it has much higher scrutiny for high impact, potentially harmful algorithms. So that's the best thing I've seen. Of course I talk to policy makers all the time in the United States and some of the stuff that's coming down is good. The question is: is it going to be passed? And then the second question is: is it going to be enforced? But it could be. It’s certainly a lot more involved, a lot more developed. I will say this: when I wrote that book in 2014, I couldn't find anyone to talk to about it. Nobody. And now there's conferences and I talk to people all the time whose job it is to think about policy and AI. So it's pretty high on the agenda and that's exciting. But we haven't actually seen good stuff come out yet.

You mentioned that this is part of the agenda being discussed among policy makers. What do you believe should be the role of scientists in this debate?

It's really hard. I don't know. I'm not an academic anymore. But if I wanted to say something kind of glib, I would say the most important thing is to stop acting like STEM [Science, Technology, Engineering, Mathematics] is the only thing that matters and start emphasizing liberal arts and poetry. That's one of our problems, that we elevate science and math and technology as if it's more important than love. It’s just ridiculous.

Notes

1 Occupy Wall Street was a left-wing movement against economic inequality, corporate greed, big finance, and the influence of money in politics that began in 2011 in Zuccotti Park, located in New York City's Financial District, and expanded to other major cities in the United States and worldwide.

2 Cambridge Analytica was a British political consulting firm that used data collected from millions of users, without their consent, through a Facebook app, to provide analytical assistance to the presidential campaigns of Ted Cruz and Donald Trump in the United States, Mauricio Macri in Argentina, among other political figures.

3 Risk algorithms make predictions about a person's criminal risk from the data and statistics fed into the system. The methodology used by the software to predict risk is a trade secret, and the machine is only required to report its estimates of recidivism to the judge.

4 O'Neil Risk Consulting & Algorithmic Auditing (ORCAA), a company founded in 2016 by Cathy O'Neil, which provides auditing, consulting and training on algorithmic systems and artificial intelligence. See: https://orcaarisk.com/

5 Bariatric surgery is the set of surgical procedures used to treat obesity, aiming at reducing body weight.

6 OMV! is a commercial product of the U.S. company Vagisil, launched in 2021, heavily questioned by gynecologists and women’s health experts. See https://www.nytimes.com/2021/02/18/well/vagisil-omv-teens.html

7 Reddit is a social bookmarking and news aggregator website where users can add text, images, videos or links. Users can upvote or downvote content, which determines whether it appears among the featured posts.

8 See: https://artificialintelligenceact.eu