The Institute for Digital Public Infrastructure

@ UMass Amherst

Reimagining the Internet

Julia Angwin, The Markup

February 17, 2021

Julia Angwin, co-founder and Editor-in-Chief of The Markup, joins us to talk about her innovative method for investigating Facebook and holding it accountable: paying Facebook users to show her team what they’re seeing. This is a thrilling interview about what the future of data journalism looks like, and just how weird it is that investigative journalists are doing the work that regulators would do in any other industry.

Transcript

Ethan Zuckerman:

Welcome everybody back to Reimagining the Internet. I'm your host, Ethan Zuckerman. We are here today with Julia Angwin, one of my favorite people in the world. An incredible investigative journalist. She's worked with the Wall Street Journal. She worked for quite some time with ProPublica. She is co-founder of a wonderful project called The Markup. We're going to talk about how investigative journalism can help us understand what's going on with social media and what might be wrong with it. Julia, thanks so much for joining us.

Julia Angwin:

Great to be here. Thanks for having me.

Ethan Zuckerman:

What's The Markup? And why is it so unusual?

Julia Angwin:

The Markup is a nonprofit newsroom that has been up and running, we've been publishing for almost a year now, and we cover the impact of technology on society. The idea is to look at how technology is unfortunately used to cement inequities, and to subject some people to algorithmic systems and not others. And we also look at social media and how it amplifies certain voices and not others. And we do all of this using technology. Half of the newsroom is programmers and half are more traditional journalists. And so we partner the traditional journalism with the data-driven journalism, and we feel like that can superpower our investigations.

Ethan Zuckerman:

Give me a sense for how that's different from some of the work you were doing with ProPublica. While you were over with ProPublica, you did some really groundbreaking work, often very technical analysis, including some really terrific work on systems being used in the criminal justice system in Florida that were systematically rating people of color as having a much higher risk of jumping bail than white people. You've been at this for a while. What's unusual about doing it at The Markup rather than doing it through some of the other institutions you've worked with?

Julia Angwin:

Well, I've been trying to do this for a decade now, but the challenge, both at the Wall Street Journal and at ProPublica, which by the way are both great, well-resourced newsrooms, is that there are truly not enough data reporters to go around. In a big newsroom like that, there might be 10 data reporters for hundreds of other reporters to use as their partners. And so it becomes a little bit of a DMV situation: you have to take a number and wait in line. And I was always sort of in trouble a little bit for overusing the data resources. And so I would often have to go hire contract programmers to work with me, because I believe strongly that we as journalists are outgunned and outspent, and we have to use every possible tool in our arsenal.

And of course those tools tend to be human sources: interviewing real people, documents, public records. But automation and computation can be one of those tools, and I really believe that we need to use all of those things in order to get the truth. The computation and automation piece can be very scarce in these newsrooms, and so I wanted to build a newsroom where it was baked into the fabric from the beginning. One thing about The Markup is that the data reporters don't report to a separate data editor. They are part of the investigative team. They come up with investigative story ideas. They're true partners, and it's not a service desk that's overstretched and under-resourced, like I've experienced in other places.

Ethan Zuckerman:

So the data reporters are at the center of the equation, and a traditional reporter is paired up with someone who has programming skills, who has data-munging skills. What are the sort of job descriptions for those two partners within that relationship?

Julia Angwin:

We have data journalists who have an array of data skills, because actually there's a whole bunch of things that fall into the bucket of "data." We have everything from Leon Yin, who was a data scientist before coming to us and had spent no time in a newsroom, to Emmanuel Martinez, who is a long-time data journalist and was already a Pulitzer finalist for his work at Reveal, and they have very different skills. Some people are really good at scraping. Some people are really good at data analysis and statistics. Some are really good at traditional public records requests and compiling those into data sets. Our data journalists have a lot of different skills.

Then on the human reporting side, which I don't know a better way to describe it, there are the traditional skills: going out, finding people, getting them to talk to you on the phone. Those are actually super hard. And it's worth pointing out that the price point for the data reporters is higher because the market values them, but the truth is that the work by the human reporters is just as hard and sometimes harder. Particularly in a pandemic, when you can't go out and knock on people's doors, or at least you can, but you won't get as nice a reception. Not that we ever got such a great reception as journalists. But we partner those two skills together.

And I think one thing that's really important is that we believe that you can't analyze data in a vacuum. You can't just walk in and say, "I want a dataset about X," and know nothing about X. A whole bunch of work happens before we collect data where we develop what we call domain expertise. The story you were mentioning about criminal risk scores that I did at ProPublica, we spent six months understanding and reporting about risk scores. What were the challenges? What were the issues before we went and collected data? Because you have to know what question you're trying to ask and what data you need to answer it before you even start data collection.

Ethan Zuckerman:

Give me a sense for the sort of stories that The Markup is telling with this method: combining terrific human reporters, who are domain experts or who can become domain experts, with data reporters who really have the freedom to be part of the investigative team and go out and get these things. What is the beat for The Markup? Is it any data story? Or what's the specific focus?

Julia Angwin:

I would love it if we could do any data story, but we're very small. And so we limit ourselves to what we call the impact of technology on society. Now, of course, in today's world, where our entire life is mediated by technology and we literally have no relationships that don't involve technology, that is a pretty big beat. We do have a lot of latitude to operate, but an example of a story we did that I think is indicative of how we work is the one we wrote about how Google was preferencing itself in search results.

Anyone who's ever searched for anything on Google has noticed that at the top of the search results there are all sorts of little info boxes and video carousels and all sorts of stuff that is, I think, meant to be helpful, but it also means that the links out to the outside world are pushed way down the page. And many of the links inside those info boxes are actually just to Google itself. And so we wanted to measure this. Our reporter, who had been covering this topic for a long time, even before she came to The Markup, had heard from people who were like, my business was destroyed because Google pushed me so far down the page.

Like all good stories, it started with a tip, and then we're like, okay, well, let's see if we can actually measure it, because we love to measure things at The Markup. And so we're like, can we measure how much of the page they're taking up? It took a long time, and Leon had to develop a way of actually analyzing the page. He borrowed from biology the idea of staining a cell to see which parts of it are different from others. We scraped all these search result pages and then stained them to identify which parts were Google products and which parts were links to the external web. And we were able to say at the end of the day that Google was preferencing itself quite a bit: 40% of the first page was their own products, and actually 60% of the first screen, if you were thinking about it from a mobile point of view.
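The "staining" idea can be sketched at a very small scale: given a saved search results page, classify each link by whether it stays on a Google-owned property or leads out to the open web, then compute the Google-owned share. This is an illustrative reduction using only the Python standard library; The Markup's actual study measured rendered area in an automated browser, and the domain list and function names here are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, incomplete list of Google-owned hosts for illustration.
GOOGLE_DOMAINS = {"google.com", "www.google.com", "youtube.com", "www.youtube.com"}

class LinkStainer(HTMLParser):
    """Walks an HTML page and 'stains' each anchor as Google-owned or external."""

    def __init__(self):
        super().__init__()
        self.google_links = 0
        self.external_links = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        host = urlparse(href).netloc.lower()
        # Relative links (empty host) stay on the Google results page itself.
        if not host or host in GOOGLE_DOMAINS:
            self.google_links += 1
        else:
            self.external_links += 1

def google_share(page_html):
    """Fraction of links on the page that point back to Google properties."""
    stainer = LinkStainer()
    stainer.feed(page_html)
    total = stainer.google_links + stainer.external_links
    return stainer.google_links / total if total else 0.0
```

Counting links is a much cruder proxy than measuring the pixel area each element occupies on screen, which is what made the browser-based approach in the real investigation so much more work.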

And we were really proud that that finding was used and referenced in the House's landmark antitrust report, and that many of the Department of Justice and other antitrust cases against Google have cited it, because you can know something, but not really know it until the data is there to support what you thought you knew.

Ethan Zuckerman:

And of course my next question was going to be about impact, but you've already anticipated it. I know this is something ProPublica is incredibly thoughtful about: maintaining almost a portfolio of impacts that a story can have. It sounds like you're doing some of the same thing in analyzing technical systems, particularly the sort of technical systems that we interact with every day, whether it's search engines or social media. A lot of our conversation on this podcast is about diagnosing and then envisioning fixes for social media as it currently exists. What have you been able to figure out applying these techniques to understand Facebook, YouTube, and the other social networks out there?

Julia Angwin:

I think that diagnosis is the right word. We see ourselves as diagnosing these problems so that they can be solved. We don't have the perfect policy solution. We don't propose policy solutions, but I think when you get really close to a topic and really precise about the diagnosis, then the solution can more naturally reveal itself. We wrote a story recently about Facebook and how they had said they were stopping recommending political groups. And then our data showed that they didn't stop. And we have a hypothesis we haven't been able to prove, but it looks in the data like they just stopped recommending groups that self-identified with a political tag. And that's a really easy thing to fix. There's plenty of machine learning they could apply to turn off recommendations for any group that is political, and not just rely on that user-identified tag.

And in fact, since our story ran, they have turned off political group recommendations. And I think maybe they applied that. I will never know. I'm not inside Facebook, so I don't know how that happened, but we see that as impact too. It's not just legislation. Unfortunately, the truth is that the federal government doesn't even necessarily have as much power as these companies, and so much of the impact comes from the companies themselves choosing to do something.

Ethan Zuckerman:

Let's look at Facebook specifically, not because it's the center of the universe (I think you and I would probably both agree that there are some platforms, like YouTube, that tend not to be as carefully looked at), but because Facebook has some unique problems when you go to study it. The truth is, what I am seeing on Facebook is not public knowledge. It's been put together for me as part of my Facebook feed. You don't have an easy way of seeing what's on my Facebook feed. How is The Markup dealing with this in studying something like the question of how much political content is being posted across Facebook?

Julia Angwin:

Yeah. It's a really hard question and one that I've thought about a lot. And I ultimately came to the realization that we were going to have to do something pretty crazy in order to answer it. That crazy thing is a project called Citizen Browser. Basically, we have built a tool, kind of a custom web browser, that we distribute to panelists who we pay to install it on their computers. They log into Facebook through this app and then they never touch it again. But it allows us to, once or twice a day, take a scan of what's at the top of their newsfeed. We then put it through this crazy pipeline where we redact every bit of personally identifiable information about them and their friends: any comments and messages, anything personal, photos. And what we're left with at the end is a data set that we hope really shows what Facebook is choosing to push to the top of the feed.
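The redaction pipeline she describes, stripping personally identifiable information before a feed snapshot joins the shared data set, might look in miniature like this. The field names, regex patterns, and redaction rules below are invented for illustration; The Markup's actual pipeline is far more thorough than a pair of regexes and a name list.

```python
import re

# Crude illustrative patterns; real PII detection needs much more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

# Hypothetical fields that are dropped wholesale rather than scrubbed.
DROP_FIELDS = {"photos", "comments", "messages", "friends"}

def redact_post(post, known_names):
    """Return a copy of a scraped feed item with personal fields and PII removed.

    post: dict representing one feed item
    known_names: names of the panelist and their friends, to be scrubbed
    """
    clean = {}
    for key, value in post.items():
        if key in DROP_FIELDS:
            continue  # personal fields never make it into the data set
        if isinstance(value, str):
            value = EMAIL.sub("[REDACTED]", value)
            value = PHONE.sub("[REDACTED]", value)
            for name in known_names:
                value = value.replace(name, "[REDACTED]")
        clean[key] = value
    return clean
```

The design trade-off she mentions later in the interview falls directly out of this step: once friend and network information is stripped, you can no longer attribute differences between feeds to homophily versus the ranking algorithm.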

This is the choice that Facebook has made: they're going to put X and Y at the top and they're not going to put other things as high. And the only way to see that, because everyone's feed is individual, is to build this panel. And so we've taken this panel approach, which has been extremely expensive. We've had eight developers on it for 10 months; it's an enormous effort. It costs half a million dollars a year to run. I'm currently raising money for that, so if anyone's listening, please send us a big check. There may be other ways, but it's one of the best ways I can think of to try to monitor this black box.

Ethan Zuckerman:

One of the analogies I've thought of for Citizen Browser is this notion of people creating their own network of air sensors. There are projects out there in places where we don't have very good readings of air quality: people buy a sensor, hang it in their backyard, and collectively we get a much finer-grained set of detail. It's a little different here in the sense that you're actually recruiting people, you're paying people. How are you recruiting people into this panel? Can anyone join? Or are you trying to get a specific subset of people?

Julia Angwin:

No, we don't have everyone joining. And the reason for that is that I had done something similar when I was at ProPublica, called the Facebook Political Ad Collector, which we just offered to readers and asked them to donate their data. It looked at their feed, extracted ads, and sent them to us so we could see what political ads were being shown. But the ProPublica readership was very liberal, and so we basically only got liberal ads. It was a useful, but really limited, window into Facebook. And so when I wanted to take another stab at this question, I realized we were going to have to pay people so we could get a representative sample. We don't have, by the way, as many Trump voters as we do Biden voters. And that's how we're making the distinction on political affiliation.

You could do it lots of ways, but that seemed like the most obvious way to do it this year. And it's something that a lot of pollsters have struggled with: there's a lot of skepticism about institutions and the media among Trump voters, and so it is hard to get them to participate. We still have a skewed panel, more Biden, but we have a healthy representation of both sides. And so it allows us to at least have a window into what this filter bubble looks like. How do things look on one side versus the other?
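When a panel skews toward one group, as she describes here, one standard pollster remedy is post-stratification weighting: each respondent is weighted by the ratio of their group's share of the real population to its share of the panel. The transcript does not say whether The Markup reweights its panel this way; this is a generic sketch of the technique, with made-up numbers.

```python
def post_stratification_weights(panel_counts, population_shares):
    """Compute per-group weights so the weighted panel matches population shares.

    panel_counts: {group: number of panelists in that group}
    population_shares: {group: fraction of the real population}
    """
    total = sum(panel_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in panel_counts.items()
    }

# Hypothetical skewed panel of 2,000: 1,200 Biden voters, 800 Trump voters,
# weighted against a rough 52/48 two-party split of the 2020 vote.
weights = post_stratification_weights(
    {"biden": 1200, "trump": 800},
    {"biden": 0.52, "trump": 0.48},
)
```

With these made-up numbers, underrepresented Trump-voter panelists get a weight above 1 and overrepresented Biden-voter panelists a weight below 1, so each side counts in proportion to its real-world share.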

Ethan Zuckerman:

And what are you learning so far? You've got a unique perspective on this. Like I mentioned before, there really is no easy way to look at a Trump supporter's Facebook feed, because that is private information. You're now getting indications of something that, frankly, the rest of us really haven't had a good chance to see. What are you finding, in general terms, around that?

Julia Angwin:

Well, we're finding that, as Eli Pariser predicted, the feeds do look really different. The world looks really different when you look through the lens of a conservative versus a liberal user of Facebook. And we've been looking at all the different distinctions; that one seems the most stark. There are differences between ages, millennials versus boomers. There are small differences between women and men. There are differences between urban and rural. But the biggest difference we've seen is on this political spectrum, and the news that they see is entirely different.

Ethan Zuckerman:

How much of this do you think is Facebook's algorithms trying to give people what they're going to want? How much of this is homophily? People on the right have a lot more friends who are on the right; people in rural areas probably have more info coming from rural areas. Do you have a sense yet for how much of this just comes from who people's friends are, versus this potentially being aggravated by Facebook, a la the filter bubble hypothesis?

Julia Angwin:

Yeah, it's really hard to say, because we deliberately strip out personally identifiable information about people and their friends. And so we aren't doing an analysis of what their network is and how it contributes, because of the privacy concerns. It means that we are a little bit blind to that issue. And we want to be really clear about our limitations there: we made this choice deliberately, deciding that protecting the privacy of the panelists was more important than answering that question. But even our limited answer has value, meaning all we know is the outcome. We know the outcome: the feeds look really different. How much of it is Facebook's fault? How much is the user? That we can't distinguish, but I think it allows the conversation to begin, and Facebook can then step in and respond by saying, okay, that part is not us. And then at least the conversation moves a little bit forward. And that's what I want to do: push the ball forward.

Ethan Zuckerman:

Yeah. I think that notion of these studies potentially forcing a response from the platform is pretty interesting. Can you talk about how that's worked in the past with Facebook? You've crossed swords with Facebook previously around the subject of ad targeting. How have they responded to that research? Sort of where has that led you in the past?

Julia Angwin:

Yeah, so a couple of years back, oh my gosh, I think it was 2016, I wrote about how Facebook had this dropdown menu for advertisers. You go in, you want to buy an ad, and it's like, do you want to target it to different ethnicities? And not only could you target these ethnicities, you could block these ethnicities from seeing the ad. I thought, huh, could you actually do housing discrimination this way? Because it's against the law to advertise your housing to just one race, or to deny people of some race from seeing the ad. And we bought an ad that was discriminatory. It said, "Don't show this housing ad to Black people." And it went through and was approved. There was a big outcry. Facebook said they were going to fix it. They said they would build an algorithm to identify these types of ads, and they built the algorithm. And then I went in, tested it, bought the same ad, and it still got approved.

And I said to them at the time of the algorithm, "You could build an algorithm to do this. You could also just get rid of the dropdown menu that says block Black people from seeing the ad." I was like, I don't know, I'm not a super expert person, but that seems like an option. They're like, "No, no, no, we have to build this massive algorithm." Then what happened is that, not I, but the fair housing groups continued to buy ads after that and were able to get them through. HUD sued them. Finally, it wasn't until last summer that they were like, you know what? I think we're going to get rid of the dropdown menu to block Black people from seeing ads. That took five years. Really?

Ethan Zuckerman:

What you're saying is Facebook is very advertiser friendly.

Julia Angwin:

They also have a lot of faith in their algorithms. Really a lot of faith. And I think it's because those algorithms are not independently audited by anyone, so they're smoking their own joints.

Ethan Zuckerman:

Well, I think this is the really interesting question in all of this. Not only do we not have a national algorithmic auditing body, but as we've established in this conversation, the only way to actually audit these algorithms is either by buying ads on the services, which, by the way, the services periodically complain is not a fair use of them, or by starting to collect data through something like a panel. We're seeing other cases where Facebook has reacted pretty angrily to this. They've sent a cease-and-desist with a demand date to the NYU Ad Observatory project. Have you had any pushback from Facebook yet on the panel study?

Julia Angwin:

All of my fingers and toes are crossed while I'm answering this question, but no, we haven't heard anything from them. I wouldn't be surprised if we did. This panel is all people who have chosen to volunteer their data to us. But I think that was true for the NYU Ad Observatory too, and the argument Facebook made there is that it was not valid, that those people don't have the right to share that data. Yeah, they might make that argument to us. And I think that's an argument we should all have: do we have any ownership over our own social media data? I would like to see the courts weigh in, because I feel like at the very least we should have partial ownership of it. I don't know, I'm not a property lawyer, but it seems like it to me.

Ethan Zuckerman:

There are all sorts of layers to this. Essentially, you're using the service; Facebook is giving you information in exchange for the ability to surveil you at all times. You would think that being able to say, "Hey, third party, this is what Facebook is showing me," would be a right that you have. It's not a secret that you have entered into with them. Beyond that is the larger question: if Facebook, and let's be clear, it isn't just Facebook that does this, if an advertising company is enabling race-based housing discrimination, someone needs to be able to audit that. Maybe where we end up on this is having some sort of national audit bodies that are looking at questions like racial discrimination. What are the things you most want to know at this point, Julia? When you're looking at Facebook or YouTube, or really at this broad question of technology's impact on society, what are some of the big questions you're trying to ask?

Julia Angwin:

I think the questions I'm trying to ask are largely around what they choose to amplify. They have this incredible power: they can lift somebody out of obscurity and make them go viral, and they can suppress content. There was a Wall Street Journal story about how Facebook had chosen to suppress Mother Jones. They just decided they thought it was too partisan. Those are the questions I'm concerned about, because I view my job as accountability, and that accountability is for the platform and its choices. There are other questions to be asked, like how humans respond to what Facebook is doing to them. How do you get radicalized? Those are harder questions for me to answer, and I would still love to know the answers, but my lens is: how can I hold the platforms accountable for the choices they make? What content they choose to take down, what content they choose to leave up, what content they raise up in the feed, what content they suppress. Those are the questions I'm always seeking to answer.

Ethan Zuckerman:

It feels like there are at least two recent case studies around that that seem worth looking into. One is the way that Facebook and other platforms have gotten more aggressive, or at least claim to have gotten more aggressive, about blocking certain types of mis- and disinformation. I noticed, just in watching the field, that when you started having quite aggressive anti-vax content, particularly around COVID, like the Plandemic video, we actually saw much more aggressive suppression of content than I'd seen previously. At the same time, our friend Kevin Roose is pounding relentlessly on Facebook about the fact that Dan Bongino seems to show up as the most popular individual in the world, despite being pretty little known outside of far-right circles. You can study the outputs; you can't really study the inputs. Is there a good way, using Citizen Browser, that you think you can figure out which figures are disproportionately popular, or who Facebook's algorithm really seems to love or really seems to hate?

Julia Angwin:

We do see that to some extent. We have a fairly small panel. Facebook has 2.7 billion users and we have about 2,000 panelists. We see that some things are more popular than others, and we would like to believe that our panel is representative enough that that means something. Facebook's data is always going to be better, so they know more definitively, but we are definitely seeing what Kevin Roose sees, which is that there are some really weird things that are disproportionately popular on Facebook. It is also worth noting that what Kevin Roose is looking at is engagement, meaning how many people liked and engaged with Dan Bongino's content. We're looking mostly at how many people saw it, which is a very different metric, and one that Facebook says is actually more accurate. That's their claim. And I will say that our view of it is less extreme than what the engagement metrics show, but it's not not extreme.

Ethan Zuckerman:

Is the dream end state of this that you don't have to spend $500,000 a year to run a panel, that Facebook makes some sort of believable, auditable version of this information accessible? Or, even if Facebook starts publishing a representative sample of users so that people can audit it, is there still a role for an independent body like The Markup to come in and try to hold them accountable?

Julia Angwin:

It's definitely not the dream for me to be doing this. This is extremely expensive. And in any other industry, this would be done by a regulator. If I were an airline reporter, I would be FOIAing the federal aviation records on safety, and then I would hold the FAA accountable when they messed up on their oversight of the Boeing software. But that would be a once-in-a-hundred-years investigation. Mostly I would be like, there are those FAA guys, and they do their inspections, and then I, as a reporter, pull that data and write about the interesting things in it. The fact that I'm doing this just shows that there's no one else on the field doing it. I feel like it needs to be done, but it's not at all the perfect state of affairs.

Ethan Zuckerman:

If takeaway one from this interview is please PayPal Julia half a million dollars a year, takeaway two is: hey, Biden administration, maybe it's time to start.

Julia Angwin:

Maybe someone else should do this.

Ethan Zuckerman:

Right. This seems like something that we really should be able to audit through the FCC and have data sets that are available in one fashion or another so that whether we're getting them through FOIA or God help us, maybe they're actually available.

Julia Angwin:

That's so shocking. Couldn't even imagine it.

Ethan Zuckerman:

For reporters. Indulge me in this sort of fantasyland: a well-funded Markup, maybe blessed with a functioning regulator. What does The Markup want to look at? Not just Facebook; what are other issues within the social media space that would be subject to a Markup-type investigation?

Julia Angwin:

Well, I have to say, the thing that I think we feel most passionately about is the use of technology in industries people don't think of as big tech. The fact that you really can't get hired without some weird hiring algorithm being run on your resume, which automatically disqualifies all sorts of people, as Amazon revealed. They were just rejecting all women; their algorithm was tuned wrong. That kind of thing, the everyday harms that are so pervasive and so understudied, is what we are really passionate about at The Markup and try to spend as much time as possible on. But those are really hard stories to get. Building those data sets is even harder than what we're doing with Citizen Browser, which is saying something, because private companies don't have to share their hiring algorithms.

Do we buy that software and test it ourselves? There are all sorts of crazy things we consider. And so I think we will always have plenty to write about, even if this magical regulator arrives, because the truth is the technology is just a weapon, and people can use it for good or for bad. And the truth is that a lot of people are using it to cover their ass. If they want to make a decision that's not popular, like keeping someone in jail longer or not hiring someone, it's so much easier to say the algorithm did it. So this problem won't go away, even if there's an algorithmic auditing agency. And I think our skills will always be helpful in trying to show how these algorithms are biased. They're always going to be biased, and the question is who is favored in those biases.

Ethan Zuckerman:

I love the fact that your response is, no, it's not necessarily social media questions. Those are the charismatic megafauna these days. But I think in many ways one of the most important books that's come out in the last couple of years is Virginia Eubanks's Automating Inequality, which looks at technology systems that are not super visible or sexy. It's not the YouTube recommendation algorithm; it's algorithms about who gets housing in Los Angeles and what's built into that. I also think the example you just gave with Amazon is this great reminder that it's possible to be dumb as well as evil, or even dumb instead of evil. As I understand the Amazon situation, they trained a machine learning algorithm on the people they had hired, and it turned out that they had rejected lots of women, so the characteristics of women's resumes, like playing on the women's volleyball team or studying women's studies, all became negative indicators of what it would mean to be a good Amazon employee. The machine learning may actually have done the right thing: they may have created such a hostile workplace that it was accurately predicting who would succeed there.

Julia Angwin:

It was working perfectly.

Ethan Zuckerman:

Yeah, yeah, it did exactly what they wanted. It found the best indicators of who would be a good fit for their corporate culture. Give me just a word or two of hope. You do the work that you do; it's successful, it's influential, people read it, people pay attention to it. Where are we in 10 years? What's changed from our current situation?

Julia Angwin:

Oh, there are a lot of ways it could go that aren't hopeful, but I'll focus on a hopeful scenario. I think there is a possibility, and this will seem Pollyannaish, that the fact that we have chosen to encode our biases, the way Amazon encoded theirs in that machine learning system, makes them more inspectable and more fixable. Amazon was probably always a really sexist place, and maybe no one had any understanding of how that came to be, and no one could tune the dial. Now that it's a dial, you can actually turn it and say, okay, let's be less sexist. Now, is the world going to make those choices? Is Amazon going to make those choices? I don't know. But I have a small bit of hope in me that we could redress so many of the inequities in our society, because by putting them into technology, we have given ourselves levers, and we can move those levers.

Ethan Zuckerman:

I think that's a wonderful way to think about this. A lot of us are scared of quantification in general: if we somehow take irreducible human qualities and bring them into data, are we somehow less human? One of the positives is that when we actually throw data at things, we can often see just how poorly we're doing, even in simple things like what percentage of people hired by companies are women, are people of color, and so on and so forth. It gives us at least something we can live up towards and hold people responsible for. The website is The Markup. You should be reading it. It's really remarkable. If you happen to be a funder who's looking for something to do, the Citizen Browser project in particular is a data set that really is giving us unprecedented access to and understanding of what's going on on Facebook. Julia Angwin, it is always such a pleasure. Thank you so much for being with us.

Julia Angwin:

It's so great to talk to you as always. Thank you.