Did Facebook influence how people voted in the 2020 elections? This month, we’re focusing on a recent spate of studies published in Science and Nature studying how Facebook’s algorithms handle political content. First up is Laura Edelson, who was banned by Facebook for her work studying its ads through her project at NYU, the Ad Observatory.
Laura Edelson is an assistant professor at Northeastern University and former leader of the NYU Ad Observatory. She recently concluded a year-long stint as Chief Technologist in the US Department of Justice’s antitrust division.
This episode mentions her and Ethan’s article for Scientific American “It’s Time to Open the Black Box of Social Media” and her piece for Tech Policy Press “After the Meta 2020 US Elections Research Partnership, What’s Next for Social Media Research?”
Hey everybody, welcome back to Reimagining the Internet. My name’s Ethan Zuckerman. I’m here with a dear friend, someone whose work I admire immensely. Laura Edelson has just concluded a stint as chief technologist at the Department of Justice’s Antitrust Division.
Before that, she was a computer scientist, doctoral student, and postdoc, co-leading the Cybersecurity for Democracy project at NYU. She led a project there on Facebook ads that gained a lot of attention when Facebook closed her account in an attempt to deter the research.
She’s written for the New York Times, she’s written for Scientific American, she’s written for all sorts of wonderful people on a wide variety of issues, particularly around questions of research. What can we and can’t we know? What can we and can’t we study?
She is starting a new career as an assistant professor at Northeastern University. Holy cow, they are lucky to have her. Laura Edelson, thank you so much for being with us.
It is a pleasure, Ethan. I’m glad we get to chat.
Well, there’s a ton of things that I want to talk with you about. And as I was talking to you before we started recording, eventually I want to talk a little bit about some of the recent research that’s come out about Facebook with Facebook’s permission—and this question of permissioned and unpermissioned research.
But I feel like I wouldn’t be doing my duty if I didn’t first thank you for your service in the last year as chief technologist for the DOJ’s antitrust division. But let me confess, I have no idea what that means.
What did you do as chief technologist of the DOJ’s antitrust division and what does that mean and what should we be watching and paying attention to?
That is a big question and I’ll do my best to answer it. So I think not a lot of people necessarily know exactly what the Department of Justice does on a day-to-day basis and don’t worry, you don’t need to. But I imagine that many of your listeners will have heard that the Department of Justice, that the government, is suing Google and is pursuing other cases around monopolization or other kinds of anti-competitive effects in the tech sector.
And, you know, we’ve had laws on the books that prohibit companies from abusing monopoly position. We’ve had those laws on the books for a long time. And then the question is, how do we enforce those laws? Again, probably many of your listeners might remember the Microsoft antitrust case from a couple of decades ago, but you probably haven’t heard anything about monopoly suits for a while since then.
We’ve seen folks like Senator Warren sort of arguing for the need to regulate Amazon as a monopoly, but you were mentioning things like Google. Doesn’t Google like to make the argument that it’s not a monopoly because there are 20 other websites that you could conceivably spend time on?
Actually, I don’t think even Google argues that they don’t effectively have a monopoly on search. They just argue that they don’t abuse it. And this is where it becomes a matter for a judge.
So to make a very long story short, after a couple of decades of pretty sleepy antitrust enforcement, the government is now seriously pursuing at least some tech monopolization cases and investigations. But because they haven’t engaged in too much of this work, they didn’t really have any data scientists. They didn’t have any engineers. They didn’t have anyone who could really help them conduct these investigations when you’re talking about enormous digital ad markets or enormous volumes of search data. How can you even answer these questions about whether a company has a monopoly or whether they’re abusing it, if you don’t have people who can manage that volume of data and go do an investigation?
And it’s not just a matter of the people who might do this investigation. You need all sorts of things in order to run those investigations, which they don’t have, or they didn’t until very recently. So a lot of what my role was as the first chief technologist of the antitrust division was just to sort of take a look at the kinds of investigations they need to be able to run and try to figure out, well, what do you need in order to do that? And how do we make a plan for going from where we are now to an organization that can effectively monitor the American economy and make sure that laws are being followed in the competitive and antitrust space?
I’m remembering more than 20 years ago when the Department of Justice went after Microsoft around integrating the Internet Explorer browser deep into the Windows 95 operating system. And one of the ways that the courts handled it was by appointing Larry Lessig as a special master to try to deal with some of the technical issues in all of this, who promptly hired Jonathan Zittrain as the world’s most overqualified law clerk. And as Zittrain is fond of saying, boy, wasn’t that a fun week? They didn’t last very long. Microsoft objected to Lessig as an objective master there.
But it sounds in many ways like this is sort of a continuation. We are now getting into issues that would require data science, that would require assembling huge sets of data. You can’t talk about active cases that are ongoing. Can you talk a little bit about what sort of competencies the DOJ is going to need to build to be able to bring these cases going forward?
Yeah, I can absolutely talk about that because I think it’s probably pretty unsurprising that if we have a credible allegation of someone, a company abusing their monopoly, we need to be able to investigate that kind of claim. So you can imagine all sorts of spaces where there are at least credible questions about how competitive that space is. Ads, digital ads are certainly one of them, but I’m sure lots of people can think of others, things like social media, things like app stores. These are all areas where there’s really only a couple of active players. So if we were to imagine what those kinds of investigations might look like, obviously the skills in order to just move around and manage a really large volume of data, that’s typically called data engineering. That’s a really absolutely vital skill just to be able to, again, move around a petabyte of data. That’s something that really takes thinking about.
And then the ability to sit with an economist or to sit with a lawyer and listen to the investigative questions that they might have and figure out, OK, what kind of data do I need to go request in a subpoena in order to answer this set of questions? That’s really a job for a data scientist. And then to take that all the way through to an investigation, you’re again talking about a set of data science skills where a data scientist might work directly over a period of weeks or months with an investigator to iterate through a series of questions and use the data to try to tease out those answers.
A really common question that comes up in all sorts of ways is just how big is this market? And what kind of market share does company X have? Really common question comes up in all kinds of investigations. And that can often be just a matter of like, well, how do we define this market? Do we define it by who has the most customers? Do we define it by who has the most transactions, the highest dollar value? And all of these different definitions of the problem, you would need to manipulate the data in different ways in order to get to that. And you mentioned something else that is actually really important that people forget. When you’re going into any of these tech cases for antitrust, you don’t just need to prove what you’re alleging about competitive harms, about maybe some kind of illegal action to secure a monopoly.
You also need to be able to articulate a remedy. You need to be able to say, OK, Microsoft has, they have illegally tied their browser into the operating system. You also need to be able to demonstrate that if the remedy you’re asking for is, well, make them severable, that it’s possible to do that. And doing that would actually achieve the outcome you want. So there are these other sets of skills around things like understanding how software works, understanding how to audit code or audit an algorithm that would help, again, investigators and a court understand that a remedy you’re seeking would actually fix the problem that you’re alleging.
What interests me in some ways is that we’ve seen the remarkable Lina Khan start getting a lot of attention for some of this work from the perspective of the Federal Trade Commission. I think at the same time, discovering that it can be actually very challenging to win these cases in one fashion or another. You’re coming at this really from the perspective of saying even before we win cases in DOJ, we need the data to sort of put these things together.
You come by your data skills very honestly in putting together a remarkable project at NYU, the NYU Ad Observatory. Can we talk a little bit about how you started working on that project, how it came about, and then I am sure we will get to the abrupt end of that project.
Yeah, let’s talk about that. I should probably mention by way of background that I had been a software engineer for many years. I had a career in the tech industry working primarily as a back-end engineer and as a data engineer, managing really large volumes of data and figuring out how to do things with it. So I did have a background of doing this primarily in the financial sector.
So I came into a PhD with a background in big data. And when I got there in the fall of 2016, I think like a lot of people, I looked at the, everything we learned about interference in the 2016 election, and the way that Facebook ads in particular, but also just social media, seemed to be really taking a dark turn.
And the thing that became very clear to me was that this was going to be an existential problem for our democracy, that this was like the existential safety threat of the modern internet. And that we just needed to work on it.
We needed to work on it in the way that after cars started to become ubiquitous, a lot of people were dying on the roads. Figuring out how to make cars safer was an existential problem for people who wanted to drive around in cars.
And that there’s an enormous distinction between banning cars and making cars safer.
Yeah. I think like a lot of people, I love the internet, you know, I love social media. I think it does a lot of good, but I also think it has a safety problem. And what that means isn’t that, you know, we throw this thing away. It’s that we figure out how to make it safer. And so that was a lot of the goal of Ad Observatory because the first step to making a system safer is really understanding why and how it’s going wrong. So the entire premise of the Ad Observatory was, okay, we know that there is this problem in political ads on Facebook, and what we are going to do is really try to embrace functional transparency. We had to do a tremendous amount of data collection for our own research.
And if we were going to do all of this data collection, all of this tagging and labeling and creating metadata like topics, like understanding what ads’ purpose was, building all of these machine learning models in order to do all these classifications, well then the least we could do is also surface that data to other people like journalists and to the public. And so that’s really what we were trying to do.
And the research questions behind this were really questions about how political advertising works on Facebook, how advertising more broadly works on Facebook. What’s sort of the universe of questions that you found yourself pursuing through the Ad Observatory project? There were some resources from Facebook to allow scholars to study ads. Were they not up to snuff?
I really want to give Facebook some credit here because while I ultimately decided that the data that we were getting from Facebook wasn’t enough and that we needed to supplement it in other ways, Facebook genuinely deserves credit for giving the public and researchers a lot more data than other platforms do. They make a lot more data available than YouTube ever has. They make more data available than TikTok. Facebook is really one of the better platforms when it comes to transparency. That said, very clearly there was and remains a problem in Facebook political ads. And the data that they were making available was just not sufficient. In particular, as of, you know, 2018, 2020, 2021, they weren’t making data about the way ads were being targeted transparent.
So you could see some ads, you could see the ads that had been declared to Facebook as political, but you couldn’t see how advertisers were targeting those ads. And that’s actually a really crucial piece of the puzzle if you’re trying to identify all kinds of harmful ads.
So you were asking me, oh, did you just focus on political ads? Our focus at the Ad Observatory is actually on all kinds of harmful content, which we broadly define as mis- and disinformation, but also fraud and the garden variety scams. And the reason that we focus on both of those things is they’re actually pretty similar when you actually think about them separate from the content. But the way they’re targeted is very, very similar. The way they spread through networks is very similar.
We’re coming at this problem of harmful content and social media from the perspective of computer scientists. I think about it like a network problem. So the politics part is just where a lot of misinformation turns up, but it’s not exclusively a political problem. It comes up a lot in health information too, but it comes up in all sorts of weird places where someone can find a financial motive usually.
It’s so interesting because I feel like in some ways, you know, going back to maybe 2016, political misinformation felt like a taboo that had to get very consciously crossed, right? America has a long and proud tradition of fraud. I mean, just people out and out, trying to sell you the Brooklyn Bridge. To be perfectly honest, like much of that fraud comes around patent medicine. So the idea of health misinformation, literally we have the expression of snake oil salesman because people have been trying to sell us since the dawn of contemporary capitalism, cure-alls.
But there was, up until the 2016 election cycle, at least a little bit of shame about mis- and disinformation around politics. I suspect we’ve lost that now, and I suspect we’re heading into 2024, where all the gloves are off. But, okay, so you were looking very broadly at ads, Facebook was making some information available, but really just on political ads and not necessarily on the targeting of ads. You ended up doing something that I ended up writing about as data donation. Did you end up using a similar term for what you were doing?
Yeah, I think we usually call it either data donation or crowdsourcing. I really like the data donation term because just on a philosophical level, I believe that people own their own data. And if they own their own data, then they should have a right to donate it. And we should think about it that way.
And so mechanically, the way this works is an informed user opts into putting a plugin into their web browser that says, hey, this specific piece of information, i.e. what ads are being delivered to me on Facebook, I’m going to donate that data to Laura and Damon and their team at NYU. And I’m going to share some information about who I am so that we might be able to backfill and figure out how the targeting ends up working. I know the privacy risks associated with this. I’ve chosen to install the software. I’m going to hand this data over to a team of accomplished and accredited researchers who have passed this through their university’s IRB. What possible problem could Facebook have with this?
Oh boy, you know, I’m honestly still not entirely clear on that. In short, yes. So we built a browser extension that allowed people to opt into sharing ads that they were served on Facebook with us. And Facebook objected pretty heartily. They said that this was a privacy violation, although for the life of me, I don’t see how. Just to be very, very clear, it was very easy to verify exactly how our browser extension worked. Completely aside from the fact that you can download the browser extension and see exactly what it’s sending back, all the code was open source, all our packages were signed. The browser extension was reviewed by Mozilla for any kind of privacy violations. Obviously, it was reviewed by our institutional review board as well.
So the fact that we were being extremely careful with our users’ privacy, that we were only taking ads and nothing else and that everyone had opted in, you know, this was all very enthusiastic consent, was very, very easy to verify. But Facebook still objected.
If I were speculating, which is all I can do, you know, I imagine that they saw this as a competitive threat, because we weren’t just taking the data and keeping it. We were also making this data publicly available to other researchers. We thought that was important because there was so little of this ad targeting data available.
We needed it for our computer science research that again comes at this problem of mis- and disinformation from a computer science network kind of perspective, but there are many other researchers who are interested in the same content but are doing very different science. There are social scientists, political scientists, journalists, and civil society advocates who want to work with this data, who are going to do something important with it. And if I can take in this data, do my science and then also make it available for other people, I have to do that. It’s a moral imperative. It’s also the way science should function, where we don’t just share our results, we share how we got there. And so for all those reasons, we always made that data publicly available. A lot of other people used it and did interesting things with it.
So you can imagine a Facebook executive looking at this and saying, “Oh my God, what if Twitter looks at this data set and figures out that Joe’s Discount Coffee Company is targeting ads to women 35 to 55? And now, you know, Twitter’s going to go offer them cheaper ads.” Whereas in the academic world, the way that you’re doing this, which is to say the review board, the openness of the data, the openness of the code, like, this is how we roll. Like, if you don’t do things that way in academe, people are going to look at you with a good deal of suspicion and real questions about... And I should say, like, computer science, as it turns out, along with physics, is one of the fields in which we are most likely to hand over our data, unlike, say, biology, which actually keeps its cards a lot closer to the chest, you know, patents more compounds, so on and so forth.
Facebook effectively shut down your study. What actions did they actually take? I know that they canceled your personal Facebook account, which seems spiteful in the extreme. What actually was it that sort of forced you to stop doing work on this?
So I should say that we have resumed work. So it’s not that we are completely shut down, but we were shut down for a good while. And the thing you have to remember is that the way that everyone who does research on Facebook, the way they access all of these tools, like the Ad Library API, like CrowdTangle, the way you access them is with your personal Facebook account. So you don’t have, oh, here’s my personal Facebook account and my business Facebook account. You have a Facebook account.
And so when my personal Facebook account and Damon’s personal Facebook account and our lead engineer, his personal Facebook account, when those accounts were terminated, that cut off our data access. And we couldn’t just ask someone else at our lab, oh, hey, can you give us access to data through you? Because the terms of service that you have to agree to when you get access to any of these data pipes is that you won’t share it with other people who don’t have access to it. And we didn’t want to endanger any other researchers’ work.
So yeah, we were shut down for maybe nine months. It was quite unfortunate timing because we were just about to launch an Ad Observatory for the German election that was happening at the time. And they shut us off like a week before we were about to launch, but them’s the breaks.
So let me pivot. A whole set of studies has come out about the 2020 election. It’s now 2023, but as we both know, it takes a long time to get social science done. Some of really the best people in our field are authoring these studies. We’re going to have on in the next couple of weeks Talia Stroud and Brendan Nyhan, who’ve both done studies in this set, to talk about what they found and what they haven’t found in the 2020 election. You wrote a piece for Tech Policy Press which both did an excellent job summarizing some of what we’ve learned about these studies and also raising some real concerns and cautions about the research methods included. Before we talk about the methods, can you give me a quick sort of overview, as someone who’s really closely read these studies, what did we learn?
I really do think that there was a very interesting setup of these experiments and the findings had some really interesting leads in them for future research. So these studies came out a few months ago and you really needed to read them closely to see how they were set up. So can we back up for a few minutes and just again talk through the experimental setup because it was actually quite good?
Please go ahead.
So what they did is they, the entire premise was, you know, we’re going to have control groups and treatment groups where one group just sort of gets what they always get. And then for this other group, we’re going to change some feature.
So a lot of the focus in this set of experiments was on what do people’s feeds, how does it impact people’s views, their behavior, what they see, like the content of their news feed. And they varied different aspects of what goes into making your feed your feed. You can change a few things. You can change the way content is ranked. So the way that they tried to get at that ranking aspect of content was that they had one group that was, again, the control group. And then for the other group, they gave that group just a reverse chronological feed, which is as close as we’re really going to get to having effectively no ranking at all. So that helps us understand, well, you know, what impact does ranking have?
They had another experiment where they had users, they changed the impact of different content features. So there was concern that, well, what impact does liking something have versus resharing it, for example. And I really think that this view of, how important are these two different types of user engagement or user sharing to the algorithm. This is really, I don’t know, this really feels like a very Twitter focused type of intervention. But we all know that so much of the academic literature around social media is, was built on Twitter. So that’s probably why. But, you know, so that intervention was really, hey, if we change the weighting of certain features, how does that impact a user’s feed and then maybe that user?
And then the third type of intervention after looking at the impact of the ranking algorithm and looking at the impact of one algorithm feature was looking at, well, what if we change the sources that we’re taking these features from? So the way you get content in your Facebook feed is from your friends, from your social network. So if we change the weight of input from your like-minded friends, what does that do? So by looking at the impact that ranking itself, feature weights, and sources of features have on a user’s feed and a user’s behavior and a user’s opinions, that’s actually a really robust way of getting at these questions.
And they actually did find pretty interesting results. While they didn’t find a change in political affect or polarization, they actually did find both really important and statistically significant differences in what was in a user’s feed. Something that I found particularly interesting was they also found changes in users’ on-platform behavior. And I think that’s something that I found the most promising.
I found this super interesting because obviously a lot of my research has to do with giving users control over the algorithms that they use to sort through their social media. Everybody asks for a reverse chronological feed. Facebook ended up observing two things happening with the reverse chronological feed. One was that they actually ended up feeding more likely mis- and disinformation to their users because there are people in your feed who are posting mis- and disinformation. I certainly know there are some in mine. Facebook’s feed at this point is actually pretty sensitive to it and tends to filter it out. It shows up again if you go into reverse chron.
The other is that weighting the feed, looking for the stuff that Facebook believes will keep you interacting, will keep you clicking, is really effective. The reverse chronological feed had people walking away from Facebook much, much earlier, which may be why Facebook makes it so hard to access.
But Laura, the question I really want to ask you about this is this work in many ways is the inverse of how you’ve worked. These are researchers who got cooperation from Facebook to run huge scale experiments and, you know, run them on hundreds of thousands of Facebook users to get this enormous data set. Should we be happy about this or should we be worried about this, this approach to research?
I think we should always be happy about more research. I also think that it is not a substitute for independent research.
I really think of this as a product safety problem. We know that there is a problem with recommendation algorithms. And it is important for car companies to do research into how safe their car is. It is important for pharmaceutical companies when they’re making a new drug to run safety tests of that drug. It is also important that independent research is done that verifies that products that we use are safe. And to have the independence to ask questions that companies can’t ask of themselves, especially for-profit companies can’t ask of themselves.
And I think there’s another aspect to this. At least some of the risks that I think a lot of people are worried about are cross-platform risks. There is a real risk right now when you see an ad or a DM on Facebook and it has a Telegram link in it, you should be very worried about that ad. It has a very high chance of being some form of fraud or something even worse. That is a cross-platform risk.
It is very, very difficult for companies to study those kinds of risks themselves. They’re really only studyable by independent researchers. So I’m super glad that platforms are collaborating with researchers and doing work directly with them. This cannot be the only way that we study what happens on platforms.
You and I have co-authored pieces for Scientific American making the case that we need unpermissioned research, independent research, as well as permissioned research. And I want to add that I don’t think this research being permissioned undercuts its validity. It does constrain what it can be about. It can constrain what datasets they’re able to study. Sometimes what questions they’re able to study. What are steps we should be taking to defend independent, unpermissioned research in this space?
We need to pass the Platform Accountability and Transparency Act as a first step. Right now, we have been seeing a real assault on the ability for independent, unpermissioned research to take place on a variety of platforms. I’m sure your listeners have heard about the pullbacks on the Twitter and Reddit APIs. Those have been a really devastating blow to research, but it’s not just that. Platforms are increasingly deploying legal threats against anyone who would scrape... I actually don’t really like that term. Platforms are increasingly deploying legal threats against researchers who crawl their websites. And that is a commonly used method for studying the internet and has been since the very earliest days of the internet. And yet this was actually part of Twitter’s legal complaints against the Center for Countering Digital Hate.
So I think as a first step, let’s pass transparency legislation and then let’s actually fund and support internet research that is independent because there are going to be some legal battles up ahead, and unfortunately, I think we’re just going to have to have them.
She’s Laura Edelson. She’s one of the smartest people researching platforms from a computer science point of view. She’s starting what I predict will be a brilliant new career as an assistant professor at Northeastern. Laura, thank you so much for making the time and helping us understand, you know, what we should be asking about these platforms and how to ask it.
Thanks for having me, Ethan. It’s always good to catch up.