Elizabeth Hansen-Shapiro

Reimagining the Internet

Elizabeth Hansen-Shapiro joins Ethan to talk about “New Approaches to Platform Data Research,” the report they just published together with the NetGain Partnership. Elizabeth and Ethan discuss a variety of issues facing journalists and researchers studying social media companies, and what sort of solutions, both small-scale and radical, could help ensure a better-studied, more accountable social media ecosystem. Elizabeth is the co-founder of the National Trust for Local News.

Transcript

Ethan Zuckerman:

Welcome back to Reimagining the Internet. I’m your host, Ethan Zuckerman. I’m here with Dr. Elizabeth Hansen-Shapiro. I’m thrilled for both her and her husband, Jake, that there’s now the Shapiro on the end of that. Dr. Hansen-Shapiro is the co-founder of the National Trust for Local News.

She’s a Senior Research Fellow at the Tow Center for Digital Journalism at Columbia University. She formerly led the News Sustainability and Business Models Project at the Shorenstein Center on Media, Politics and Public Policy at the Harvard Kennedy School. She is just an absolute top-notch researcher on media and its implications for democratic society, and she is my collaborator on a new report that we are releasing later this month, that looks at research of the big digital public platforms. Elizabeth, I’m so glad to have you here.

Elizabeth Hansen-Shapiro:

Ethan, it’s so great to be here. Thank you for having me.

Ethan Zuckerman:

This is a weird conversation, because I am going to interview you about a report that you and I are co-authoring, so I don’t want to hide the ball here and pretend that I’m asking questions about a topic where I know nothing. On the other hand, it’s going to be more fun if I can get you to talk about this, and it’s worth noting that you and I have had disagreements throughout the process of doing this, and this is the great thing about working with people smarter than you are, is that they can talk you out of your own ideas on this. What’s this report? Who’s it coming out for? When is it coming out? Why are we doing it?

Elizabeth Hansen-Shapiro:

Yeah. Sure. This report is for the NetGain Partnership, which is a group of foundations that have been supporting the work of platform researchers, and they really wanted us to dig into the state of play that these amazing researchers, and activists, and journalists, I should say, are facing as they’re trying to gather data on and about platforms to answer some really important questions: “What does the spread of misinformation look like? How are these platforms, and the interactions that are happening on top of them, affecting our society and democracy?” It’s been kind of a rolling drama, I would say since Cambridge Analytica in 2018, in terms of what’s been available and not available, so this was really our attempt to help them see the landscape clearly, to try to understand both what problems researchers, and journalists, and advocates are pursuing, and what the challenges and opportunities are around the data that they have access to from social media platforms.

Ethan Zuckerman:

For the purposes of this report that we’re working on, what’s a platform? What do we consider a platform, and what do researchers want to know about them?

Elizabeth Hansen-Shapiro:

Yeah. I think, Ethan, you and I went back and forth on this question of, “What is a platform, and how do we define it for the purposes of this inquiry?” Where we landed is that these are essentially media platforms where users can post their ideas in various forms, so that could be text, or audio, or video. These are usually free services, monetized by advertising, so the traditional platform function of a social media platform is really to bring together those advertisers and marketers with the users who are creating content.

Ethan Zuckerman:

This includes Facebook and includes Twitter, it includes YouTube. How broad, how complicated is this universe that researchers are trying to study?

Elizabeth Hansen-Shapiro:

I’d say it’s broad and complicated, and getting more broad and complicated every day. On the one hand, we have Facebook, which has billions of users across the world, so we kind of think of them as the 800-pound gorilla, but then there’s all kinds of other platforms that are kind of ancillary to them. Twitter or YouTube is also another kind of 800-pound gorilla, but then we have smaller platforms like Gab, or what used to be Parler, that are kind of growing up to meet niche user needs with similar services.

Ethan Zuckerman:

This, of course, has been one of the things that’s been so challenging about writing this. We turned in a draft of this before the Capitol riot, and we talked about the importance of being able to study platforms like Gab and Parler, and then suddenly, we had this very clear example of why it was so important to study them, so this is really a study of how researchers are trying to get information from these different platforms, what are the different strategies that they’re using. First of all, what do academic, and activist, and journalistic researchers want to know? What are the sorts of questions people are asking of Facebook, of YouTube, of Parler, of all these different spaces?

Elizabeth Hansen-Shapiro:

Yeah. Yeah. I think there’s a set of questions that are around the kind of content itself, so like, “What is the nature of the content that’s getting shared?” There’s a set of questions around, “How is it getting shared?,” so like, “What is the spread? What does the spread look like? What does it look like when a particular piece of content goes viral or moves through a particular network?”

The real challenge, particularly on a platform like Facebook is that when you start to answer those content questions, you immediately bump up against this problem that the data that’s available around reach is different than the data that’s available around engagement. Ethan, I know you’ve thought a lot about this problem, so maybe you can give your version of the barriers to answering those kinds of questions.

Ethan Zuckerman:

Sure. I mean, one way to think about this is that a platform like Twitter is mostly a public platform, right? Generally, when you’re tweeting, the vast majority of tweets are not protected, they’re going out to the entire world, and you can look and maybe make some estimates about reach, how many people are seeing them. You can certainly see information like how many people liked them, how many people retweeted them and shared them. Facebook is a different critter, and most Facebook feeds are private.

They’re shared with a small number of people, and so even asking a question like, “How often did this URL appear on Facebook?,” can be somewhat controversial. You’re basically asking for an aggregate of people’s private sharing. Facebook reveals some of this information. There are tools like CrowdTangle that make it possible to say this post by Dan Bongino got lots of engagement. Many people liked it, many people shared it.

What we don’t know is how many people have seen a given post, the actual reach of these things, and so far, Facebook, for the most part, isn’t giving us that data. We just interviewed Julia Angwin of The Markup and her Citizen Browser Project, which is one of the things that we write about in this report, and that’s really the first effort that’s making a guess at what the reach is of some of these things, but, Elizabeth, one of the things I found so fascinating about writing this with you … I’m a researcher. I generally just want to grab the platforms and kind of shake all the data out of them. You made a very convincing case that the platforms often have a really good reason why it’s hard to share this data. Make that case for me.

Elizabeth Hansen-Shapiro:

Yeah. Sure. I think what I heard from our interviews about why it’s hard for platforms to share this data is that, at the highest level, there’s a whole thicket of privacy regulations, both domestically and especially globally, coming out of the European Union, that actually make it legally risky for platforms to share this data. Those are not concerns that we should take lightly. The Cambridge Analytica scandal was a scandal because there was personally identifiable information, which users did not consent to have released, that was released to researchers and beyond.

So those privacy considerations, and the constraints on platforms that are created by those privacy regulations, are real. I think a lot of the risk management and internal decision-making around data access turns on those questions of privacy regulations and the risk that comes from making potentially identifiable information “public,” even if that “public” is a group of academic researchers.

Ethan Zuckerman:

One of the things that happened with this report is it’s happening in the wake of a research project called Social Science One. Social Science One was this very ambitious research effort, organized by a pair of just top social scientists, to work with Facebook to open up a data set, in particular, to study political influence and elections and Facebook’s role within them. Despite the fact that the academics who started the project had great contacts within Facebook, and despite the fact that the people the data was opened up to were a set of accomplished academics who had been very carefully screened, Facebook had a really hard time opening up the data, and many of the people who were involved with the project ended up just incredibly frustrated. One of the interesting things, though, is that there’s lots of what you might consider to be informal data collection. Tell us a little bit about some of the people, like Pushshift, who are doing this work informally.

Elizabeth Hansen-Shapiro:

Yeah. I think part of the interesting finding for me, in doing this exercise of mapping the landscape on the methodology side, is that people are really attacking this problem of platform data access in a variety of ways. Social Science One was the sort of official attempt to create a front door to some of this data for Facebook, but Pushshift and others are engaging in their own scraping of platform data, which is what we categorize in the paper as unsanctioned, so they are absolutely taking on some legal risk in doing this. But they are doing it, and they are engaged in a kind of interesting and ongoing cat-and-mouse game, as some of these platforms attempt to shut off their access and detect their scraping efforts. That’s really a side door that some of these intrepid researchers are creating to scrape publicly available data, or data that you can get from a handful of logged-in users, and to collate that across platforms, so it’s not just that there’s a single platform under scrutiny. The really cool thing about Pushshift is that they’re looking across platforms.
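The side-door collection described here can be illustrated with Pushshift’s public Reddit API, which exposed simple REST search endpoints. A minimal sketch in Python, assuming the endpoint and parameter names as they were publicly documented around the time of this conversation; this is purely illustrative, not code from the report:

```python
import json
import urllib.parse
import urllib.request

# Pushshift's public Reddit submission-search endpoint (as documented at the time)
PUSHSHIFT_BASE = "https://api.pushshift.io/reddit/search/submission/"

def build_query_url(subreddit, query, size=25):
    """Build a Pushshift search URL for public Reddit submissions."""
    params = urllib.parse.urlencode(
        {"subreddit": subreddit, "q": query, "size": size}
    )
    return f"{PUSHSHIFT_BASE}?{params}"

def fetch_submissions(subreddit, query, size=25):
    """Fetch matching submissions; Pushshift returned results under a 'data' key."""
    with urllib.request.urlopen(build_query_url(subreddit, query, size)) as resp:
        return json.load(resp)["data"]

# Example: the URL a researcher would request for posts mentioning "vaccine"
url = build_query_url("news", "vaccine", size=100)
```

The point of the sketch is how low the barrier is: public content, a plain HTTP endpoint, no platform-granted credentials, which is exactly what makes this kind of collection both easy to do and easy for platforms to contest.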

Ethan Zuckerman:

Yeah, Pushshift is remarkable. It’s a guy named Jason Baumgartner, who started scraping Reddit years ago, and it’s created sort of the only major scholarly repository of Reddit, and he’s gone on to do work on Telegram, he’s been doing collection work on Gab. We saw-

Elizabeth Hansen-Shapiro:

Twitter as well.

Ethan Zuckerman:

What’s that?

Elizabeth Hansen-Shapiro:

And Twitter as well.

Ethan Zuckerman:

Yeah. Yeah. No, he’s … Although, of course, that’s another change that’s happened in all of this. During the time that we wrote this, Twitter has now significantly opened its research API.

It’s not as clear to me whether Jason will continue doing the Twitter work now that this is so much more accessible. This is the moment where I rip off my sweater and show that I’m wearing The Markup’s “scraping is not a crime” T-shirt. Of course, the challenge is that some of these efforts actually are coming under legal scrutiny. Can you talk a little bit about the NYU Ad Observatory?
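The expanded access Ethan mentions is Twitter’s v2 API and its academic research track, which added a full-archive search endpoint. A hedged sketch of constructing such a request, assuming the endpoint and parameter names from Twitter’s public v2 documentation; the bearer token is a placeholder:

```python
import urllib.parse

# Twitter API v2 full-archive search endpoint (academic research access)
SEARCH_ALL = "https://api.twitter.com/2/tweets/search/all"

def build_search_request(query, bearer_token, max_results=100):
    """Return (url, headers) for a v2 full-archive tweet search request."""
    params = urllib.parse.urlencode(
        {
            "query": query,
            "max_results": max_results,
            "tweet.fields": "created_at,public_metrics",
        }
    )
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return f"{SEARCH_ALL}?{params}", headers

# Example: a request a researcher might issue for English-language tweets
url, headers = build_search_request("misinformation lang:en", "YOUR_TOKEN")
```

Unlike scraping, this is a sanctioned front door: the platform issues credentials, sets rate limits, and decides which fields (engagement metrics, but not reach) are exposed.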

Elizabeth Hansen-Shapiro:

Yeah. The NYU Ad Observatory created a browser plug-in that social media users like you and me could install, and then they would collect data using that plug-in from our Facebook feeds. Now, sometime in, I want to say it was November, Facebook sent a letter to them saying, “This violates our terms of service. Please cease and desist.” This caused a kerfuffle amongst the researchers that we’ve been studying and working with, because this kind of methodology is really one of those ways to get around some of these data access issues and to actually go right at the problem of user consent. If you are a user and you are consenting to plugging this tool into your browser, you are essentially saying, “Yes, researcher, it is okay for you to collect this data,” and of course, for us as researchers, who’ve been trained to respect user consent and to work with our IRBs, that’s a hugely important component of doing high-quality, ethical research. It was a really interesting example of how, even if you can net out the problems of user consent, there may still be some residual platform resistance to those kinds of data collection methods.

Ethan Zuckerman:

Our early indications on the NYU Ad Observatory is that Facebook is making the argument that the users can’t consent to do that, which is interesting, right? On the one hand, it’s quite possible that Facebook’s terms of service is interpretable in a way that says you can’t give anyone unauthorized access to this. At the same time, it seems like what I see on Facebook is something that I should have control over, whether I share it with someone else. That isn’t a trade secret, what Facebook is showing to me versus someone else. There are things that researchers want to study that just aren’t likely ever to be released. Can you talk about people who are trying to study moderation decisions?

Elizabeth Hansen-Shapiro:

Yeah. I think, Ethan, you should weigh in here, because I think some of those interviews were also yours, but part of what I heard amongst the researchers who are looking at moderation decisions is that it can be very difficult. It can be very difficult to study takedowns after they happen, and so getting some kind of deep, long-term longitudinal view of the outcome of those decisions can actually be incredibly difficult. It’s also difficult to study at a policy level, like to what extent the stated policies are actually being followed through, what the results of those are, and what the distribution of those content moderation decisions actually is. I’m curious what you heard from-

Ethan Zuckerman:

Yeah, I … One of the things I would say is a lot of the research that people are doing is descriptive, right? It’s basically, who’s tweeting about what, and who’s sharing what content, and what URLs are most prominent? When you start studying things like moderation decisions, you’re often actually moving into the realm of an audit, and you’re trying to ask the question of, “Are a company’s processes fair or not?” For instance, Palestinians have long complained that their content gets taken down much more often than Israeli content does.

That’s something that should be auditable, right? It should be possible to look in there and say pro-Israel posts get taken down at this rate, pro-Palestine posts get taken down at this rate. Black Lives Matter protesters have complained about this. There’s been a big dialogue within some Black rights communities suggesting that criticizing whiteness is a way of getting taken down. It would be really nice to be able to come in and evaluate those things. That tends to be information that platforms hold very tight.

Handing out that information, saying, “Here is content that came in, and we decided to take it down,” you can imagine the ways in which this could be very dangerous for a platform, right? This is content that they took down in part because they didn’t want to get sued, or they were afraid of liability around it. In the long run, Elizabeth, how do we study these things? Is this something that independent researchers and journalists can study? Does the government have to get involved? Are there audit strategies that sort of get us there?

Elizabeth Hansen-Shapiro:

Do you mean particularly on the moderation front or in general?

Ethan Zuckerman:

I would say in general, right? If one camp says, “Let’s just scrape the heck out of this. Let’s go to the courts. We’re going to scrape as much as we can. We’re going to study it.”

We have sort of a Julia Angwin camp that says, “Put plug-ins in the browsers. Let’s see it from the user point of view.” That’s almost certainly going to get litigated. There’s a camp that says if the platforms were just good to citizens and gave us APIs, like Twitter recently did, we’d solve the problem, but it still seems like there are cases like moderation where we need something much closer to an audit system.

Elizabeth Hansen-Shapiro:

Yeah. Ethan, I think we together did a really good job coming at this with different perspectives, and sometimes different opinions, to assemble a range of options, because I think in this space, as in many others, there is no silver bullet, and so it really does have to be a combination of cooperation, cooptation, and opposition in data collection strategies to get us there. On the cooperation front, I think, as flawed as Social Science One was, and as frustrating as it was for its participants, it’s an important first step to figuring out what a cooperative model looks like, what the constraints and interests are on both sides, and what happens when lots of talk about cooperation actually meets governance and workflow decisions. My sense, from talking with the researchers in this study, is that that will continue, and that it’s a productive and important avenue to keep open, but that is not to the exclusion, I think, of these oppositional efforts, for sure. It’s also not to the exclusion of regulatory approaches, some of which we can talk about, or of the call for new civil society institutions that can do this, because there are definitely whole sections of this problem that are not going to be solved by regulation, by cooperation, or by individuals banding together in oppositional data collection, but can actually be solved by audit bodies, for example. [crosstalk 00:20:59]

Ethan Zuckerman:

Let’s dig into those two things. Talk to me first about sort of regulatory solutions. What are some of the frameworks that people are thinking about to make data more accessible or at least more accessible to someone?

Elizabeth Hansen-Shapiro:

The first area where I think there’s some promise, and in some ways this is kind of starting small, is something you and I have discussed with the Knight First Amendment Institute: supporting a researcher safe harbor, particularly for the CFAA, a kind of targeted, scalpel approach that goes directly at this argument about violations of terms of service that platforms can make in order to stop the kinds of data collection the NYU Ad Observatory was doing. A safe harbor exception in the CFAA, I think, would really help.

Ethan Zuckerman:

Right. The Computer Fraud and Abuse Act has been around since 1986. It’s often used or abused as a cudgel against researchers essentially saying, “We’re not going to distinguish between a research use of a platform and hacking that platform,” and having a researcher safe harbor under CFAA and similar laws is one possible step forward.

Elizabeth Hansen-Shapiro:

Now, I think the asterisk on that, which you and I have talked about over the course of writing this paper, is that then the question is, “Well, who is a researcher, and who gets to say who is a researcher?” If we limit this to researchers in particular, are we then also excluding folks like Julia at The Markup and other advocacy groups that might want to engage in similar-

Ethan Zuckerman:

Or even the pseudonymous folks who scraped Parler and made it accessible as a tool for people to understand the January 6 riot at the Capitol.

Elizabeth Hansen-Shapiro:

Exactly. A researcher safe harbor may or may not help those folks, and that’s really important work, so we want to be careful of that. As you and I have periodically discussed, there’s a whole thicket, and I mean thicket in every sense of that word, of regulatory policy coming out of the European Union that has tried to get at a version of this researcher exception, and from what we heard, it is at some level creating more problems than it solves. But the advantage of our European friends cutting that Gordian knot, and really figuring out at the EU level what some research exemptions would be, is that it would actually help the rest of us, because so much of that policy ends up being the master policy for research work here as well, just because of how platforms are tuning their policy decisions. That’s another one in the realm of the possible but difficult. The other interesting and important long-term, tectonic policy shift that I think would be very interesting is a complete reclassification here domestically of social media platforms as common carriers.

Elizabeth Hansen-Shapiro:

We had a couple of very interesting off-the-record conversations informing this report with folks in D.C. who really feel like a reclassification of social platforms under common carrier legislation would open up a range of rule-making possibilities that could really help us make progress on this researcher access and data access.

Ethan Zuckerman:

Part of what might become possible in a common carrier situation is that platforms might become significantly more auditable, and there might be a federal-level regulatory response that said the platforms were auditable in certain ways. One of the wackiest ideas in this report that we ended up putting out there is that you might almost imagine algorithmic auditors, not unlike the way publicly traded businesses are fiscally audited to make sure that they’re following best accounting practices. Of course, part of the problem is that we don’t know what a best algorithmic practice is. For folks in the algorithmic fairness community, who are trying to figure out questions of equity and fairness in AI, that’s going to be a really, really open topic if we get to that place. Elizabeth, let me say, for me, maybe the biggest surprise in all of this work was this idea of a tension between privacy advocates and researchers.

No one, I think, wanted to say, “Damn you, privacy advocates,” but it was often a situation where researchers who really wanted to understand what was happening on these platforms found themselves in tension with privacy advocates, and of course, many of these players see themselves as allies. They’re funded by some of the same groups. This was a really interesting thing to be bringing back to NetGain. Anything that really surprised you in this work?

Elizabeth Hansen-Shapiro:

Yeah. I was surprised right along with you about that tension between privacy advocates and researchers. I think it’s those kinds of values conflicts that are really driving this. The business model, the profit motivation, and the risk tolerance of platforms around giving data access, that’s all still there, but I think it would be a very important and consequential contribution of this paper if we can help shift the conversation around these issues to the tension between privacy and the kind of social good of general knowledge that researchers, and journalists, and activists stand on top of, because if we can make progress on that values conflict, then we have a shot at getting the rest of it in line.

Ethan Zuckerman:

This is ultimately a very hopeful show that we do here. We sort of bring people on to sort of imagine positive futures. What was maybe your happiest moment in writing this? Were there scenarios or sort of futures that you looked at and said, “Wow, maybe that actually will make real progress for us”?

Elizabeth Hansen-Shapiro:

Ethan, I am a pragmatic researcher and I pride myself on being practical, but I’m also at heart a dreamer and an optimist, and I have to say the most exciting conversations I had were around this common carrier reclassification, because I think it would be such a game-changer, not just for these issues, but for all of the other issues attached to this one that have to do with platform regulation. That would be the kind of era-defining shift that would really change the terms of debate on this issue and a whole host of others, so for me, that was the most hopeful. It was hopeful because, the way one policymaker framed it to me was, “We have this legislation.” It’s not like we have to write something new. These powers exist. It’s a category-definition shift that we would have to pull off, but it’s not as if we have to figure out an entirely new, complicated intellectual regime for getting us the rule-making and regulatory powers that we would want.

Ethan Zuckerman:

Elizabeth, it’s a very special sort of optimist for whom a new government regulatory bureaucracy is the optimistic outcome. I honestly came out of this process just more in love with Julia Angwin, basically, and with this notion that what we need are better tools for individuals to get together and donate data. We haven’t talked much about this yet, but Mozilla is working on a project that allows people to donate browser data, and it’s working at just an incredible scale. That gave me an enormous amount of hope coming out of this. It’s an incredibly complex issue.

It’s a real thicket, as you’ve used the term. What are your aspirations for the report? In a perfect world, who reads this and what comes out of it?

Elizabeth Hansen-Shapiro:

Yeah. Well, I’m excited for our funder community to read it, because I think one area that we didn’t probe in the report, and now that we have another week or so to make final edits, we might want to consider it, is the role of funders themselves. Funders exert a lot of influence in this space, not only through the problems and people that they choose to fund, but also the terms on which they choose to fund them, and so I think one immediate and important avenue would be for funders themselves to take the risk of building some kind of best practices into their grant agreements around this problem of privacy versus access. I think that would be a really important stake in the ground, and a good use of their symbolic and funder weight. I hope they read this and take away that there are many fronts on which they can push forward immediately.

Ethan Zuckerman:

Since you’re editing the conclusion, that sounds like something that you could add in here.

Elizabeth Hansen-Shapiro:

Definitely.

Ethan Zuckerman:

Look, one of my great joys as a researcher is getting to work with just extraordinary other figures out there. Dr. Elizabeth Hansen-Shapiro is one of those extraordinary figures. We’re really looking forward to having this report come out. We would be remiss if we didn’t point out that our friend Mike Sugarman, who’s producing this podcast, was the lead researcher on all of this. We should also mention that our good friends Fernando Bermejo and Lorrie LeJeune have both been involved with reading and editing as we’re getting this out. This report will be out at the same time that we put out this podcast.

We will add some URLs to it. Elizabeth Hansen-Shapiro, you are someone that we look forward to having back often because you’re working on so many different things. Thanks for being with us today.

Elizabeth Hansen-Shapiro:

Thank you, Ethan. It’s been a real pleasure to work on this important topic with you and to be on the show.
