Transparency collective publishes 70 gigabytes of data hacked from right-wing social media network
NYU researchers believe that GabLeaks data will reveal insights into right-wing, fascist and white supremacist politics and organizing but are concerned about the ethics of using hacked private information.
March 8, 2021
The nonprofit transparency collective Distributed Denial of Secrets (DDoS) released 70 gigabytes of hacked user accounts, passwords, direct messages and public and private posts from the right-wing social media network Gab on Monday, March 1. The GabLeaks data was obtained by the hacktivist JaXpArO (they/them) & My Little Anonymous Revival Project. DDoS published the leak as a limited distribution dataset available to researchers and journalists upon request.
Maxwell Aliapoulios, a PhD student and researcher at the NYU Center for Cybersecurity, received advance access to GabLeaks and briefly reviewed the data. However, academic researchers are questioning whether it is ethical to use any part of the data in academic studies.
DDoS gained notoriety last summer after publishing BlueLeaks, a dataset containing 269 gigabytes of hacked law-enforcement files. According to Wired, the nonprofit has also released corporate data stolen in ransomware attacks, posts scraped from Parler and the .Win Network and data stolen from corporations in Myanmar after the military coup d’état. DDoS typically releases datasets publicly, but published GabLeaks as a limited distribution dataset because of the large amount of personally identifiable information it contains.
“The mission of transparency and reliability are at the core of this and other publications,” Lorax B. Horne, the director of DDoS, told WSN in a text message. “We distinguish ourselves from other data collectors and archives in our commitment to announce what we have, and to make some of these coveted data properties available to researchers that are believed to work in the public interest.”
Horne added that DDoS generally places datasets on the limited distribution track when they contain so much personally identifiable information about uninvolved parties that redacting it is not worth the collective’s time.
DDoS might make a version of GabLeaks publicly available in the future, according to Horne. However, when and how DDoS releases datasets will depend on the source’s needs and wishes.
“We are working on protocols to put in place that could redact any dataset,” Horne wrote. “What we do with a particular dataset, usually responds to source direction. So it could be this Gab dataset, or a future release of Parler or something else.”
Aliapoulios, who researches cybercrime and online extremism, co-developed the Social Media Analysis Toolkit, a website that allows users to search keywords on and access data from fringe, alternative and mainstream social media sites such as Twitter, Reddit, 4chan, Parler and Gab. He does not plan to study GabLeaks academically because the data is not publicly available.
When Aliapoulios glanced at the dataset, however, he noticed a pattern. The number of new Gab users surged in early January — around the time Twitter banned Trump and Amazon deplatformed Parler — leading him to conclude that Twitter and Parler users migrated to Gab.
“There was a massive, massive amount of new users signing up, hundreds of times more than the normal new-user count,” Aliapoulios said. “But what I found interesting was that those new users didn’t participate. So they joined, but the overall post count didn’t change.”
Aliapoulios said he is interested in using Parler as a case study of the effects of deplatforming extremists, to learn whether deplatforming pushes them to deeper, darker places on the internet or to other, similar social media networks like Gab.
GabLeaks could be useful for historians and social scientists mapping social networks and studying how groups organize online, according to A.J. Bauer, an assistant professor at the University of Alabama’s Department of Journalism and Creative Media and a former visiting assistant professor at Steinhardt’s Department of Media, Culture, and Communication.
Gab is known as a haven for neo-Nazis and white supremacists, according to Bauer, who primarily researches conservative and right-wing media. It is smaller and more niche than Parler or Twitter, he said, which might make it easier for white supremacists to find chats about tactics or methods of winning influence.
Private chat logs reviewed by WSN appear to show users discussing relationships, family life, follower counts, Christianity, QAnon, the riot at the Capitol on Jan. 6, the difficulty of finding Aryan women with whom to breed and the best tactics for forming a fascist party in the United States, among many other things.
“If I’m a historian in the future and I want to look back to the present moment, maybe I see similarities between two different fascist groups and I’m wondering whether they were in contact,” Bauer said. “[GabLeaks] may be evidence that says, ‘Well, actually they were talking with one another, and here’s what they were saying.’”
GabLeaks could also be useful for scholars researching radicalization, according to Bauer.
“Maybe there’s some user who’s kind of disconnected from far-right organizing, and you can kind of watch their gradual radicalization through their messaging and comments and posts,” he said.
However, Bauer is concerned about the ethics of using GabLeaks for research or academic studies. The data represents a severe violation of privacy and consent, he said, and he would not feel comfortable using it for research without the consent of the people he is studying.
“If you’re using without consent, you would at least need to be engaging in a lot of anonymizing of the data, so that you’re getting the insights you want, but without exposing whoever the person is,” Bauer explained.
Justin Hendrix, an adjunct professor at the Tandon School of Engineering, who studies disinformation and media manipulation, echoed Bauer’s sentiments. Hendrix researched and reported on publicly available data scraped from Parler, but has no plans to look at GabLeaks. Parler was crawled and only public information was taken, he explained, but Gab was hacked and both public and private information was extracted.
“With the Gab data … there are going to be some questions that folks have to ask about what are the ethics of looking at this piece or the other,” he said.
According to Aliapoulios, the dataset falls into two pieces. One consists of usernames, passwords and other personally identifiable information; the other consists of public data that could have been collected with a crawler. The former is off-limits for him. Yet Bauer said he sees the purpose and political benefit of the GabLeaks release regardless of whether researchers can use any of the data for academic studies.
“It’s making it harder to be a fascist,” he said. “If you’re on Gab or something like that, you know that your words aren’t private. That might chill speech in some way, keep you from organizing in some way.”
A version of this article appeared in the Monday, Mar. 8, 2021, e-print edition. Email Trace Miller.
Correction, March 8: A previous version of this article omitted part of JaXpArO (they/them) & My Little Anonymous Revival Project’s pseudonym. The article has been updated and WSN regrets this error.