Alyssa Blackburn, a data scientist at Rice University and Baylor College of Medicine in Houston, has spent several years performing digital detective work with her trusty lab assistant, Hail Mary, a shiny black computer with orange trim. She has been collecting and analyzing leaks from the bitcoin blockchain, the immutable public ledger that has recorded all transactions since the cryptocurrency’s launch in January 2009.
Bitcoin represents a techno-utopian dream. Satoshi Nakamoto, its pseudonymous inventor, proposed that the world run not on centralized financial institutions but on an egalitarian, math-based electronic money system distributed through a computer network. And the system would be “trustless” — that is, it would not rely on a trusted party, such as a bank or government, to arbitrate transactions. Rather, as Satoshi Nakamoto wrote in a 2008 white paper, the system would be anchored in “cryptographic proof instead of trust.” Or, as T-shirts proclaim: “In Code We Trust.”
The practicalities have proved complicated. Price turbulence is enough to induce the bitcoin bends, and the system is environmentally destructive, since the computational network uses exorbitant amounts of electricity.
Blackburn said her project was agnostic to bitcoin’s pros and cons. Her goal was to pierce the scrim of anonymity, track the transaction flow from Day 1 and study how the world’s largest cryptoeconomy emerged.
Satoshi Nakamoto had presented the currency as anonymous: For bitcoin transactions (buying, selling, sending, receiving etc.), users employ pseudonyms, or addresses — alphanumeric cloaks that hide their real identities. And there was apparent confidence in the anonymity; in 2011, WikiLeaks announced that it would accept donations via bitcoin. But over time, research revealed data leakage; the identity protections weren’t so watertight after all.
“Drip-by-drip, information leakage erodes the once-impenetrable blocks, carving out a new landscape of socioeconomic data,” Blackburn and her collaborators report in their new paper, which has not yet been published in a peer-reviewed journal.
Aggregating multiple leakages, Blackburn consolidated many bitcoin addresses, which might have seemed to represent many miners, into few. She pieced together a catalog of agents and concluded that, in those first two years, 64 key players — some of whom were the community’s “founders,” as the researchers called them — mined most of the bitcoin that existed at the time.
“What they figured out, just how concentrated early mining and use of bitcoin was, that’s a scientific discovery,” said Eric Budish, an economist at the University of Chicago. Budish, who has conducted research in this realm, received a two-hour video preview with the authors. Once he understood what they had done, he said, he thought, “Wow, this is cool detective work.” Referring to those early key players, Budish suggested that the paper be titled “The Bitcoin 64.”
Computer scientist Jaron Lanier, an early reader of the paper, called the investigation “important and significant” in its ambitions and social implications. “The nerd in me is interested in the math,” said Lanier, who is based in Berkeley, California. “The techniques used to extract information are interesting.”
The demonstration of blockchain leakage, he noted, will be surprising to some, not to others. “This thing isn’t hermetically sealed,” Lanier said. He added: “I don’t think it’s the end of the story. I think there’s further innovation that will take place, extracting information from these types of systems.”
One of Blackburn’s tactics was simple perseverance. “I kicked it till it broke,” she said, recalling how the principal investigator, Erez Lieberman Aiden, an applied mathematician, computer scientist and geneticist at Baylor College of Medicine and Rice University, characterized her method.
More precisely, Blackburn developed hacks for the period of time that was of particular interest: from the cryptocurrency’s start to when bitcoin achieved parity with the U.S. dollar in February 2011, which coincided with the establishment of the Silk Road, a bitcoin-based black market. She leveraged human lapses such as insecure user behavior; she exploited operational features inherent to bitcoin’s software; she deployed established techniques for linking the pseudonymous addresses; and she developed new techniques. Blackburn was particularly interested in miners, the agents who verify transactions by engaging in an elaborate computational tournament — a puzzle hunt, of sorts, guessing and checking random numbers against a target, in search of a lucky number. When a miner wins, they earn bitcoin income.
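For readers who want to see the guess-and-check tournament concretely, here is a toy sketch in Python. It is an illustration under simplifying assumptions, not bitcoin's actual implementation: the function names `mine` and `is_winner`, the 8-byte nonce and the `difficulty_bits` parameter are made up for this example, and real mining hashes a fixed-format block header against a network-set target.

```python
import hashlib


def double_sha256(payload: bytes) -> bytes:
    """Hash twice with SHA-256, the hash construction bitcoin mining uses."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()


def is_winner(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Check whether a guess yields a hash below the target (toy encoding)."""
    target = 1 << (256 - difficulty_bits)
    digest = double_sha256(block_data + nonce.to_bytes(8, "little"))
    return int.from_bytes(digest, "big") < target


def mine(block_data: bytes, difficulty_bits: int, max_tries: int = 1_000_000):
    """Try nonces one by one; return the first lucky number, or None."""
    for nonce in range(max_tries):
        if is_winner(block_data, nonce, difficulty_bits):
            return nonce
    return None


if __name__ == "__main__":
    lucky = mine(b"toy block of transactions", difficulty_bits=12)
    print("lucky nonce:", lucky)
```

Each extra bit of difficulty roughly doubles the expected number of guesses, which is why mining at scale consumes so much computation.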
Whether 64 seems like a small or large number of key miners depends on one’s proximity to the crypto undertow. Scholars have questioned whether bitcoin is truly a decentralized currency. From Lieberman Aiden’s perspective, the population under investigation was “even more concentrated than it seems.” Although the analysis showed that the big players numbered 64 over two years, at any given moment, according to the researchers’ modeling, the effective size of that population was only five or six. And on many occasions, just one or two people held most of the mining power.
As Blackburn described it, there were very few people “wearing the crown,” functioning as arbiters of the network — “which is not the ethos of decentralized trustless crypto,” she said.
Finding Treasures in the Data
For Blackburn and Lieberman Aiden, bitcoin’s data — 324 or so gigabytes archived in the blockchain — presented a cache of temptation. Lieberman Aiden’s lab does biological physics and widely applied mathematics; one focus is three-dimensional genome mapping. But as a scholar, he is also intrigued by the use of new kinds of data to explore complex phenomena. In 2011, he published a quantitative cultural analysis using more than 5 million digitized books from 1800 to 2000, with Google Books and collaborators. “Culturomics,” he called it. For instance, the team introduced the Google Ngram Viewer, which lets users type in a word or phrase and observe its usage plotted over the centuries.
In the same spirit, he wondered what treasures might be submersed in bitcoin’s data lake. “We literally have a record of every single transaction,” he said. “These are remarkable economic and sociological data sets. Clearly, there’s a lot of information in there, if you can get at it.”
Getting at it proved nontrivial. Blackburn was barred from the university’s supercomputing cluster — with her file folder labeled “Bitcoin,” she was suspected of mining the cryptocurrency. “I objected,” she said. She said she tried to convince an administrator that she was conducting research, but “they were completely unmoved.”
A key tactic of Blackburn’s was to trace patterns in plots of numbers that in theory should have been random and meaningless. In one case, she was chasing the “extranonce,” one piece of the mining puzzle: a short field of 0s and 1s tucked within a longer string that encodes each block, or bundle, of transactions. The extranonce leaked information about a computer’s activity. This led Blackburn to reconstruct the miners’ behavior: when they were mining, when they stopped and when they started up again. She speculates that the extranonce’s leaky behavior was tolerated because it allowed bitcoin’s creator to keep an eye on miners; the source code was modified to plug this leak shortly before Satoshi Nakamoto disappeared from the public bitcoin community in December 2010.
Once Blackburn had put various toeholds to use — allowing her to erode the identity-masking protections — she began merging addresses, linking nodes on a graph, consolidating the effective population of mining agents. Then she cross-referenced and validated the results with information scraped from bitcoin discussion forums and blogs. Initially, the catalog of agents who mined most of the bitcoin tallied a couple of thousand; then it hovered for a while around 200. Ultimately, Hail Mary spit out 64. (Eventually, Hail Mary’s brains were incorporated into the lab’s computer cluster, Voltron.)
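The merging step, in which linked addresses collapse into a smaller set of agents, resembles a standard connected-components computation. The sketch below uses a union-find structure in Python. It is not the researchers' code; the function `merge_addresses` and the sample address labels are hypothetical, and the evidence that two addresses belong together is assumed to arrive as simple pairs.

```python
def merge_addresses(links):
    """Group addresses into agents, given pairs believed to share an owner."""
    parent = {}

    def find(addr):
        # Follow parent pointers to the cluster's representative,
        # shortening the path along the way.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical links; each pair stands for one piece of leaked evidence.
    links = [("addr_A", "addr_B"), ("addr_B", "addr_C"), ("addr_D", "addr_E")]
    for agent in merge_addresses(links):
        print(sorted(agent))
```

In this toy input, five addresses that look like five miners reduce to two agents, the same kind of shrinkage that took the researchers' catalog from thousands of candidates down to 64.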
The study’s purpose was not to name names; it’s the job of the FBI and the IRS to bust bitcoin criminals. But the researchers pinpointed the identities of a couple of the top players who were publicly known bitcoin criminals: Agent No. 19 is Michael Mancil Brown, aka “Dr. Evil,” who was found guilty of a 2012 fraud and extortion scheme involving Mitt Romney, then a candidate for president. Agent No. 67 is associated with Ross Ulbricht, aka “DreadPirateRoberts,” creator of the Silk Road. Naturally, Agent No. 1 is Satoshi Nakamoto — whose true identity the researchers did not try to determine.
Mark Gerstein, a professor of bioinformatics at Yale University, found in the research implications for data privacy. He recently stored a genome on a private blockchain, which allowed for a secure and tamperproof record. But he noted that in a public setting, as with bitcoin’s blockchain, a data set’s size and subtle patterns made it susceptible to breaches, even as the data remained immutable. (Blackburn wasn’t tampering with the bitcoin blockchain’s records.)
“That’s the amazing thing about big data,” Gerstein said. “If you have a big enough data set, it starts to leak information in unexpected ways.” Even more so when data from different sources are connected, he said: “When you combine one data set with another to make a bigger data set, nonobvious linkages can arise.”
Once Blackburn had assembled the catalog of agents, she analyzed the income they had reaped from mining. She found that within a few months of the cryptocurrency’s introduction — and…