So you’ve got Bitcoin fever, and you’d like to understand more about how it works. What exactly is contained in the underlying blockchain?

This series will have 3 parts:

  • Installing Bitcoin and Jupyter Notebook
  • Installing BlockSci and Converting Data for Analysis
  • Analyzing Blockchain Data

So what is a blockchain? Let’s consult Wikipedia…

A blockchain, originally block chain, is a continuously growing list of records, called blocks, which are linked and secured using cryptography. Each block typically contains a hash pointer as a link to a previous block, a timestamp and transaction data. By design, blockchains are inherently resistant to modification of the data. It is “an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way”. For use as a distributed ledger, a blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority.

Blockchains are secure by design and are an example of a distributed computing system with high Byzantine fault tolerance. Decentralized consensus has therefore been achieved with a blockchain. This makes blockchains potentially suitable for the recording of events, medical records, and other records management activities, such as identity management, transaction processing, documenting provenance, food traceability or voting.

The first blockchain was conceptualized in 2008 by an anonymous person or group known as Satoshi Nakamoto and implemented in 2009 as a core component of bitcoin where it serves as the public ledger for all transactions. The invention of the blockchain for bitcoin made it the first digital currency to solve the double spending problem without the need of a trusted authority or central server. The bitcoin design has been the inspiration for other applications.
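The hash-pointer idea in that description is easier to see in a toy example. The Python sketch below is purely illustrative: the `Block` class and the `build_chain`, `is_valid` and `rehash_from` helpers are hypothetical names, and the block layout is a simplification. It is not Bitcoin’s real block format, which hashes an 80-byte block header with double SHA-256, commits to transactions through a Merkle root, and requires proof of work. What it does show is why editing one block breaks every block after it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: int
    data: str
    prev_hash: str  # the "hash pointer" to the previous block
    hash: str = ""

    def compute_hash(self) -> str:
        # Hash every field except the block's own hash, serialized deterministically.
        payload = json.dumps(
            {"index": self.index, "timestamp": self.timestamp,
             "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records, start_time=0):
    """Build a toy chain; each block stores the hash of the block before it."""
    chain = []
    prev_hash = "0" * 64  # the first block has no predecessor
    for i, data in enumerate(records):
        # Arbitrary fixed timestamps, spaced 600 seconds apart, for reproducibility.
        block = Block(i, start_time + i * 600, data, prev_hash)
        block.hash = block.compute_hash()
        chain.append(block)
        prev_hash = block.hash
    return chain


def is_valid(chain):
    """Check every stored hash and every hash pointer."""
    for i, block in enumerate(chain):
        if block.hash != block.compute_hash():
            return False
        if i > 0 and block.prev_hash != chain[i - 1].hash:
            return False
    return True


def rehash_from(chain, start):
    """Recompute hashes from `start` onward, re-linking each block to its predecessor."""
    for i in range(start, len(chain)):
        if i > 0:
            chain[i].prev_hash = chain[i - 1].hash
        chain[i].hash = chain[i].compute_hash()


chain = build_chain(["Alice pays Bob 1", "Bob pays Carol 0.5", "Carol pays Dave 0.2"])
print(is_valid(chain))  # True

chain[1].data = "Bob pays Mallory 0.5"  # tamper with a middle block
print(is_valid(chain))  # False: block 1's stored hash no longer matches its contents

chain[1].hash = chain[1].compute_hash()  # "fix" only the tampered block
print(is_valid(chain))  # False: block 2 still points at block 1's old hash

rehash_from(chain, 1)  # rewrite the tampered block and everything after it
print(is_valid(chain))  # True
```

In this toy, rewriting all the later blocks is instant. In Bitcoin, each rewritten block would also need fresh proof of work, which is why altering history in practice requires most of the network’s hashing power.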


To find out more about the blockchain, we’ve got to get the data somewhere we can look at it. This requires setting up a full Bitcoin node and synchronizing it with the Bitcoin network. A fully synchronized node needs about 180 GB of disk space at the time of writing, and the blockchain keeps growing. Instructions for installing and configuring a full Bitcoin node can be found here. I’m using Ubuntu 17.04 and didn’t have any difficulties installing or running a full node. Once the node is up and running, it will take some time to fully synchronize (download and verify all the blockchain data), depending on your Internet connection. Once your node is fully synchronized, we’ll convert the data so that we can analyze it in a Jupyter notebook.
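Two checks are handy before and during the sync: whether the drive has room for the chain, and how far the node has got. The sketch below uses only the Python standard library. It assumes `bitcoind` is running locally with its JSON-RPC interface on the default mainnet port 8332, and that you pass in the `rpcuser`/`rpcpassword` values from your own `bitcoin.conf`; the defaults in the code are placeholders. (If you haven’t set them, Bitcoin Core falls back to cookie authentication, writing credentials to a `.cookie` file in its data directory.) `getblockchaininfo` is a real Bitcoin Core RPC, the same one `bitcoin-cli getblockchaininfo` calls; the helper names `has_enough_space`, `rpc_call` and `sync_summary` are hypothetical.

```python
import base64
import json
import shutil
import urllib.request

# Roughly the 180 GB quoted above; treat it as a lower bound, since the chain keeps growing.
REQUIRED_BYTES = 180 * 10**9


def has_enough_space(path, required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


def rpc_call(method, params=None, user="rpcuser", password="rpcpassword",
             url="http://127.0.0.1:8332/"):
    """Send one JSON-RPC request to a local bitcoind and return its `result` field.

    `user` and `password` are placeholders: replace them with your node's credentials.
    urlopen raises HTTPError for non-2xx replies, which bitcoind may send for failed calls.
    """
    body = json.dumps({"jsonrpc": "1.0", "id": "sync-check",
                       "method": method, "params": params or []}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",  # HTTP basic auth, as bitcoind expects
    })
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.load(response)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


def sync_summary(info):
    """Summarize the sync-related fields of a `getblockchaininfo` result."""
    # verificationprogress is the node's own estimate, between 0.0 and 1.0.
    pct = 100.0 * info["verificationprogress"]
    return f"height {info['blocks']} of {info['headers']} headers, about {pct:.2f}% verified"
```

For example, `has_enough_space(os.path.expanduser("~"))` checks the drive holding your home directory (Bitcoin Core’s default data directory on Linux is `~/.bitcoin`), and `print(sync_summary(rpc_call("getblockchaininfo", user="...", password="...")))` reports progress. Because `verificationprogress` is an estimate, expect it to creep toward 1.0 rather than land on it exactly.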


If you don’t already have Jupyter installed, here’s a link that will walk you through it. Make sure to install Python 3, since some of the software we’ll use later requires it. I use Anaconda, and later on I’ll also explain how to install some required libraries using conda.
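Since the later parts depend on Python 3, it’s worth confirming which interpreter your notebook kernel actually runs, especially with Anaconda, where several environments can coexist. The cell below is a small sketch; `require_python3` is a hypothetical helper name. It avoids f-strings so that it can still report a clear error if the kernel turns out to be Python 2.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise if the interpreter (e.g. the notebook kernel) is older than Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "This series needs Python 3; the current kernel is Python %d.%d"
            % (version_info[0], version_info[1])
        )
    # Return a readable "major.minor.micro" string.
    return "%d.%d.%d" % tuple(version_info[:3])


print("Kernel Python version: " + require_python3())
# sys.executable shows which installation (e.g. which conda environment) the kernel uses.
print("Interpreter path: " + sys.executable)
```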

In the next installment, we’ll install BlockSci and convert the blockchain data for analysis.

Stay tuned, and thanks for reading!