In my last post in this “Cryptocurrencies Explained” series, we analyzed traditional currencies and banking systems and their drawbacks. In doing so, we found that the traditional systems are highly centralized and, in turn, limited in terms of security, equity and efficiency because they rely on trusted centralized authorities and middlemen. Yet as I explained in my last post, creating a currency that acts as a medium of exchange, unit of account, and store of wealth is quite a challenge without a central bank or trusted authority. For example, if you don’t have a single entity in control, how do you create and verify transactions? How do you record transactions so that they can’t be changed in the future? How do you know that everyone has the necessary information about everyone else?
These questions were addressed for the first time back in 2008 in a white paper published under the pseudonym “Satoshi Nakamoto” that introduced the world to a completely decentralized digital cryptocurrency called Bitcoin. Satoshi’s seminal white paper is definitely worth a read because it laid the foundation for Bitcoin and the hundreds of cryptocurrencies that followed. However, since the paper is quite concise and is intended primarily for a technical audience, I’ll try to explain how exactly cryptocurrencies like Bitcoin work as simply as possible. I could probably break down the computer science, cryptography, mathematics and economics that cryptocurrencies use to address each of these questions independently, but I think it will be a lot easier to understand how all the pieces of the puzzle fit together if we just go through everything that goes on behind the scenes between the initiation and completion of a cryptocurrency transaction. Before we move on, I recommend that you read the last post (if you haven’t already) because I will be making some analogies between cryptocurrencies and the traditional systems discussed in that post along the way.
Initiating a Transaction
Cryptocurrencies operate on a network of computers that use a networking architecture called peer-to-peer or P2P. P2P networks lend themselves to decentralized distributed computing because every computer connected to the network, called a node, can share and/or consume computing resources at the same time. P2P networks have quite a lot of nuances and applications, but for now, all you need to know is that every node on a P2P network can communicate with every other node on the network.
With cryptocurrencies, every user that wants to send, receive or store that currency is connected to the currency’s P2P network as a node and is identifiable by an address consisting of 26-35 alphanumeric characters like 3FZbgi29cpjq2GjdwV8eyHuJJnkLtktZc5. Moreover, every node on the network has its own copy of a digital ledger that contains all previous transactions that have occurred on the network between all nodes. All this information is stored in a data structure called a blockchain, but more on that later on. Since every node has a copy of this digital ledger, there is no single point of failure, unlike with the traditional system, where only a trusted authority like the bank would have this information. Therefore, even if a node on the network shuts down or is compromised due to a security vulnerability, the cryptocurrency as a whole will not be affected.
So let’s imagine that person A wants to send 100 bitcoin or some denomination of some arbitrary cryptocurrency to person B. To initiate this transaction, person A would broadcast a message containing the details of the transaction across the network. This message would include person A’s and person B’s addresses on the network, a unique transaction identifier, the amount being transferred, the amount provided as the transaction fee, and a digital signature. Aside from the digital signature, all this other data is pretty standard and you would probably see it in transactions on the traditional banking system. But because we don’t have a trusted centralized authority like a bank to verify that A really wants to send money to B, we run into a problem. How can the other nodes on the network validate this transaction and be sure that person B or someone else on the network is not sending this message without A’s knowledge?
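To make the contents of that message a bit more concrete, here is a minimal Python sketch of what such a transaction record could look like. The field names, the `sequence` counter and the way the transaction ID is derived are all my own assumptions for illustration; real cryptocurrencies each define their own formats.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    sender: str          # person A's address
    receiver: str        # person B's address
    amount: int          # amount transferred, in the currency's smallest unit
    fee: int             # transaction fee offered to miners
    sequence: int        # hypothetical per-sender counter that keeps IDs unique
    signature: str = ""  # filled in once the sender has signed the message

    def payload(self) -> bytes:
        """Serialize every field except the signature in a stable order."""
        fields = asdict(self)
        del fields["signature"]
        return json.dumps(fields, sort_keys=True).encode("utf-8")

    def tx_id(self) -> str:
        """Hypothetical transaction ID: the SHA-256 hash of the payload."""
        return hashlib.sha256(self.payload()).hexdigest()


tx = Transaction(sender="address-of-A", receiver="address-of-B",
                 amount=100, fee=1, sequence=0)
print(tx.tx_id())
```

The signature is left out of the payload on purpose: it is computed over the other fields, so it cannot be part of its own input.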
The solution lies in the included digital signature, so what exactly is this digital signature? In addition to their address on the network, every user has two more pieces of data called their public key and private key. The private key is basically a very large, randomly chosen number, and the public key is derived from it mathematically in a way that cannot feasibly be reversed. As the names suggest, a user’s public key can be shared with every other node in the network, but only the user knows his/her private key. The digital signature is generated by a signing algorithm that builds on a cryptographic hash function. What is a cryptographic hash function? Think of it as a function that takes any kind of input and gives an output of fixed length (usually 256 bits long). This function is non-invertible – meaning that the input cannot feasibly be determined from the output – and deterministic – meaning that a particular input will always produce the same output regardless of device, operating system, location or anything else. So, before person A broadcasts the message across the network to initiate the transaction, he/she uses a signing function (the sign function below) that takes the message, the transaction ID, and his/her private key as inputs and outputs the digital signature. Although the private key cannot be recovered from the signature, the signing function comes with a companion function that allows other nodes on the network to quickly verify the digital signature. This verify function takes the message, transaction ID, signature and person A’s public key and outputs true if the signature is valid and false if it is invalid. Now all nodes know that this transaction is really coming from person A because only person A has access to his/her private key. Moreover, person B (or anyone else) can’t just rebroadcast this message as a new transaction because the new transaction ID would differ from the one used to generate the digital signature.
I won’t go in-depth into the mathematics behind cryptographic hash functions and digital signatures because this is all we need to know about them to understand how cryptocurrencies work, but they’re very interesting and very useful.
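If you want to see the hash function properties described above for yourself, the short Python sketch below uses SHA-256 from the standard library. SHA-256 is just one example of a cryptographic hash function, and the sample messages are made up.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("A pays B 100")
again = sha256_hex("A pays B 100")
changed = sha256_hex("A pays B 101")        # one character different
long_input = sha256_hex("x" * 10_000)       # a much longer input

print(first == again)                # True: same input, same output
print(len(first), len(long_input))   # 64 64: always 256 bits, whatever the input size
print(first == changed)              # False: a tiny change gives a different digest
```

Printing `first` and `changed` side by side shows that the two digests look completely unrelated even though the inputs differ by a single character.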
Sign Function(Message, Transaction ID, Private Key) = Digital Signature
Verify Function(Message, Transaction ID, Digital Signature, Public Key) = True or False
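As a rough illustration of how a sign/verify pair behaves, here is a toy Python sketch based on textbook RSA with two small, publicly known primes. This is an assumption made purely for illustration: it is not secure, and it is not the scheme Bitcoin uses (Bitcoin signs with ECDSA over an elliptic curve). The key values and function names are hypothetical.

```python
import hashlib

# Toy key material built from two well-known Mersenne primes.
# Real keys are generated randomly and are far larger.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                               # modulus, part of both keys
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)

PUBLIC_KEY = (E, N)
PRIVATE_KEY = (D, N)


def digest(message: str, tx_id: str, modulus: int) -> int:
    """Hash the message together with its transaction ID into a number."""
    data = f"{tx_id}|{message}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def sign(message: str, tx_id: str, private_key: tuple) -> int:
    d, n = private_key
    return pow(digest(message, tx_id, n), d, n)


def verify(message: str, tx_id: str, signature: int, public_key: tuple) -> bool:
    e, n = public_key
    return pow(signature, e, n) == digest(message, tx_id, n)


signature = sign("A pays B 100", "tx-1", PRIVATE_KEY)
print(verify("A pays B 100", "tx-1", signature, PUBLIC_KEY))   # True
print(verify("A pays B 1000", "tx-1", signature, PUBLIC_KEY))  # False: message altered
print(verify("A pays B 100", "tx-2", signature, PUBLIC_KEY))   # False: different transaction ID
```

The last line is the replay case: the same signature attached to a different transaction ID no longer verifies.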
Although every node on the network can now validate that this transaction has been initiated by person A, how do we know that person A actually has 100 bitcoin to send and hasn’t already spent it elsewhere? This is closely related to what’s called the double-spend problem. In the traditional system, a trusted bank would check A’s account balance to avoid this problem, but with cryptocurrencies, every node would use their copy of the digital ledger of transactions to look back on all transactions that person A was involved in and calculate his/her current account balance accordingly. If person A’s wallet balance is at least 100 bitcoin plus the transaction fee, and the digital signature is verified by the nodes on the network, then the transaction is added to the transaction pool. At this point, the transaction has been validated, but it has not been confirmed and added to the blockchain or digital ledger – meaning that it can still be reversed.
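The balance check described above can be sketched in a few lines of Python. This follows the simplified account-balance picture used in this post; the ledger entries, the "mining-reward" sender and the function names are hypothetical, and Bitcoin itself tracks unspent transaction outputs rather than running balances.

```python
from typing import List, NamedTuple


class LedgerEntry(NamedTuple):
    sender: str
    receiver: str
    amount: int
    fee: int


def balance(ledger: List[LedgerEntry], address: str) -> int:
    """Replay every past transaction that involves the given address."""
    total = 0
    for entry in ledger:
        if entry.receiver == address:
            total += entry.amount
        if entry.sender == address:
            total -= entry.amount + entry.fee
    return total


def can_afford(ledger: List[LedgerEntry], address: str, amount: int, fee: int) -> bool:
    """A node accepts the transaction only if the sender can cover amount plus fee."""
    return balance(ledger, address) >= amount + fee


ledger = [
    LedgerEntry("mining-reward", "A", 150, 0),  # A received 150 earlier
    LedgerEntry("A", "C", 40, 1),               # A already spent 40 plus a fee of 1
]
print(balance(ledger, "A"))             # 109
print(can_afford(ledger, "A", 100, 1))  # True
print(can_afford(ledger, "A", 110, 1))  # False
```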
Adding Transactions to the Digital Ledger
Thus far we have worked with the assumption that every node on the network has an accurate copy of the digital ledger of all transactions. In other words, we have assumed the presence of distributed consensus. But this is a big assumption; how exactly can we ensure distributed consensus? We’ll be able to answer this question by looking into how person A’s transaction, which is now in the validated transaction pool, is confirmed and added to everyone’s digital ledgers.
Within the cryptocurrency’s P2P network there are some specialized nodes called miners, who are responsible for confirming transactions. They begin by grouping unconfirmed transactions from the transaction pool into candidate blocks – the number of transactions in a candidate block differs based on the cryptocurrency. Remember how person A specified a transaction fee in his/her message? This gives miners an incentive to put person A’s transaction into a candidate block and confirm it quickly; in general, the higher the transaction fee, the quicker the confirmation.
After a miner groups transactions into a candidate block, they generate a merkle tree by repeatedly running a cryptographic hashing algorithm. The image above illustrates how this would be done in a block with 8 transactions. First, the transaction data corresponding to each of the 8 transactions is hashed using a cryptographic hashing algorithm; this gives you 8 different outputs (G – N). These 8 outputs are grouped into 4 pairs, and each pair is used as the input to the hash function again to get 4 outputs (C – F). This process is then repeated twice more, first with these 4 outputs and then with the resulting 2 outputs, until we have just a single hashed output called the merkle root. Why did we do this? Cryptographic hash functions are collision resistant – this means that it is practically impossible to find two different inputs that give the same output; even changing the input by a single character completely alters the resulting output. Therefore, building a merkle tree by repeatedly running a cryptographic hash function on pairs of outputs allows us to condense all the transactions into a single string of characters – the merkle root! This helps us verify the authenticity of this block later on; if someone were to change even a single piece of transaction data, the merkle root would completely change and no longer match the original.
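Here is a minimal Python sketch of the pairing-and-hashing procedure described above. It assumes SHA-256 as the hash function and, when a level has an odd number of hashes, duplicates the last one so that everything pairs up; the transaction strings are placeholders.

```python
import hashlib
from typing import Sequence


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: Sequence[str]) -> str:
    """Hash every transaction, then hash pairs of hashes until one remains."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


transactions = [f"transaction-{i}" for i in range(8)]
root = merkle_root(transactions)

# Changing a single transaction produces a different root.
tampered_root = merkle_root(["transaction-0 (edited)"] + transactions[1:])
print(root)
print(root == tampered_root)  # False
```

With 8 transactions the loop runs three times (8 hashes to 4, 4 to 2, 2 to 1), matching the three levels above the leaves in the tree.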
Now that the miner has grouped some transactions (including A’s) into a candidate block and has found its merkle root, he/she adds a portion of data to the block called the block header. As seen in the image above, this header contains a bunch of information about the block itself including its merkle root, size in bits and timestamp. The header also includes a piece of data called the previous block hash. As seen in the image below, the previous hash is obtained by hashing the contents of the last confirmed block – that’s in everyone’s digital ledgers – using a cryptographic hashing function. This piece of data links every block to the block that came before it and structures the data in the digital ledger into a chain of blocks or blockchain. Moreover, notice that the previous block’s merkle root would also be part of the input used to generate the previous block hash. Therefore, if someone were to go back and change a merkle root or some data in a confirmed block, then all the blocks that came after it would become invalid, since the previous hash stored in the next block’s header would no longer match.
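The way the previous block hash chains blocks together can be sketched as follows. The block layout here (a small dictionary with three fields, serialized as JSON) is a simplification I made up for illustration; real block headers carry more fields in a fixed binary format.

```python
import hashlib
import json
from typing import List, Optional


def block_hash(block: dict) -> str:
    """Hash a block's contents; sort_keys keeps the serialization stable."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def new_block(merkle_root: str, timestamp: int, previous: Optional[dict]) -> dict:
    """Build a block whose header stores the hash of the block before it."""
    previous_hash = block_hash(previous) if previous is not None else "0" * 64
    return {
        "merkle_root": merkle_root,
        "timestamp": timestamp,
        "previous_hash": previous_hash,
    }


def chain_is_valid(chain: List[dict]) -> bool:
    """Every block must point at the actual hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


first = new_block("root-0", 1, None)
second = new_block("root-1", 2, first)
third = new_block("root-2", 3, second)
print(chain_is_valid([first, second, third]))  # True

# Rewriting an old block breaks the link stored in the block after it.
forged_first = dict(first, merkle_root="forged-root")
print(chain_is_valid([forged_first, second, third]))  # False
```

To hide the forgery, an attacker would have to rebuild the second block with the new previous hash, which changes its own hash and forces a rebuild of the third block, and so on down the chain.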
At this point the miner has calculated the merkle root, found the previous block hash and added a block header to this candidate block, but the block has still not been confirmed and broadcast to everyone else on the network. To confirm the block, the miner has to find what’s called a proof-of-work. This is done by changing the nonce (number used once) value in the block header such that the hash of the entire block including the transactions, merkle root, nonce, and previous block hash is less than a certain target value, which in practice means that it starts with a certain number of zeros. Since the cryptographic hashing algorithms used are irreversible, trial and error is the only known way to do this.
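The trial-and-error search for a nonce can be sketched in Python as below. The header string and the target are placeholders I picked so that the search finishes in well under a second; a real network uses a far smaller target, and Bitcoin in particular hashes a fixed binary header with SHA-256 applied twice.

```python
import hashlib


def header_hash(header: str, nonce: int) -> int:
    """Hash the header together with a nonce and read the digest as a number."""
    data = f"{header}|{nonce}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def mine(header: str, target: int) -> int:
    """Try nonces one by one until the hash falls below the target."""
    nonce = 0
    while header_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def verify_proof_of_work(header: str, nonce: int, target: int) -> bool:
    """Checking a proof takes a single hash, however long the search took."""
    return header_hash(header, nonce) < target


HEADER = "merkle-root|previous-block-hash|timestamp"  # placeholder header contents
TARGET = 1 << (256 - 16)  # any hash below this has 16 leading zero bits

nonce = mine(HEADER, TARGET)
print(nonce)
print(format(header_hash(HEADER, nonce), "064x"))  # starts with four hex zeros
print(verify_proof_of_work(HEADER, nonce, TARGET))  # True
```

With this target, roughly 1 in 65,536 nonces works, so the miner needs tens of thousands of attempts on average while a verifier needs only one hash. Each extra leading zero bit that is required doubles the expected work.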
When a miner finds an appropriate nonce, the block is broadcast over the network to all the other nodes. Every node then verifies the proof-of-work by hashing the entire block including the nonce and checking that the block hash is less than the target value. If the verification is successful, the block is added to their copy of the blockchain and A’s transaction, which was placed in the block, is confirmed. In addition to the transaction fee that the miner earned from A’s transaction, the miner is also rewarded with a certain amount of newly created cryptocurrency for confirming the block – hence the name miner. In a sense, cryptocurrency mining is like a lottery because all the miners guess random numbers until one miner finds a winning number and gets rewarded for doing so. This is the only way in which the supply of cryptocurrency grows. The block reward starts off high to incentivize miners to join the network early on, but it is gradually reduced with time to ensure that the cryptocurrency remains scarce and in turn valuable.
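To see how a shrinking block reward keeps the supply limited, here is a short Python sketch using Bitcoin’s schedule as the example: the reward started at 50 bitcoin per block and is cut in half every 210,000 blocks. Other cryptocurrencies use different schedules, and the function names are my own.

```python
SATOSHIS_PER_BITCOIN = 100_000_000       # Bitcoin's smallest unit
INITIAL_REWARD = 50 * SATOSHIS_PER_BITCOIN
HALVING_INTERVAL = 210_000               # blocks between halvings


def block_reward(height: int) -> int:
    """Reward, in satoshis, for the block at the given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return INITIAL_REWARD >> halvings    # integer halving, rounding down


def total_supply() -> int:
    """Add up the rewards of every block until the reward reaches zero."""
    total = 0
    era = 0
    while block_reward(era * HALVING_INTERVAL) > 0:
        total += block_reward(era * HALVING_INTERVAL) * HALVING_INTERVAL
        era += 1
    return total


print(block_reward(0) / SATOSHIS_PER_BITCOIN)        # 50.0
print(block_reward(210_000) / SATOSHIS_PER_BITCOIN)  # 25.0
print(total_supply() / SATOSHIS_PER_BITCOIN)         # just under 21 million
```

Because each era pays out half as much as the one before it, the total behaves like a geometric series and can never exceed twice the first era’s payout, which is where the 21 million cap comes from.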
Since trial and error is the only way to confirm blocks, and guessing the right nonce depends on how many nonces you can try, miners often invest in a lot of Graphics Processing Units (GPUs) that increase their computing power and allow them to hash multiple nonce values in parallel. Since these GPUs consume a lot of electricity, miners are essentially converting electricity into cryptocurrency when they mine. Moreover, most cryptocurrency protocols alter the difficulty of confirming a block by adjusting the target value in order to ensure that confirming a block takes a roughly fixed amount of time on average. With Bitcoin for example, the difficulty is adjusted to ensure that a block is mined about every 10 minutes on average. This time limitation is in place in order to keep the blockchain from getting too large too fast and to avoid wasting energy by accounting for network latency or the time it takes to propagate information through the P2P network.
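The difficulty adjustment can be sketched as a simple rescaling of the target. The constants below follow Bitcoin’s rule (re-evaluate every 2,016 blocks, aim for 10 minutes per block, and limit any single adjustment to a factor of 4); the function name and the example numbers are my own.

```python
TARGET_BLOCK_TIME = 10 * 60              # seconds per block Bitcoin aims for
RETARGET_INTERVAL = 2016                 # blocks between adjustments
EXPECTED_TIMESPAN = TARGET_BLOCK_TIME * RETARGET_INTERVAL  # two weeks, in seconds


def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by how long the last interval really took.

    Blocks found too quickly shrink the target (harder); blocks found
    too slowly grow it (easier).
    """
    clamped = max(EXPECTED_TIMESPAN // 4, min(actual_timespan, EXPECTED_TIMESPAN * 4))
    return old_target * clamped // EXPECTED_TIMESPAN


old_target = 1 << 240
# Miners were twice as fast as intended, so the target is cut in half.
print(retarget(old_target, EXPECTED_TIMESPAN // 2) == old_target // 2)  # True
# Miners were exactly on schedule, so nothing changes.
print(retarget(old_target, EXPECTED_TIMESPAN) == old_target)            # True
```

A smaller target means fewer hashes qualify, so more guesses are needed per block and the average block time drifts back toward 10 minutes.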
At this point A’s transaction has been confirmed and added to every node’s digital ledger, but there are still a few unanswered questions. What happens if two miners solve a proof-of-work and confirm a block at the same time? If this were to happen, the blockchain would branch out and create a fork as seen above. However, since we can’t have two separate parallel blockchains, the longest branch in the fork is retained as the “official” one and the other is orphaned. This raises another question: what happens if a crooked miner creates a fraudulent branch and mines new blocks onto that branch faster than everyone else so that it becomes the official blockchain? A crooked miner might buy thousands of dollars worth of goods with a cryptocurrency, put that transaction on the blockchain and then execute her attack by building a new chain longer than the official chain so that the official blockchain along with her fund transfer gets thrown out and she ends up paying nothing for all the goods she received.
This is theoretically a possibility, but practically quite unrealistic because the probability of a single miner solving multiple blocks in a row before all the other miners is very low. The only way that this could happen reliably is if a single miner controlled more than 50% of the total computing power on the network (hence the name 51% attack), in which case she would be able to mine faster than everyone else combined and take control of the cryptocurrency. A 51% attack like this is possible, but it would probably be prohibitively expensive for a single miner to buy enough GPUs to gain a majority of the total computing power on the network. Because of the proof-of-work needed to confirm blocks and add them to the blockchain, cryptocurrencies are very resilient against attempts to tamper with the ledger.
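To put a rough number on “very low”, the Bitcoin white paper models the race between an attacker and the honest miners as a random walk: an attacker with a share q of the computing power who is z blocks behind ever catches up with probability (q/p)^z, where p = 1 - q, as long as q is less than p. The Python sketch below just evaluates that formula; the function name and sample values are mine.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance that an attacker ever catches up from a given deficit."""
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must be between 0 and 1")
    q = attacker_share
    p = 1.0 - q                      # share held by everyone else
    if q >= p:
        return 1.0                   # with half the power or more, catching up is certain
    return (q / p) ** blocks_behind


for share in (0.10, 0.30, 0.45, 0.51):
    print(share, catch_up_probability(share, blocks_behind=6))
```

For an attacker with 10% of the computing power who is 6 blocks behind, the result is about 2 in a million, and it drops further with every additional block. This is why waiting for several confirmations makes a payment increasingly safe.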
Now that we’ve understood the intuition behind cryptocurrencies and the ways in which they address the challenges of building a completely decentralized currency, the next few posts will focus on evaluating them. This will include analyzing more of the economics of cryptocurrencies and understanding why they haven’t caught on as much as expected. In the meantime, if you have any questions about any of this you can message or tweet at me on Twitter and I’ll try my best to clear things up.