Mnemonic in Bitcoin
In this article, we learn how mnemonics are generated and how a mnemonic is converted into a seed from which the keys used in different Bitcoin transactions are derived.
Table of contents.
- Introduction.
- Mnemonics.
- Generating Seed From Mnemonics.
- How to Generate Mnemonics.
- Summary.
Prerequisites
Introduction
In the prerequisites article, we learned about BIPs (Bitcoin Improvement Proposals) and their purpose in Bitcoin: how they are created, approved, and implemented. In this article, we look at one specific BIP, BIP-39, which defines mnemonic code words used to represent a random number from which a deterministic wallet is derived.
To start owning bitcoin, we need an interface to the blockchain: a wallet. Remember that a wallet is only responsible for holding keys. The funds themselves are recorded on the blockchain, and the keys are what we use to interact with them. When we install a software wallet, whether desktop or mobile, it generates a list of words, and the length of the list depends on the wallet. We could choose the words ourselves (a brain wallet), but this is not recommended because words picked by a person are very unlikely to be random.
These words must be kept away from public view and must not be lost. Losing them can mean losing the funds in the wallet, because we can no longer recover access to it.
Remember, a deterministic wallet has a single seed, which is the source of all the keys used in its transactions. The words generated by the wallet are used to create this seed, from which all other keys are derived.
Mnemonics make it easier for wallet owners to back up their wallets. Compared with a random sequence of numbers, words are easy to read, write down, import, and export without making mistakes. BIP-39 specifies how mnemonics are created and how they are converted into a seed.
We will discuss the steps involved in generating a mnemonic and how the mnemonic is converted into a seed, from which we derive the keys for different transactions.
Mnemonics.
Mnemonics are generated automatically when we set up a software wallet, following the process defined in BIP-39:
- A random sequence of 128 to 256 bits (a multiple of 32) is created. This is called the entropy (ENT).
- A checksum (CS) of the entropy is created by taking the first ENT/32 bits of the entropy's SHA-256 hash.
- The checksum is appended to the end of the entropy.
- The combined sequence of bits is divided into sections of 11 bits each.
- Each 11-bit value (0 to 2047) is used as an index into a predefined list of 2048 words. The resulting sequence of words is the mnemonic.
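The steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, not the reference implementation. `entropy_to_indices` is a hypothetical helper name, and the sketch stops at the 11-bit word indices rather than looking up the words, since the 2048-word English list is not reproduced here.

```python
import hashlib
import secrets
from typing import List


def entropy_to_indices(entropy: bytes) -> List[int]:
    """Hypothetical helper: map BIP-39 entropy to 11-bit word-list indices."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128, 160, 192, 224 or 256 bits")

    # Checksum = first ENT/32 bits of SHA-256(entropy); at most 8 bits,
    # so the first byte of the digest is enough
    cs_bits = ent_bits // 32
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)

    # Append the checksum bits to the end of the entropy bits
    bits = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    total_bits = ent_bits + cs_bits

    # Split into 11-bit groups, most significant group first
    return [
        (bits >> (total_bits - 11 * (i + 1))) & 0x7FF
        for i in range(total_bits // 11)
    ]


if __name__ == "__main__":
    entropy = secrets.token_bytes(32)  # 256 bits from a cryptographically secure source
    indices = entropy_to_indices(entropy)
    print(len(indices), indices)  # 24 indices, each in 0..2047
```

In the official English list, index 0 is "abandon" and index 3 is "about". So 16 zero bytes of entropy correspond to the BIP-39 test mnemonic made of "abandon" eleven times followed by "about".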
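The table below shows the relationship between the size of the entropy and the length of the mnemonic in words (CS = ENT / 32, words = (ENT + CS) / 11).
Entropy (ENT, bits) | Checksum (CS, bits) | ENT + CS (bits) | Mnemonic length (words)
128 | 4 | 132 | 12
160 | 5 | 165 | 15
192 | 6 | 198 | 18
224 | 7 | 231 | 21
256 | 8 | 264 | 24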
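The sizes in the table follow from two divisions: the checksum is ENT/32 bits, and the word count is (ENT + CS)/11. The short sketch below recomputes each row and also compares how many word sequences exist with how many carry a valid checksum. The helper name `mnemonic_sizes` is hypothetical.

```python
def mnemonic_sizes(ent_bits: int) -> tuple:
    """Hypothetical helper: (checksum bits, total bits, word count) for ENT bits."""
    cs_bits = ent_bits // 32         # checksum length
    total_bits = ent_bits + cs_bits  # entropy + checksum
    words = total_bits // 11         # each word encodes 11 bits
    return cs_bits, total_bits, words


for ent in (128, 160, 192, 224, 256):
    cs, total, words = mnemonic_sizes(ent)
    all_sequences = 2048 ** words    # every possible sequence of that many words
    valid = 2 ** ent                 # exactly one valid mnemonic per entropy value
    print(f"{ent} + {cs} = {total} bits -> {words} words; "
          f"1 in {all_sequences // valid} word sequences has a valid checksum")
```

For 12 words, this prints that only 1 in 16 possible word sequences is a valid mnemonic.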
Generating Seed From Mnemonics.
Now that we have a mnemonic of 12 to 24 words encoding 128 to 256 bits of entropy, we will see how this list of words is used to generate a seed from which all other keys are derived.
The list of words is used to derive a 512-bit seed with a key-stretching function called PBKDF2.
PBKDF2 takes two parameters: the first is the mnemonic and the second is a salt. A salt makes it impractical to build a lookup table that could be used in brute-force attacks. Here, the salt also allows the introduction of a passphrase, which serves as additional protection for the seed.
The process of generating a seed is as follows;
- The first parameter to PBKDF2 is the mnemonic from the previous step.
- The second parameter (the salt) is the constant string "mnemonic" concatenated with an optional passphrase supplied by the user.
- PBKDF2 stretches the mnemonic and salt using 2048 rounds of HMAC-SHA512. This produces a 512-bit value, which is the seed.
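The seed derivation can be sketched with Python's standard `hashlib.pbkdf2_hmac`. The sketch follows the BIP-39 rule that the mnemonic and the passphrase are normalized to Unicode NFKD and encoded as UTF-8 before hashing. `mnemonic_to_seed` is a hypothetical name. The output is expected to agree with the `Mnemonic.to_seed` method used later in this article, but checking it against the official BIP-39 test vectors is the reliable way to confirm this.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Hypothetical helper: derive a 512-bit BIP-39 seed from a mnemonic."""
    # First parameter: the mnemonic sentence, NFKD-normalized and UTF-8 encoded
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    # Second parameter (salt): "mnemonic" + optional passphrase
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # 2048 rounds of HMAC-SHA512, 64-byte (512-bit) output
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


if __name__ == "__main__":
    words = ("abandon abandon abandon abandon abandon abandon "
             "abandon abandon abandon abandon abandon about")
    print(mnemonic_to_seed(words, "TREZOR").hex())
```

The same words and passphrase always give the same 64 bytes. Changing only the passphrase gives an unrelated seed, which is why any passphrase is accepted.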
The following image demonstrates the process of converting a generated mnemonic to a seed;
An optional passphrase adds a second factor that makes the mnemonic useless on its own: an attacker who obtains the mnemonic cannot derive the wallet's seed without also knowing the passphrase. On the other hand, a passphrase introduces a risk: if the wallet owner dies and nobody else knows the passphrase, the funds remain locked on the blockchain forever.
How to Generate Mnemonics.
BIP-39 is implemented as a library in programming languages such as Python (python-mnemonic), JavaScript (bip39, from the BitcoinJS project) and C++ (libbitcoin).
Creating mnemonics, seed, and entropy
from mnemonic import Mnemonic
mnemo = Mnemonic("english")  # BIP-39 English word list (2048 words)
words = mnemo.generate(strength=256)  # 256 bits of entropy -> 24 words
# PBKDF2-HMAC-SHA512 over the mnemonic, salted with "mnemonic" + passphrase
seed = mnemo.to_seed(words, passphrase="secret_paraphrase")
entropy = mnemo.to_entropy(words)  # recover the entropy bytes encoded by the words
print("WORDS \n", words)
print("SEED \n", int.from_bytes(seed, "little"))
# Print each entropy byte as 8 bits, most significant bit first
print("ENTROPY \n", "".join(format(byte, "08b") for byte in entropy))
Sample output:
WORDS
alcohol account electric enhance horror robust simple usage client dice click tonight discover canoe question future rent eager tourist injury outer swallow access cross
SEED
10102912926365222674410504999629897990468434966437955991561473484405950326058992619933382504892807706463555993631863657706194860219465320811332523868262004
ENTROPY
(256 binary digits: the entropy encoded by the 24 words above, without the 8 checksum bits)
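`mnemo.to_entropy(words)` reverses the encoding: it turns the words back into bits and splits off the checksum. The sketch below shows that reverse step with the standard library and also verifies the checksum. It assumes the words have already been looked up in the word list and replaced by their 11-bit indices. `indices_to_entropy` is a hypothetical helper, not part of the library.

```python
import hashlib
from typing import List


def indices_to_entropy(indices: List[int]) -> bytes:
    """Hypothetical helper: rebuild entropy from word indices and check the checksum."""
    if len(indices) not in (12, 15, 18, 21, 24):
        raise ValueError("a BIP-39 mnemonic has 12, 15, 18, 21 or 24 words")
    total_bits = len(indices) * 11
    cs_bits = total_bits // 33  # ENT + ENT/32 = 33 * ENT/32, so CS = total/33
    ent_bits = total_bits - cs_bits

    # Concatenate the 11-bit indices back into one integer
    bits = 0
    for index in indices:
        if not 0 <= index < 2048:
            raise ValueError("word index out of range")
        bits = (bits << 11) | index

    # Split into entropy (high bits) and checksum (low bits)
    entropy = (bits >> cs_bits).to_bytes(ent_bits // 8, "big")
    checksum = bits & ((1 << cs_bits) - 1)

    # Recompute the first CS bits of SHA-256(entropy) and compare
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    if checksum != expected:
        raise ValueError("invalid mnemonic checksum")
    return entropy
```

For example, the indices `[0] * 11 + [3]` ("abandon" eleven times, then "about") decode to 16 zero bytes. The indices `[0] * 12` raise `ValueError` because their checksum does not match.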
Summary
For privacy, a Bitcoin user should use a different key (and therefore a different address) for each transaction, which is why a wallet needs to derive many keys from one seed.
The process of generating a mnemonic can be summarized as follows: a wallet starts from a source of entropy, appends a checksum, and maps the result, 11 bits at a time, to words from a word list.
A seed is used in a deterministic wallet for key derivation. Every passphrase is valid, and each different passphrase leads to a different seed.
The key-stretching function, with 2048 rounds of HMAC-SHA512, makes every guess expensive and so slows down brute-force attacks against the mnemonic. The size of the search space itself is set by the entropy: an attacker guessing the mnemonic faces about 2^128 to 2^256 possibilities, depending on its length, not 2^512.
With this article at OpenGenus, you should now have a complete idea of mnemonics in Bitcoin.