Wednesday, November 3, 2021

Bitcoin – needle in the haystack technology

“Entropy is what makes a bitcoin your bitcoin”

https://preview.redd.it/ha0j1vudubx71.png?width=2356&format=png&auto=webp&s=39ab3fcd2fb81447001e6f95e60c9b9fdff420f8

Your private key is a needle in the haystack.
Well actually that’s not true, your private key is a really big and random number.

So 1st of all why is it a number? Well because everything that’s on a computer is a number.

Ok, but why does it have to be big?

Well, let’s say my private key is a number between 1 and 10. If you want to guess it, you have a 1 in 10 chance of guessing it. Which is pretty bad for me.
And if you’re a bit zealous it would only take you 10 tries to go through all the possibilities to guess my private key.

Now let’s say my private key is a number between 1 and 115 quattuorvigintillion, things are a bit different.
What is 115 quattuorvigintillion?
That is 2256, and this is the upper limit of the number I was talking about and to put it more into perspective this is how many atoms there are in the visible universe.

So if you would want to go through all the possibilities of all the private keys to guess my private key that would take you about 500 years.
And that is great because this process is very time-consuming, like finding a needle in the haystack.

So your private key is like the needle in the haystack, more precisely the position of the needle in the haystack.

If you put your needle at the top of the haystack, anyone can find it.
If you put it in the corner of the haystack, someone might look there and also find it.
So if you put in a position that is pretty obvious, someone will guess it with ease. 

But if you would put it in a random position, the only way someone could guess the position of your needle(aka your private key) is by going through all the hey in the haystack, which could take a long time.

Luckily you don’t have to worry about this and your wallet takes care of all these things.

RNGs – Random Number Generators

Your Bitcoin wallet is the instrument you use to interact with the Bitcoin network.
It has 3 main functions:

  1. It generates the private keys
  2. It stores the private keys
  3. It signs transactions

And depending on what specific wallet you may be using it may include other functionalities.

I’m sure you learned by now from your past relationships that it does not matter how hard you try in the present or even if she’ll do anal once a week if the whole thing had a bad beginning.

The same holds true for cryptography, and more precisely for the security of your Bitcoin keys.
If your private key was not generated randomly[1] and privately, there is a decent chance you may lose all your coins, and even more, it can nullify future efforts you put into securing your Bitcoin.

The private part is up to you, but the random depends on the software and hardware you trust.
So what’s so hard?

Well, computers are deterministic machines, meaning they will do the same thing if you give them the same input.
Example.
When I type in xhamster.com in my browser, it takes me to xhamster.com every time[2].
This is great because this is why we use computers in the first place because they are reliable, but this means they are NOT capable of producing random outputs.

Yeah, but cryptography is not only used to secure Bitcoin, in fact, its used all over the place, so clearly we must have solutions that provide use random numbers.

Indeed.
We have, and they are called Random Number Generators.

Of which there are of 2 types:

  1. Pseudo RNGs
  2. True RNGs

PRNGs – Pseudo-Random Number Generators

https://preview.redd.it/pstv36fnubx71.png?width=672&format=png&auto=webp&s=0ad02da0cbc40a3b1a373b4040f600dfaa20c8de

The Pseudo ones are like fake boobs.
They may look real for a distance, but on close inspection, it’s obvious they are fake.

These are pretty much an algorithm, a piece of software that spits out multiple numbers.
These numbers are uniformly distributed, and if you have a few of them you can NOT guess future ones.
Uniformly distributed means that if you chart them they are all over the place, and dont concentrate in one area.

Remember the needle in the haystack.
We dont want possible private keys to show any specific patterns or predictability.

Numbers from a Pseudo-Random Number Generator plotted.

They are useful as they are fast and reliable, and they are used all over the place.
The operating system on your phone and computer uses them all the time.

So the way these pseudo RNGs work is that they are a function(recursive most of the time) that needs an initial term T0.
The same way the Fibonacci sequence needs the 1st term.
Depending on this term you will get different sets of uniformly distributed numbers.

But remember this is a deterministic process that runs on a deterministic machine, so if someone gets this initial term, they will all be able to generate the same numbers as you, and possibly guess your current and any future private keys you may generate.

Ok, so where do we get this very important 1st term from?

TRNGs True Random Number Generators

The answer is True RNGs.

https://preview.redd.it/ehljcxb0vbx71.png?width=522&format=png&auto=webp&s=26af5bf0549c51dcadbcb3eae00556575a1cc20a

What makes these ones "true" as opposed to "pseudo", is the same thing that makes boobs real as opposed to fake.
The fact that they are natural.

Tru-RNGs are hardware-based and digitize chaotic events from nature in order to produce random numbers.
They often focus on physical phenomena like atmospheric noise, magnetic noise, or electromagnetic or quantum phenomena like thermal noise or avalanche noise.

The numbers outputted by a TRNG will also be uniformly distributed, lack patterns, and are totally unpredictable, but there are 2 differences.
There is no T0 and there is no algorithm behind them.

If we take 2 data sets one from a PseudoRNG and one from a TrueRNG you can’t distinguish between the two.
The only difference is that one is random and one is not.

As John von Neumann famously put it.

https://preview.redd.it/6ottvag3vbx71.png?width=501&format=png&auto=webp&s=e68a966a2d38d2528065f98a1c878e23d9188434

As with anything in life, there are tradeoffs, and even though the true ones are truly random, they are usually slower, and if not slower very expensive.
And as with any physical device living in the physical world, it can break down and/or become unreliable.

So are the true random number generators better than the pseudo-random number generators?

Well is a wife better than a table?
Even though one is more useful than the other, the question does not make sense.

https://preview.redd.it/0ecrsk36vbx71.png?width=1002&format=png&auto=webp&s=09bcb6b1d4bc1ebb852c19d68e9e7b0426cd05d6

Entropy

You can’t measure randomness.
You can observe a process and understand how it works and draw the conclusion that it is random or not.

But considering we are talking about numbers, after all, can’t we measure the numbers?
Of course, we can say 8008 > 1234 but saying that 8008 is more random than 1234, would not make any sense.

Whenever the topic of randomness is mentioned, entropy comes up very often, and depending on the context it can mean different things.

The physics context

The term comes from Thermodynamics and it has to do with measuring molecular randomness, or in how many ways you can arrange the tiny things that make up something.
And is used to express the 2nd Law of Thermodynamics.

“The entropy of isolated systems left to spontaneous evolution cannot decrease, as they always arrive at a state of thermodynamic equilibrium, where the entropy is highest.”

The information theory context.

In 1948 Claude Shanon((Who was a crazy motherfucker! )) wanted to measure information and published a paper called “A Mathematical Theory of Communication”, which pretty much give birth to this field.
He popularised((But not come up with as he "stole" it from John Tukey)) the bit as the most minimal unit of information.

A bit can either be a 0 or a 1, something or nothing.

He started thinking about how many questions does he need to ask in order to guess all the digits of a string.
Let’s say we have an 8 digit long binary number. Binary means that each digit can either be a 0 or a 1.

So how many questions do you need to ask me in order to guess my number?
Well for the 1st character one question.
You ask me is it a 0, and if I say no, you know it’s a 1.
What about the 2nd character, the same.
So in order to guess all the 8 characters, you need to ask me 8 questions.
So the entropy of the string is 1 bit per character times 8 characters = 8 bits.

This is true only if you dont have any information about my string, let’s say you somehow found out that 4 of the digits are 1s, then you would require fewer questions.
And of course, this gets more interesting and complicated when we have more options for each character, but for Bitcoin private keys, we keep everything in binary form.

Also in the paper, he estimated that the entropy of written English is between 0.6 and 1.3 bits per character, which is very low, and this is why it compresses so well((The higher the entropy is in a bitstring the less it can compress. Kinda makes sense as there are no patterns.)).

So higher entropy would mean more secure, no?
Well, not necessary.
This does bring us closer to what we are interested in but not quite there.

The cryptographic context.

You see Cryptography is just adversarial math.
Meaning that we always frame stuff by how hard it’s for an adversary to guess the secret or alter data.

The way modern((Modern meaning after 1883 after Kerckhoffs formulated his 6 principles)) cryptography works is that everyone knows the algorithms we use, but none one knows the entropy.
Like in Bitcoin everything is open-source, we know how every little thing works, and that’s great because that’s why we also trust the code because is auditable, but everyone’s, private keys are private and entropic(hopefully).

So good entropy would mean it’s hard(it will cost a LOT of resources) for an adversary to guess your secret, or in our case the Bitcoin private keys.

Good entropy has 3 characteristics:

  1. Unpredictability, which is a measure of how strong the non-computability of the bits in the sequence is;
  2. Uniform distribution of the bits in the sequence;
  3. Lack of patterns in the sequence.

It is worth pointing out that 3 implies both 1 and 2.
However, 1 does not imply 2.
And similarly, 2 does not guarantee 1.

But not all entropy is equally important.

Some people use entropy to make simulations of the universe(or for video games) and they don’t need cryptographically secure entropy, because they dont have any adversaries to worry about.

So what is the actual difference between Shanon’s entropy and cryptographic entropy, seem to be the same?

Well, it is not.

For example, we can take the very well-known constant Pi.

Pi is a number that goes on forever, and to date, we calculated about 62.8 trillion((To put it into perspective this is 3 times more than there are blood cells in the human body.)) digits of it.
The sequence of digits of Pi has all the proprieties listed above, but if we use that as entropy for our private keys, it’s trivial for anyone to guess.

The key difference is that generating the digits of Pi is not a random process, and it can be replicated by anyone with a computer, so from the adversarial POV, even though the entropy checks all the boxes, it would not serve as cryptographically secure entropy.

Ok, ok, but we are talking about Bitcoin here.
This can((will*)) be worth millions in the future, and I might even want to pass it on to other generations, so as entropy is so important what is the best entropy for my Bitcoin private keys?

Well, that sounds like a great subject for a future post.


No comments:

Post a Comment