Thursday, October 29, 2020


Subject: A breakdown of Bitcoin "standard" script types

When challenged recently to provide a little-known bitcoin fact, I offered that "Addresses are not stored anywhere in the blockchain". This got me thinking a bit more about the bitcoin op codes and the scripting language they describe. There is a good wiki article on it all as a refresher. It's basically a stack-based language similar to Forth or RPL. Here's an example of a Mancala game I wrote in RPL to show more complex code.


Pay to Pubkey

The original bitcoin client defined two fields, scriptSig and scriptPubKey, which each contained half of the script needed to validate a transaction. The two scripts were concatenated together to create a complete script. Here's an example of a Pay to Pubkey script:

P2PK          size   script
scriptSig     72     <sig>
scriptPubKey  35     <pubkey> OP_CHECKSIG
assembled            <scriptSig> <scriptPubKey>
btc_address          b58_encode(pfx + hash160(spk[1:34]))
Test                 len(spk) == 35 and (spk[0:1] + spk[34:35]).hex() == '21ac'
Total vB      107    72 + 35

Since the OP_CHECKSIG operation takes two arguments, this can be interpreted as txin.OP_CHECKSIG(<pubkey>, <sig>) from a non-stack-based language perspective. As for TXN size, the total size of one of these assembled scripts is 107 vB (bytes). As for bitcoin addresses, the address is derived by chopping off the first and last bytes (op codes) from the scriptPubKey (spk), then performing a Hash160 operation on the remaining data. The script is recognized by its length and by its first and last op codes (OP_PUSH, OP_CHECKSIG).
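To make the concatenation concrete, here is a minimal sketch of a toy stack evaluating the assembled P2PK script. The function name run_p2pk and the checksig callback are hypothetical names for illustration only; a real node would verify an ECDSA signature against the transaction hash, which this sketch leaves to the caller.

```python
from typing import Callable, List


def run_p2pk(sig: bytes, pubkey: bytes,
             checksig: Callable[[bytes, bytes], bool]) -> bool:
    """Evaluate <scriptSig> <scriptPubKey> for P2PK on a toy stack.

    checksig(pubkey, sig) is a caller-supplied stand-in for the real
    signature check (an assumption of this sketch).
    """
    stack: List[bytes] = []
    stack.append(sig)        # scriptSig:    <sig>
    stack.append(pubkey)     # scriptPubKey: <pubkey> ...
    top = stack.pop()        # ... OP_CHECKSIG pops the pubkey first,
    below = stack.pop()      #     then the signature beneath it
    return checksig(top, below)   # i.e. txin.OP_CHECKSIG(<pubkey>, <sig>)
```

The order of the two pops is the whole point: the item pushed last (the pubkey from scriptPubKey) becomes the first argument, which is why the non-stack reading is OP_CHECKSIG(<pubkey>, <sig>).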

In the original client, P2PK was used for what was termed "Pay to IP". In this process, you would enter an IP address in the PayTo field, and the client would connect to the remote node to receive a scriptPubKey from it.


Pay to Public Key Hash

Along with P2PK, the original client also supported P2PKH, termed "Pay to address". Since addresses were always stored as the Hash160 of the pubkey, this format had the advantage of requiring no secondary piece of information. All the sender needed was the bitcoin address, whereas in P2PK the sender needed the pubkey and could derive the address from it. But pubkeys are long and generally not checksummed like bitcoin address notation is. Having the sender need only a small checksummed hash was simpler and became much more widely used, although it does require the pubkey in scriptSig, making it more expensive to spend.

P2PKH         size   script
scriptSig     106    <sig> <pubkey>
scriptPubKey  25     OP_DUP OP_HASH160 <pkHash> OP_EQUALVERIFY OP_CHECKSIG
assembled            <scriptSig> <scriptPubKey>
btc_address          b58_encode(pfx + spk[3:23])
Test                 len(spk) == 25 and (spk[0:3] + spk[23:25]).hex() == '76a91488ac'
Total vB      131    106 + 25

The total size of one of these assembled scripts is 131 vB (bytes). As for bitcoin addresses, the address is derived by chopping off the first 3 and last 2 bytes (op codes) from the scriptPubKey (spk). The script is recognized by its length and by its first 3 and last 2 bytes (OP_DUP, OP_HASH160, OP_PUSH, OP_EQUALVERIFY, OP_CHECKSIG).
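The recognition test and address derivation for P2PKH can be sketched as follows. This assumes b58_encode means Base58Check (version prefix, payload, 4-byte double-SHA256 checksum) and that pfx is the mainnet version byte 0x00; the function names are illustrative, not taken from any particular library.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58check_encode(payload: bytes) -> str:
    """Append a 4-byte double-SHA256 checksum, then base58-encode."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    # Each leading zero byte is written as a literal '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + encoded


def is_p2pkh(spk: bytes) -> bool:
    """Length 25, starts with OP_DUP OP_HASH160 PUSH(20), ends with OP_EQUALVERIFY OP_CHECKSIG."""
    return len(spk) == 25 and (spk[0:3] + spk[23:25]).hex() == "76a91488ac"


def p2pkh_address(spk: bytes, pfx: bytes = b"\x00") -> str:
    """Derive the address straight from the 20-byte hash inside the script."""
    if not is_p2pkh(spk):
        raise ValueError("not a P2PKH scriptPubKey")
    return b58check_encode(pfx + spk[3:23])
```

Note that no hashing of the pubkey happens here: the scriptPubKey already carries the Hash160, so the address is only a re-encoding of bytes 3 through 22 plus a checksum.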


Pay to Script Hash

This two-script concatenation worked well for the first three years, but eventually more flexibility was desired and BIP-16 was introduced. It was a simple enough concept, but if you're looking at a scripting engine defined 100% by the stack and the two TXN script segments, a completed script cannot be created. You need to invent a new op code, OP_DESERIALIZE, and then insert some op codes that were not originally provided in the script at all and exist purely in this scripting engine. The concept of OP_DESERIALIZE is to take the top data element, redeemScript, and reinterpret it as code instead of data.

P2SH          size   script
scriptSig     ??     <sig> <<redeemScript>>
scriptPubKey  23     OP_HASH160 <rsHash> OP_EQUAL
assembled            <scriptSig> OP_DUP <scriptPubKey> OP_VERIFY OP_DESERIALIZE
btc_address          b58_encode(pfx + spk[2:22])
Test                 len(spk) == 23 and (spk[0:2] + spk[22:23]).hex() == 'a91487'
Total vB      96+    73 + len(redeemScript) + 23

The total size on the blockchain for a spent P2SH output is 96 bytes plus the length of redeemScript, so at least 97 bytes for any non-empty redeemScript. The majority of non-segwit P2SH transactions are multisig related. At the time of BIP-16, multisig (P2MS) was already widely adopted, though it was mostly done in the scriptPubKey element. As before, this put the burden on the sender to maintain an intricate scriptPubKey instead of a simple bitcoin address. P2SH allows complex scripts to be used while still providing basic pay-to-address type semantics. The address is derived like most pay-to-address outputs, though a different prefix (pfx) is used. The script is recognized by its length and by its first two bytes and last byte (OP_HASH160, OP_PUSH, OP_EQUAL).
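The OP_DESERIALIZE idea can be illustrated with a toy evaluator that walks the assembled P2SH script one op code at a time. Several things here are assumptions of the sketch: OP_DESERIALIZE is this post's invented op code rather than a real one, stand_in_hash is a truncated SHA256 used only to keep the example portable (Bitcoin's HASH160 is RIPEMD160 of SHA256), and run_redeem_script is a hypothetical callback standing in for a full script interpreter.

```python
import hashlib
from typing import Callable, List


def stand_in_hash(data: bytes) -> bytes:
    # NOT the real HASH160: a 20-byte stand-in so the sketch runs anywhere.
    return hashlib.sha256(data).digest()[:20]


def eval_p2sh(sig: bytes, redeem_script: bytes, rs_hash: bytes,
              run_redeem_script: Callable[[bytes, List[bytes]], bool]) -> bool:
    """Toy walk through: <scriptSig> OP_DUP <scriptPubKey> OP_VERIFY OP_DESERIALIZE."""
    stack: list = [sig, redeem_script]         # <scriptSig>: redeemScript pushed as data
    stack.append(stack[-1])                    # OP_DUP (implied)
    stack.append(stand_in_hash(stack.pop()))   # OP_HASH160 (stand-in hash)
    stack.append(rs_hash)                      # <rsHash>
    b, a = stack.pop(), stack.pop()
    stack.append(a == b)                       # OP_EQUAL
    if not stack.pop():                        # OP_VERIFY (implied)
        return False
    script = stack.pop()                       # OP_DESERIALIZE: data becomes code
    return run_redeem_script(script, stack)    # run it against what is left on the stack
```

The implied OP_DUP is what keeps a copy of redeemScript on the stack after the hash comparison consumes the first one, so there is still something left to reinterpret as code.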


Pay to Witness Public Key Hash

The last four script types were all introduced with Segregated Witness (BIP-141). In order for segwit to allow backward compatibility, the scriptSig and scriptPubKey elements are either empty or consist of nothing more than data elements (OP_PUSH). Since non-zero data will always pass validation, this makes all segwit TXNs default to valid if witness data is not included. Like P2SH, a lot of the op codes are implied, and to make the point I'll artificially insert them here as we did with P2SH.

The P2WPKH is modeled after the P2PKH, but the contents of scriptSig are moved to the witness and most of the op codes are implied. Many scripts are also prefixed with OP_0 to signify segwit enablement. The goal of segwit was to allow blocks to expand to something approaching 4MiB while not breaking older implementations. So you can still only have 1MiB of "legacy" block data, but you can have up to 3MiB of witness data... well kinda... the real WU math is a bit more complex.

P2WPKH        size   script
witness       107    <sig> <pubkey>
scriptPubKey  22     OP_0 <pkHash>
assembled            <witness> OP_DUP OP_HASH160 <scriptPubKey> OP_SWAP OP_DROP OP_EQUALVERIFY OP_CHECKSIG
btc_address          b32_encode(pfx + spk[2:22])
Test                 len(spk) == 22 and (spk[0:2]).hex() == '0014'
Total vB      48.75  22 + 107/4

For those keeping score, you'll notice that the witness is 107 bytes, yet the same scriptSig elsewhere is 106. This is because the witness has to carry an element count (0x02) so it can be deserialized. I won't get into those specifics since I think we are already getting off into the weeds. You'll also notice that with the WU math, we get to apply a 75% discount to the witness. This gives a "virtual size" in the block of 48.75, making P2WPKH far and away the least expensive script type. The address is derived from the last 20 bytes of scriptPubKey, but because the scriptPubKey is identified as a P2WPKH type, the address uses bech32 encoding instead of base58 encoding.
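The 75% discount can be written out as a small calculation. This is a sketch of the arithmetic used in the tables, with each non-witness byte counting as 4 weight units and each witness byte as 1; the function name and the dictionary are illustrative. It keeps the fractional result the way the tables do, whereas nodes round the virtual size up once for the whole transaction.

```python
def virtual_size(base_bytes: int, witness_bytes: int = 0) -> float:
    """Weight = 4 * non-witness bytes + 1 * witness bytes; vsize = weight / 4."""
    weight = 4 * base_bytes + witness_bytes
    return weight / 4


# Script sizes taken from the tables in this post.
SIZES = {
    "P2PK": virtual_size(72 + 35),
    "P2PKH": virtual_size(106 + 25),
    "P2WPKH": virtual_size(22, 107),
    "P2SH-P2WPKH": virtual_size(23 + 23, 107),
}

for name, vb in SIZES.items():
    print(f"{name:12} {vb:7.2f} vB")
```

Dividing the weight by 4 is the same as the table's shorthand of adding the base size to a quarter of the witness size, for example 22 + 107/4 for P2WPKH.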


Pay to Witness Script Hash

P2WSH         size   script
witness       ??     <sig> <<witnessScript>>
scriptPubKey  34     OP_0 <wsHash>
assembled            <witness> OP_DUP OP_SHA256 <scriptPubKey> OP_SWAP OP_DROP OP_EQUALVERIFY OP_DESERIALIZE
btc_address          b32_encode(pfx + spk[2:34])
Test                 len(spk) == 34 and (spk[0:2]).hex() == '0020'
Total vB      52.5+  34 + (74 + len(witnessScript))/4
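The b32_encode step used for both P2WPKH and P2WSH addresses can be sketched as below, following the bech32 scheme from BIP-173 for version-0 witness outputs. Here "pfx" is taken to mean the human-readable part ("bc" on mainnet, an assumption of this sketch) plus the witness version; the function names are illustrative.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]


def polymod(values):
    """BCH checksum over a list of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def to_5bit(data: bytes):
    """Regroup 8-bit bytes into 5-bit values, zero-padding the tail."""
    acc, bits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out


def segwit_address(spk: bytes, hrp: str = "bc") -> str:
    """Encode a version-0 witness scriptPubKey (OP_0 <20 or 32 bytes>) as bech32."""
    version, program = spk[0], spk[2:]
    if version != 0 or spk[1] != len(program) or len(program) not in (20, 32):
        raise ValueError("not a version-0 witness scriptPubKey")
    data = [version] + to_5bit(program)
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)
```

The only difference between the two script types at this step is the slice: spk[2:22] (20 bytes) for P2WPKH and spk[2:34] (32 bytes) for P2WSH, which is why the P2WSH address comes out longer.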

P2SH Encapsulating Pay to Witness Public Key Hash

P2SH-P2WPKH   size   script
witness       107    <sig> <pubkey>
scriptSig     23     <OP_0 <pkHash>>
scriptPubKey  23     OP_HASH160 <ssHash> OP_EQUAL
assembled            <witness> OP_DUP OP_HASH160 <scriptSig> OP_DUP <scriptPubKey> OP_VERIFY OP_DESERIALIZE OP_SWAP OP_DROP OP_EQUALVERIFY OP_CHECKSIG
btc_address          b58_encode(pfx + spk[2:22])
Test                 is_p2sh() and len(ss) == 23 and (ss[0:3]).hex() == '160014'
Total vB      72.75  23 + 23 + 107/4
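Pulling the "Test" rows from all of the tables together, a single classifier might look like the sketch below. The function name classify and its return strings are illustrative; the byte patterns are the ones given in the tables, with spk as the scriptPubKey and ss as the spending scriptSig.

```python
from typing import Optional


def classify(spk: bytes, ss: bytes = b"") -> Optional[str]:
    """Return the script type name, or None if no pattern matches."""
    n = len(spk)
    if n == 35 and (spk[0:1] + spk[34:35]).hex() == "21ac":
        return "P2PK"
    if n == 25 and (spk[0:3] + spk[23:25]).hex() == "76a91488ac":
        return "P2PKH"
    if n == 23 and (spk[0:2] + spk[22:23]).hex() == "a91487":
        # The output alone only says "P2SH"; the wrapped witness program
        # is visible only in the spending scriptSig (ss).
        if len(ss) == 23 and ss[0:3].hex() == "160014":
            return "P2SH-P2WPKH"
        return "P2SH"
    if n == 22 and spk[0:2].hex() == "0014":
        return "P2WPKH"
    if n == 34 and spk[0:2].hex() == "0020":
        return "P2WSH"
    return None
```

Because P2SH hides its inner script behind a hash, an unspent P2SH-P2WPKH output is indistinguishable from any other P2SH output; the classifier can only refine its answer once a scriptSig is available.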

