Critical Point Episode 9: Shea and Anders talk blockchain (Part 2)
Transcript
Disclaimer: This podcast is intended solely for educational purposes and presents information of a general nature. It is not intended to guide or determine any specific individual situations. And persons should consult with qualified professionals before taking specific action. The views expressed in this podcast are those of the speakers and not those of Milliman.
Anders Larson: Hello and welcome to “Critical Point” brought to you by Milliman. I’m Anders Larson and I’ll be your host today. In this episode of “Critical Point,” we’re going to be talking about blockchain. This is the second episode we’re doing as part of a series on blockchain and its insurance applications. In that first episode, we really got into just the foundational basics behind blockchain, really not getting into any insurance applications. Today we’re going to stay more on the technical side and not necessarily get into any insurance applications quite yet; I know you’re anxious for us to get there, but we will get there next time. But we have a couple of foundational elements to kind of talk about that we think are critical for your understanding of blockchain. So I’m here again with Shea Parkes, a principal of Milliman. Shea, good to see you today.
Shea Parkes: Good to see you today too, Anders.
Anders Larson: All right. Let’s talk about some blockchain. Let’s talk about hashing and public key cryptography specifically. Again, these are two concepts that are, we think, foundational to blockchain and how it works and understanding them is critical if you’re going to start talking about using blockchain in the insurance industry, either current applications or potential future applications. Is that fair to say?
Shea Parkes: I think that’s absolutely fair to say, yes.
Anders Larson: All right. So let’s start with hashing. Shea, could you just give us a quick overview of what is hashing?
Shea Parkes: Sure. So when someone talks about hashing, they’re often talking about a hashing function, so a hashing function is a function that takes in an arbitrary amount of data and then produces sort of a number between one and a million, for example, or a limited amount of data. It’s a function you agree upon. Often they have names like an MD5 or a SHA-1 or something; those are known as hash functions. You feed them data such as you could make a text file that says, you know, “Hello, I am Shea,” and I could pass that text file through a hash function and I would generate a number, a number between one and a million, and you know, it has a lot of really neat characteristics to it, but I could also feed more than just a small text file. I could download a full audio song, feed that through the hash function. I’d still get a number between one and a million, and I could download a whole movie or a whole bunch of healthcare claims data or anything and pass it through a hash function and still get a number between one and a million.
Anders Larson: Now let’s make sure the distinction is clear. This is not a random-number generator; these are not random.
Shea Parkes: They are not random, so if I had a text file that said “Hello, I’m Shea,” and I hashed it with an MD5 algorithm and got the number 227 and I gave Anders over here or Anders re-created that same text file that it said “Hello, I’m Shea” and you ran it through the same hashing algorithm, you would get the same exact number out the other side, so it’s actually a deterministic function. There is nothing random about a hash function even though there are some cool things that make it look kind of random.
Anders Larson: Seemingly random.
Shea Parkes: Seemingly random, but the same input will produce the same output. Now the part that’s random or that looks kind of random is if you slightly change the input, so if you say instead of “Hello, I am Shea,” it’s “Hello, I am Shea, exclamation point,” the resulting output would be quite-- it’s still deterministic, but it would just be a new almost seemingly random choice between one and a million. It would have no correlation with the result of “Hello, I am Shea.”
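To make the two properties above concrete, here is a minimal sketch using Python's standard hashlib module. SHA-256 is swapped in for the MD5 mentioned in the conversation, and the helper name sha256_hex is an assumption made for illustration only.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("Hello, I am Shea")
again = sha256_hex("Hello, I am Shea")
nudged = sha256_hex("Hello, I am Shea!")

# Same input, same output: nothing random is involved.
same_input_same_hash = first == again

# Count how many of the 64 hex characters changed after adding one "!".
differing = sum(a != b for a, b in zip(first, nudged))

print(first)
print(nudged)
print(same_input_same_hash, differing)
```

Running it twice prints the same digests both times, while the digest for the nudged text looks unrelated to the first one.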
Anders Larson: And I take that number that has been generated through this hash function. Can I take that number, run it backwards through the hash function?
Shea Parkes: You can’t run it backwards with a hash function. When you think about it and the fact that I said you could feed a-- entire gigabytes of data of a movie file through a hash function and still get a number between one and a million, you should be able to see that there is no way to get back that information, that you lost information because you squeezed it down to a number between one and a million, which by the way, I’m using one and a million as a simplification. Depending on the hash function, it may be a number between one and 10 million or one and 500 million or 512 million-- it-- but it’s always a discrete number of possible outcomes.
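As a rough illustration of why the output cannot be run backwards, the sketch below hashes a tiny input and a 5 MB input and gets the same amount of output from each. The function hash_to_range and the "one to a million" range are assumptions that mirror the simplification used in the conversation; real hash functions such as SHA-256 return a fixed number of bytes rather than a small integer.

```python
import hashlib

def hash_to_range(data: bytes, buckets: int = 1_000_000) -> int:
    """Squeeze any amount of data down to a number from 1 to `buckets`."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % buckets + 1

small = b"Hello, I am Shea"
large = b"\x00" * 5_000_000  # stand-in for a song or movie file

small_digest = hashlib.sha256(small).digest()
large_digest = hashlib.sha256(large).digest()

# Both digests are 32 bytes, no matter how much data went in.
print(len(small_digest), len(large_digest))
print(hash_to_range(small), hash_to_range(large))
```

Five million bytes cannot be rebuilt from 32 bytes, let alone from one number in a million, which is the information loss described above.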
Anders Larson: Which leads to another point is that because it’s a discrete number of possible outcomes, it is possible for two sets of input data to produce the same output?
Shea Parkes: Absolutely. It’s known as a hash collision, so yes. Now it’s very unlikely those two input data pieces would look very similar because remember if you do something-- if you nudge the data just a little bit, a good hash algorithm-- or a definition of good-- a good often-used hash algorithm would produce very different results, but it is possible to have a hash collision. If you were hashing every word in every language, and you pick a hash algorithm that only produces answers between one and 100,000, you’re going to get hash collisions. Even if it’s a good hash algorithm that spreads the resulting hashes-- often that number that comes out the other side is known as a hash-- so if it spreads those numbers out fairly uniformly across the possible hash values, you’re still going to hit a collision if you feed in just a whole lot of different values on the way in. And since the data you’re feeding in could be of an arbitrary size, there is an infinite number of values you could feed into a hash function.
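The collision argument can be tried out with a deliberately small hash. In this sketch, small_hash, first_collision, and the made-up inputs "word-0", "word-1", and so on are all assumptions for illustration; the range of 1 to 100,000 matches the example in the conversation.

```python
import hashlib

def small_hash(text: str, buckets: int = 100_000) -> int:
    """A deliberately tiny hash: a number from 1 to `buckets`."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets + 1

def first_collision(buckets: int = 100_000):
    """Feed in distinct words until two of them share a hash."""
    seen = {}
    # buckets + 1 distinct inputs guarantee at least one repeat.
    for i in range(buckets + 1):
        word = f"word-{i}"
        h = small_hash(word, buckets)
        if h in seen:
            return seen[h], word, h
        seen[h] = word
    raise AssertionError("unreachable: more inputs than possible hashes")

a, b, h = first_collision()
print(a, b, h)
```

With only 100,000 possible outputs, a repeat usually shows up long before the guaranteed limit, even though the underlying hash spreads values out evenly.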
Anders Larson: Right. So we’ve gotten kind of deep into hashing specifically. Let’s pull back a little bit and how does this relate to blockchain or bitcoin or other cryptocurrencies?
Shea Parkes: Right, and we actually touched a little bit on this last time. It’s often used in the proof of work or also in what’s known as blockchain or bitcoin mining. Hashing is used in-- it’s not the only part that’s used, but it’s used to sort of solve this puzzle, so a block in a blockchain is often a group of transactions of transfers of cryptocurrency or other things. It’s not the only thing it could be, but it’s often a group of transactions and whenever everyone’s trying to mine this next block of transactions they’ve all agreed on a hash algorithm, SHA-256 or something, so they’re agreed on the algorithm. And so they get all these transactions together and they compute the hash on these transactions; they get a number between one and a million. Now the goal is to figure out what noise or nonce to add to the block to get a hash that’s maybe below 10,000 or below some number on the hash output so to set a threshold there and so it’s not really possible to back into it, so everyone just starts random guessing and throwing random noise into the block to try to find that noise that gets the resulting hash below some threshold. So that’s a classic cryptocurrency sort of blockchain use of a hash. It’s not the only use. There’s other tricks that they use in there to hide some data off the blockchain, etcetera, so—
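The nonce puzzle described above can be sketched in a few lines. This is a simplified illustration, not how any particular cryptocurrency encodes its blocks: block_hash, mine, the text format of the transactions, and the difficulty_bits parameter are all assumptions, and the nonce is counted upward rather than guessed at random.

```python
import hashlib

def block_hash(transactions: str, nonce: int) -> int:
    """Hash the transactions together with a nonce; return it as an integer."""
    payload = f"{transactions}|{nonce}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")

def mine(transactions: str, difficulty_bits: int = 12) -> int:
    """Try nonces until the hash falls below the agreed threshold."""
    threshold = 1 << (256 - difficulty_bits)
    nonce = 0
    while block_hash(transactions, nonce) >= threshold:
        nonce += 1
    return nonce

block = "Anders pays Shea 3; Shea pays Anders 1"
nonce = mine(block)
print(nonce, hex(block_hash(block, nonce)))
```

Finding the nonce takes a few thousand guesses on average at this difficulty, but anyone can check the answer with a single hash, which is what makes it usable as proof of work.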
Anders Larson: There’s a couple of other thoughts I had when we-- started talking about what this hash does. Is it really a method that could be used for encryption?
Shea Parkes: So it’s not really usable for encryption because you can’t get back, so once I have the hash of something, there’s no way to recover what I had but I can have some confidence that I’ve been given the same data again-- and that’s maybe one of the use-- first use cases of hashes was transferring data across the internet is-- it’s a modern miracle. Behind the scenes, there is an amazing amount of stuff going on and all sorts of failures happening, but most of the time everything actually succeeds from a user perspective, but a common use of hashes is to feed them as-- sort of like an arbitrary check sum. If I’m going to send you a big data file, I could say, “Well, hey, here’s the MD5 hash of this big data file” and whenever you get the big data file, you take that big data file, you pass it through the MD5 hashing algorithm, you get a number between one and a million, and you check against that one that I gave you. And you’re like, “Okay. Well, I’ve got, I’ve got more confidence in this modern miracle, the internet, having gotten this data across to the other side because it’s there.”
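The checksum use case might look like the sketch below. SHA-256 again stands in for the MD5 mentioned above, and the names checksum and arrived_intact, along with the sample bytes, are assumptions for illustration.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Hash the file contents so the receiver can compare against it."""
    return hashlib.sha256(data).hexdigest()

def arrived_intact(received: bytes, expected: str) -> bool:
    """Re-hash what arrived and compare with the hash the sender published."""
    return checksum(received) == expected

original = b"a big data file " * 1000
published = checksum(original)

good_copy = bytes(original)
bad_copy = original[:-1] + b"?"  # one byte damaged in transit

print(arrived_intact(good_copy, published))
print(arrived_intact(bad_copy, published))
```

A matching hash does not prove the copy is identical, since collisions exist, but with a good hash function a mismatch after accidental damage is overwhelmingly likely.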
Anders Larson: So on a similar note, compression was another thought I had because you’re taking a lot of data and converting it to something of a fixed length that’s small, but again the irreversibility is kind of a killer for trying to use this for compression.
Shea Parkes: Correct. It basically compresses so well there’s no way to get back, which means it doesn’t-- it’s not really used for compression. The thing about it-- you’re taking in as much data as on a Blu-ray disc or something; you can still convert that down and through a hash function and you still get a number between one and a million, which is not enough to re-create whatever Hollywood movie you put in the other side.
Anders Larson: And the fact that there are potential collisions means that there’s no way you could even know which input you started from, because one hash could map to multiple inputs.
Shea Parkes: Yeah, absolutely. Hashing-- once you know more about it, I’m always surprised how many places hashing pops up, a lot in data analytics, machine learning, programming. It underlies a lot of what happens on your computer in your day-to-day, but a lot of the times it’s back there doing its job flawlessly and you’ll never even know about it.
Anders Larson: Well, I think public key cryptography is something that probably-- as we talked about before in the podcast-- is something that’s also going on all the time without you necessarily realizing, but I think for those of you who are interested in potential applications in insurance, it’s important to know what’s actually going to happen when we talk about cryptography.
Shea Parkes: Right, yes. So public key cryptography is another important concept that-- the other thing-- there is one thing public key cryptography shares with hashing is it’s a data transformation, so in hashing you put data in, you get a hash result out. Public key cryptography, you also put data in and you get encrypted data out. So they’re both sort of a data transformation algorithm but yeah, public key cryptography is different, but it is also used all the time in your day-to-day life and it’s also really foundational to a lot of blockchains, especially the cryptocurrency block chains out there so—
Anders Larson: So let’s jump right into it. What do we mean when we say “public key cryptography?”
Shea Parkes: All right. So I’m going to do a little bit of a longer explanation on this one, but at the same time we’re still not going to cover all there is to know about public key cryptography. If this whets your appetite, I strongly encourage you-- however you like-- to research books, other podcasts. There’ll be a lot of great information out there about this, but let’s get in and give it what we can here in a few minutes. So to get started with public key cryptography, you need to generate some encryption keys. This happens, by the way, all the time, but sometimes you want to do it mindfully and to do it mindfully you need a program and you need some entropy and by “entropy,” we mean some actual randomness, maybe you wiggling your mouse or if you’re extreme measuring radiation in the air or something, but let’s go ahead. You have a program that’s meant to capture and work with some entropy and then it generates a pair of encryption keys, okay, so a pair so two encryption keys. Now what you do with these two encryption keys is very important. One of them you call it your private key and you protect that as well as you can. You lock that down. You put it in virtual or real safes. You hide it behind additional passwords. You do whatever is needed to keep that private key safe.
Now conversely, that other key you made you call it your public key and in a lot of scenarios you do a really weird thing where you go scream it from the mountaintops and you say, “Here is this public key. Here’s this public key. Everyone, here’s this public key.” Now in some cases, you don’t scream it from the mountaintop, but a lot of times you do, and everyone else gets the public key and hopefully only you have that private key and that it was locked under there. Now together these two keys do some really cool tricks, so you can take any data like let’s just say I wanted to encode “Roses are purple” and so I write a little text file and I say, “Roses are purple.” I take out my private key from wherever I had it secured. I encrypt “Roses are purple,” and I don’t get a number between one and a million; I get back some data that’s probably about the same size because there wasn’t any real compression, but it was scrambled and nobody can understand it anymore, okay. And I can take that scrambled message and I can throw it out off the mountaintop and say, “Hey, world, here’s this scrambled message” and anyone that bothered to listen to me the first time or listened later and grabbed my public key can decrypt that message and they’ll see a message that says “Roses are purple.” They’ll know that whoever wrote “Roses are purple” had the private key because the only thing that public key could decrypt is a message encrypted with the private key that it was made with. Now this can work the other way. Anyone out there could write a message that says, “Hey, dummy, roses are red” and they say “Hey, dummy, roses are red.” Maybe they do this securely on their own machine, they take that public key, they encrypt their message that says, “Hey, dummy, roses are red.” Now they don’t have to take that encrypted message and throw it off the mountaintop.
Maybe they know who was associated with that private key, but they could throw it off the mountaintop if they wanted; they could throw it off the mountaintop and say, “I don’t even know who has the private key” and so throw it off the mountaintop and say, “Hey, anyone that has this-- was associated with this public key, I’ve got a message for you.” And I’m out there scanning and I say, “Oh, someone said they wanted to send a message back to that public key. Let me grab that message.” I can decrypt it and I can find out that in actuality, roses are red and that someone wanted to tell me that roses are red and I’m the only one that would be able to decrypt that message because I have the private key.
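Both directions of the key pair described above can be tried with a toy version of RSA, one well-known public key scheme. Everything here is an assumption for illustration: the tiny primes 61 and 53, the public exponent 17, and the helper transform. Numbers this small are trivially breakable, and real systems add padding and usually sign a hash of the message rather than the message itself.

```python
# Toy RSA key pair built from two tiny primes (never do this for real).
P, Q = 61, 53
N = P * Q                      # shared modulus, part of both keys
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent: shout it from the mountaintop
D = pow(E, -1, PHI)            # private exponent: keep it locked away

def transform(numbers, exponent):
    """Raise each number to `exponent` modulo N (encrypts or decrypts)."""
    return [pow(m, exponent, N) for m in numbers]

# Direction 1: scramble with the private key, unscramble with the public key.
plain = list("Roses are purple".encode("utf-8"))
scrambled = transform(plain, D)
recovered = bytes(transform(scrambled, E)).decode("utf-8")

# Direction 2: anyone encrypts with the public key, only the owner can read it.
reply = list("Hey, dummy, roses are red".encode("utf-8"))
sealed = transform(reply, E)
opened = bytes(transform(sealed, D)).decode("utf-8")

print(recovered)
print(opened)
```

The first direction shows that the message came from whoever holds the private key; the second shows that only that person can read the reply.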
Anders Larson: Okay. So I’m going to try to draw a parallel here, Shea, and you let me know if it sounds silly. Your example of the-- of you sending out this “Roses are purple” to-- from your private key out to the public key-- would a parallel be something like sending out a tweet where anybody who knows my handle on Twitter can see that I had tweeted this. They could then direct message or DM me back and that DM, which is-- that’s only going to go to me; others cannot see that. That’s coming from somebody who knows my handle. They can send it to me but only me who has access to my account can read that message that comes back to me.
Shea Parkes: Yeah. No. That’s a good parallel in how it functions sort of functionally for you on Twitter. A difference when you start bringing in public-, private-key cryptography is that with-- if you’re really securing your private key, you don’t have to trust the people at Twitter to not have read that direct message back, but yeah, it functionally works the same way that someone could send just a message back to you and they-- but the neat thing is you don’t have to trust Twitter, yes.
Anders Larson: So we’ve talked about how this sort of can be used for communication in some form, but how is it used in cryptocurrencies and in blockchain?
Shea Parkes: Sure. So you’ve probably heard of the idea of a bitcoin or a cryptocurrency wallet so in cryptocurrencies-- I’m probably going to say “bitcoin” on accident here. This is-- applies to really any cryptocurrencies or any ledger-- public sort of ledger here that if I want a bitcoin, well, then I need to convince somebody that has some bitcoins to send it-- those bitcoins to a public key of my choosing. So I-- if Anders over there imaginatively has some bitcoins, I need to somehow convince him to send some of those bitcoins to a public key that hopefully I have the private key for and if so, that he would send them in a transaction that would get wrapped up into a block on the blockchain and then everyone in the world would believe that the public, private key pair that I have hopefully control of the private key of, is now the owner-- I’m air quoting there because it’s-- we have nothing real, right-- an owner of three bitcoins that somehow got onto this blockchain that show that they went from a wallet that had bitcoins to now this brand-new wallet that is publicly associated with a public key. Now as long as I have that private key that goes with that public key in a later block or at any later time I could write out a little message that says, “Hey, send one of my bitcoins to this other public key number 2424 blah blah blah” and then I encrypt it with my private key and I toss it up into the mining field that says, “Hey, I want to add this transaction to the next block in the ledger.” They can all-- since they can all see “Well, hey”-- they’re claiming to transfer out of this public key I can decrypt their message and see if they did indeed sign it with their private key and if so, then we’re going to say that, “Well, first of all, did they sign it with the private key and secondly, according to the ledger, did they have a bitcoin to give?” and if so then yeah, okay, we’ll let it on the block and we’ll transfer that bitcoin somewhere else.
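The two checks described above (was it signed with the private key, and does the ledger show a coin to give) can be sketched with the same kind of toy RSA numbers. This is an illustration under stated assumptions, not any real cryptocurrency's transaction format: the key values, the names sign, signature_ok, and accept, the message layout, and the dictionary ledger are all hypothetical.

```python
import hashlib

# Toy key pair (hypothetical tiny numbers, far too small for real use).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the owner

def digest_number(message: str) -> int:
    """Hash the message and reduce it to a number smaller than N."""
    h = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N

def sign(message: str, private_exponent: int) -> int:
    return pow(digest_number(message), private_exponent, N)

def signature_ok(message: str, signature: int, public_exponent: int) -> bool:
    return pow(signature, public_exponent, N) == digest_number(message)

def accept(ledger: dict, sender: str, receiver: str, amount: int, signature: int) -> bool:
    """Apply a transfer only if it is signed and the sender has the coins."""
    message = f"{sender}->{receiver}:{amount}"
    if not signature_ok(message, signature, E):
        return False                      # not signed with the private key
    if ledger.get(sender, 0) < amount:
        return False                      # no coin to give, per the ledger
    ledger[sender] -= amount
    ledger[receiver] = ledger.get(receiver, 0) + amount
    return True

ledger = {"shea-key": 3}
good = sign("shea-key->key-2424:1", D)
print(accept(ledger, "shea-key", "key-2424", 1, good), ledger)
```

A transfer with a wrong signature, or one for more coins than the ledger shows, is rejected and leaves the ledger unchanged.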
Anders Larson: And when you’ve made this-- you’ve decided to give a bitcoin to some other public key once you make that-- once you decide to do that, does that person on the other end have to actively receive it or it’s gone?
Shea Parkes: The transfer is immediate; it’s part of the ledger so there isn’t—
Anders Larson: It’s not like writing a check and nobody cashes it. It’s--
Shea Parkes: Well, I mean it can be. I mean-- well, no, it’s not like that because there’s-- no money comes out of the—
Anders Larson: You have given them no money and if-- even if no one finds that key, it’s gone.
Shea Parkes: Right. I mean technically I could make up a public key that there is no private key to and I could just be tossing bitcoins to the wind, yeah.
Anders Larson: So this does present some interesting strengths and weaknesses I guess of blockchain and potential applications that I think are critical when we move forward is-- which is that the private key here is everything in terms of access to your data, your coins, or being able to access the information related to that key.
Shea Parkes: Absolutely. Yeah. If there are coins in the public key, anyone with the private key could transfer them out, so when you talk about someone stealing bitcoins, what likely happened is somehow or other they got a copy of the private key associated with a lot of bitcoins. Now as soon as they got that copy, they probably immediately made a big transfer of all the bitcoins associated with that private key to other public keys for which they had control of the private keys, so as soon as someone steals a private key, all those bitcoins are going away pretty much and it’s a race to get that transaction on the ledger. If somebody knew your private key was compromised or you were even concerned that it could be compromised in the future, there would be a reason to make a new pair of keys and transfer all of your bitcoins over to the new public key for which you’re not concerned about the security of the new private key.
Anders Larson: In some ways it provides security in that the only way to access the information is through the private key but once-- if that is compromised there is no-- in a classical blockchain, there’s no recourse.
Shea Parkes: In a classical blockchain, there’s no recourse. Now this also does give a bit of explanation to the story about-- stories about people digging in landfills for bitcoins.
Anders Larson: Right.
Shea Parkes: You hear somebody’ll go, “Well, I threw away $10,000 of bitcoins, so I was digging through trash at landfills trying to find the bitcoins.” Okay. Well, there are no bitcoins in the trash. What they did was they probably threw away a hard drive on which the private key was stored for a wallet that so many years ago they bought a-- some bitcoins and had them transferred to the corresponding public key, it was probably worth 10 cents at the time and they didn’t care, got a new computer, didn’t bother transferring the data over, threw it away and then they realized, “Oh, goodness. Some inflation happened on there. If I worked with someone else that’s willing to trade me some money for those bitcoins, I could have gotten quite a bit of money,” but if no one ever finds that private key in the trash heap out there, those bitcoins are just permanently stuck in that wallet or-- other cryptocurrencies work mostly the same way but yes, if no one can ever find the private key this-- those coins are just stuck and lost.
Anders Larson: Yeah. That’s a story that a lot of us have probably heard some version of and it’s interesting that it’s not the coins per se that they’re going after; it’s the key.
Shea Parkes: Yeah. It’s the private key that accesses the coins so—
Anders Larson: So I think with that we’ve had a lot of interesting discussion here on hashing, public-key cryptography, and kind of leading into some of the challenges that this puts forth when we talk about insurance applications so we can let you think about that and we will be back soon with another episode to talk a little bit more about some of these applications in the insurance world. So Shea--
Shea Parkes: Thank you very much, Anders.
Anders Larson: Thank you very much. You've been listening to Critical Point presented by Milliman. To listen to other episodes of our podcast, you can visit us at milliman.com or find us on iTunes, Google Play, Spotify, and Stitcher. See you next time.
The second in a multi-part series, Shea Parkes and Anders Larson discuss hashing and public key cryptography and how the two concepts are critical to understanding the pros and cons of blockchain for insurance applications.