### 17.3.06

## Secure hashing

In mathematical terms, a secure hash function is a function f from the natural numbers {1,2,...} to some finite subset of the natural numbers, which may as well be {1,2,...,n} (and usually is in practice). A secure hash is characterised by the fact that it's very difficult to find f^(-1), the inverse function. In fact in practise the only way to find f^(-1){k}, the set of numbers which map to some number k, may be to go through all the natural numbers, working out the hash of each one. Of particular importance is that it should be computationally unfeasible to generate a 'collision', or two values whose hash is the same.

Hashes have many cryptographic applications. The canonical example is the method for tossing a coin or making a similar decision over a communication line. Suppose that two people, Alice and Bob, want to play chess by email. Chess is a good game for this, because no information is kept secret from either player. The only problem Alice and Bob have is in deciding who will start. Usually chess players do this by one of them hiding a pawn of either colour in their closed hands, and the other choosing a hand. They can't do this by email, unless the one that guesses trusts the other to tell the truth. Similarly, if one of them tosses a coin, the other will have to trust them to accurately report the result.

A good way to do it is to proceed as follows; Alice chooses either Black or White. She writes this in a text file:

and adds a few junk characters to the file:

Then she turns this string of symbols into a number. She can do this in lots of standard ways computers use to represent text as numbers. In ISO-8859-15, a very common computer alphabet, the text file translates to the number

which is expressed in hex or hexadecimal notation, using 16 digits, 0-9 and a-f, instead of the usual ten.

She then finds the hash of this number. If she does it using a very standard and simple hash function, MD5, she'll end up with a number between 0 and 2 to the power of 128. In this case she gets

again in hex. She sends this number to Bob. Then Bob writes back, guessing whether Alice's chosen colour is Black or White. Let's say he guesses White.

Alice writes back to him, letting him know that he has got it wrong. She includes the original text that she took the hash of, "Blackqwertyuiopxxxx314159hellohello". If Bob doesn't trust her, he can check that the hash of this message is the same one that he received from Alice.

The security of this procedure rests on two properties of the hash function. Firstly, it's effectively impossible for Bob to recover Alice's text file from the hash, to find out whether it includes the word 'Black' or 'White'. The only way he could try and do this is to work out the hashes of as many text files as possible which include these words and compare them to Alice's hash. But Alice can easily thwart this strategy by adding more junk characters to the file, so that it becomes impossible for Bob to check enough possibilities. Bob's strategy also doesn't allow for false positives, when he might find a text string different from Alice's including the words Black or White, which purely by coincidence has the same hash as Alice's.

Secondly, it's also impossible in practical terms for Alice find a collision; two strings, one including 'Black', and one including 'White', which have the same hash. If she could do this, she could send that hash to Bob, and then send him the text string containing the opposite colour to the one he picks. To do this, she would have to check an enormous number of strings containing both words, until she found one of each with the same hash. Bob can try and make this difficult for her by insisting that she include other information specified by him, such as the date, in the original message. This limits Alice's search for a collision, and stops her from reusing the results of her search in subsequent games.

Hashes have many cryptographic applications. The canonical example is the method for tossing a coin or making a similar decision over a communication line. Suppose that two people, Alice and Bob, want to play chess by email. Chess is a good game for this, because no information is kept secret from either player. The only problem Alice and Bob have is in deciding who will start. Usually chess players do this by one of them hiding a pawn of either colour in their closed hands, and the other choosing a hand. They can't do this by email, unless the one that guesses trusts the other to tell the truth. Similarly, if one of them tosses a coin, the other will have to trust them to accurately report the result.

A good way to do it is to proceed as follows; Alice chooses either Black or White. She writes this in a text file:

`Black`

and adds a few junk characters to the file:

`Blackqwertyuiopxxxx314159hellohello`

.Then she turns this string of symbols into a number. She can do this in lots of standard ways computers use to represent text as numbers. In ISO-8859-15, a very common computer alphabet, the text file translates to the number

`426c61636b71776572747975696f707878787833313431353968656c6c6f68656c6c6f`

which is expressed in hex or hexadecimal notation, using 16 digits, 0-9 and a-f, instead of the usual ten.

She then finds the hash of this number. If she does it using a very standard and simple hash function, MD5, she'll end up with a number between 0 and 2 to the power of 128. In this case she gets

`0e42633f6c2f4003652482b2c907a157`

again in hex. She sends this number to Bob. Then Bob writes back, guessing whether Alice's chosen colour is Black or White. Let's say he guesses White.

Alice writes back to him, letting him know that he has got it wrong. She includes the original text that she took the hash of, "Blackqwertyuiopxxxx314159hellohello". If Bob doesn't trust her, he can check that the hash of this message is the same one that he received from Alice.

The security of this procedure rests on two properties of the hash function. Firstly, it's effectively impossible for Bob to recover Alice's text file from the hash, to find out whether it includes the word 'Black' or 'White'. The only way he could try and do this is to work out the hashes of as many text files as possible which include these words and compare them to Alice's hash. But Alice can easily thwart this strategy by adding more junk characters to the file, so that it becomes impossible for Bob to check enough possibilities. Bob's strategy also doesn't allow for false positives, when he might find a text string different from Alice's including the words Black or White, which purely by coincidence has the same hash as Alice's.

Secondly, it's also impossible in practical terms for Alice find a collision; two strings, one including 'Black', and one including 'White', which have the same hash. If she could do this, she could send that hash to Bob, and then send him the text string containing the opposite colour to the one he picks. To do this, she would have to check an enormous number of strings containing both words, until she found one of each with the same hash. Bob can try and make this difficult for her by insisting that she include other information specified by him, such as the date, in the original message. This limits Alice's search for a collision, and stops her from reusing the results of her search in subsequent games.

Comments:

<< Home

Terrific explanation and application! Turns out sets of collisions for MD5 can now be found on a low-powered laptop in a matter of hours, but MD5 was popular for, like,

I've always been found of the 'broken plate' explanation for hashes too, though it's a little tougher to think of a neat story as to why someone would want to break a plate, and why it's interesting that it would be difficult to put the plate back together correctly.

*ever*.I've always been found of the 'broken plate' explanation for hashes too, though it's a little tougher to think of a neat story as to why someone would want to break a plate, and why it's interesting that it would be difficult to put the plate back together correctly.

thx.

i know md5 (and recently sha-1!) are now 'considered broken', i think both first by the same chinese researcher. but i was reading about md5, and it's easy to find implementations, so i thought i'd use it.

broken plates can be useful for authentication: i give you half a broken plate, and keep the other half. it's easy to check that your half plate fits mine, but impossible for anyone to break another plate in quite the same way (a collision). that's a slight variation of the usual analogy though i think. this was apparently done by banks in societies where people don't use signatures much - by tearing a low-denomination banknote in half, irregularly.

someone i know trolls anonymously on slashdot, but includes md5's of phrases including his nickname, which he later posts on his website. i think this is another great application.

Post a Comment
i know md5 (and recently sha-1!) are now 'considered broken', i think both first by the same chinese researcher. but i was reading about md5, and it's easy to find implementations, so i thought i'd use it.

broken plates can be useful for authentication: i give you half a broken plate, and keep the other half. it's easy to check that your half plate fits mine, but impossible for anyone to break another plate in quite the same way (a collision). that's a slight variation of the usual analogy though i think. this was apparently done by banks in societies where people don't use signatures much - by tearing a low-denomination banknote in half, irregularly.

someone i know trolls anonymously on slashdot, but includes md5's of phrases including his nickname, which he later posts on his website. i think this is another great application.

<< Home