In this blog post, we are going to explore a core concept of the Git technology: the SHA-1 hash function.
Briefly, any file in a computer can be thought of as a series of bytes, each of which is 8 bits. If you lay these bytes out from left to right, any file can be thought of as a very large number written in binary (base-2) format. Cryptographers have come up with a very interesting function called SHA-1, which has the following curious property: any binary number shorter than 2^64 bits can be rapidly mapped to a 160-bit (20-byte) number, which we can visualize as a 40-character number in hexadecimal (base-16) format. Here is an example using node’s crypto module:
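The sketch below assumes Node's built-in `crypto` module. The helper name `sha1Hex` and the two sample sentences are illustrative choices, not the original listing. It hashes two inputs that differ by a single letter and prints both digests.

```javascript
const crypto = require('crypto');

// Hypothetical helper: return the SHA-1 digest of a string as 40 hex characters.
function sha1Hex(text) {
  return crypto.createHash('sha1').update(text, 'utf8').digest('hex');
}

// Two inputs that differ in exactly one letter ("dog" vs "cog").
const a = sha1Hex('The quick brown fox jumps over the lazy dog');
const b = sha1Hex('The quick brown fox jumps over the lazy cog');

console.log(a);
console.log(b);
console.log(a.length, b.length); // 160 bits = 20 bytes = 40 hex characters each
```

Running this with `node` prints two 40-character hexadecimal strings that bear no obvious resemblance to each other.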
The point is that even binary numbers which are very close to each other map to completely different 20-byte SHA-1 values. This makes SHA-1(x) very different from most “normal” functions like cos(x), where nearby inputs give nearby outputs.
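Because a hash can, for practical purposes, be assumed to map 1-to-1 to a file, you can just use the hash to distinguish a file rather than conveying its full contents. Strictly speaking the mapping cannot be truly 1-to-1, since there are far more possible files than 160-bit values, but two different files are so unlikely to share a hash by accident that the assumption works well in practice. Indeed, you can use a hash to identify any string of bits less than 2^64 bits in length.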
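To make the idea of using the hash in place of the contents concrete, here is a minimal sketch of an in-memory store keyed by SHA-1. The names `sha1Hex`, `store`, `put`, and `get` are hypothetical, chosen only for illustration. Git itself hashes a short header together with the file contents, so these identifiers will not match the ones Git prints.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hex SHA-1 digest of a string or Buffer.
function sha1Hex(data) {
  return crypto.createHash('sha1').update(data).digest('hex');
}

// Hypothetical in-memory store: the key is the hash, the value is the content.
const store = new Map();

// Save content and return its hash, which serves as its identifier.
function put(content) {
  const id = sha1Hex(content);
  store.set(id, content);
  return id;
}

// Look content up by its hash; returns undefined if it was never stored.
function get(id) {
  return store.get(id);
}

const id1 = put('hello world\n');
const id2 = put('hello world\n'); // identical content, so identical identifier
const id3 = put('hello world!\n'); // slightly different content, different identifier

console.log(id1 === id2); // true
console.log(id1 === id3); // false
console.log(store.size);  // 2
```

Storing the same content twice costs nothing extra, because both calls produce the same key. That is the property that lets a short hash stand in for the whole file.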