In computer science, a digital fingerprinting algorithm is a procedure that maps an arbitrarily large bitstream (such as a computer file) to a much shorter bit string, its digital fingerprint, that uniquely identifies the original data for all practical purposes.
To serve its intended purpose, a fingerprinting algorithm must capture the identity of a bitstream with virtual certainty. In other words, the probability of a collision (two distinct bitstreams yielding the same fingerprint) must be so small that it can be ignored in practice.
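To give a rough sense of how small that probability can be, here is a minimal sketch that estimates the chance of an accidental collision with the standard birthday approximation. It assumes fingerprints are spread uniformly over all possible values; the function name `collision_probability` is illustrative and not part of any library.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items share a fingerprint.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / 2^(bits+1)),
    which assumes fingerprints are uniformly distributed over 2^bits values.
    """
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # About a trillion items with a 160-bit fingerprint.
    print(collision_probability(2 ** 40, 160))
    # A 32-bit fingerprint with a few tens of thousands of items.
    print(collision_probability(77_163, 32))
```

Under these assumptions, a trillion 160-bit fingerprints collide with a probability on the order of 4e-25, while a 32-bit fingerprint already reaches about a 50% chance at roughly 77,000 items. The estimate covers accidental collisions only, not collisions constructed deliberately by an attacker.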
Cryptographic hash functions, such as those in the Secure Hash Algorithm (SHA) family, generally serve as good fingerprint functions.
You are dealing with digital fingerprints every time you invoke md5sum or sha1sum to verify the integrity of a file:
    $ echo "Hello, world" > hello.txt
    $ sha1sum hello.txt
    SHA1(hello.txt)= 7b4758d4baa20873585b9597c7cb9ace2d690ab8
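The same kind of fingerprint can be computed programmatically. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `fingerprint` and its `chunk_size` parameter are illustrative choices, not part of any tool mentioned here. Reading the file in chunks keeps memory use constant even for very large files.

```python
import hashlib


def fingerprint(path: str, algorithm: str = "sha1", chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at `path` using the named algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # Feed the file to the hash in fixed-size chunks until EOF.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Print one "<digest>  <path>" line per file given on the command line.
    for file_path in sys.argv[1:]:
        print(f"{fingerprint(file_path)}  {file_path}")
```

Passing a different algorithm name, for example `fingerprint("hello.txt", "sha256")`, switches to another hash supported by `hashlib` without changing the rest of the code.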
Content-addressable storage relies on digital fingerprints as unique content identifiers. By default, Bitcache uses the widely deployed SHA-1 algorithm for digital fingerprints, with plans to also support the SHA-2 family of algorithms.
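To illustrate the general idea of content addressing, here is a minimal in-memory sketch in which each blob is stored under its SHA-1 fingerprint. The `ContentStore` class and its methods are hypothetical and are not Bitcache's API; a real store would persist data and handle much more.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key of a blob is its SHA-1 digest."""

    def __init__(self) -> None:
        self._blobs = {}  # maps hex fingerprint -> bytes

    def put(self, data: bytes) -> str:
        """Store `data` and return its fingerprint, which serves as its identifier."""
        key = hashlib.sha1(data).hexdigest()
        # Identical content always maps to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the blob for `key`, checking that it still matches its fingerprint."""
        data = self._blobs[key]
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("stored data does not match its fingerprint")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"Hello, world\n")
    second = store.put(b"Hello, world\n")  # same content, same identifier
    print(first == second, len(store))
```

Because the identifier is derived from the content itself, storing the same bytes twice yields the same key, and any retrieved blob can be verified against the key it was requested by.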