Solution Task 1.1. Cryptographic Hash Functions: Basics

Nowadays, the networks have gone global and information has taken the digital form of bits and bytes. Critical information now gets stored, processed and transmitted in digital form on computer systems and open communication channels.

Since information plays such a vital role, adversaries are targeting the computer systems and open communication channels to either steal the sensitive information or to disrupt the critical information system.

2. CHAPTER 1 1.1. CRYPTOGRAPHIC HASH FUNCTIONS The first cryptographic primitive that we need to understand is a cryptographic hash function. A hash function is a mathematical function with the following three properties:. Its input can be any string of any size.

Modern cryptography provides a robust set of techniques to ensure that the malevolent intentions of the adversary are thwarted while ensuring the legitimate users get access to information. Here in this chapter, we will discuss the benefits that we draw from cryptography, its limitations, as well as the future of cryptography.

Cryptography – Benefits

Cryptography is an essential information security tool. It provides the four most basic services of information security −

Confidentiality − Encryption technique can guard the information and communication from unauthorized revelation and access of information.
Authentication − The cryptographic techniques such as MAC and digital signatures can protect information against spoofing and forgeries.
Data Integrity − The cryptographic hash functions are playing vital role in assuring the users about the data integrity.
Non-repudiation − The digital signature provides the non-repudiation service to guard against the dispute that may arise due to denial of passing message by the sender.

All these fundamental services offered by cryptography has enabled the conduct of business over the networks using the computer systems in extremely efficient and effective manner.

Cryptography – Drawbacks

Apart from the four fundamental elements of information security, there are other issues that affect the effective use of information −

A strongly encrypted, authentic, and digitally signed information can be difficult to access even for a legitimate user at a crucial time of decision-making. The network or the computer system can be attacked and rendered non-functional by an intruder.
High availability, one of the fundamental aspects of information security, cannot be ensured through the use of cryptography. Other methods are needed to guard against the threats such as denial of service or complete breakdown of information system.
Another fundamental need of information security of selective access control also cannot be realized through the use of cryptography. Administrative controls and procedures are required to be exercised for the same.
Cryptography does not guard against the vulnerabilities and threats that emerge from the poor design of systems, protocols, and procedures. These need to be fixed through proper design and setting up of a defensive infrastructure.
Cryptography comes at cost. The cost is in terms of time and money −
- Addition of cryptographic techniques in the information processing leads to delay.
- The use of public key cryptography requires setting up and maintenance of public key infrastructure requiring the handsome financial budget.
The security of cryptographic technique is based on the computational difficulty of mathematical problems. Any breakthrough in solving such mathematical problems or increasing the computing power can render a cryptographic technique vulnerable.

Future of Cryptography

Elliptic Curve Cryptography (ECC) has already been invented but its advantages and disadvantages are not yet fully understood. ECC allows to perform encryption and decryption in a drastically lesser time, thus allowing a higher amount of data to be passed with equal security. However, as other methods of encryption, ECC must also be tested and proven secure before it is accepted for governmental, commercial, and private use.

Quantum computation is the new phenomenon. While modern computers store data using a binary format called a 'bit' in which a '1' or a '0' can be stored; a quantum computer stores data using a quantum superposition of multiple states. These multiple valued states are stored in 'quantum bits' or 'qubits'. This allows the computation of numbers to be several orders of magnitude faster than traditional transistor processors.

To comprehend the power of quantum computer, consider RSA-640, a number with 193 digits, which can be factored by eighty 2.2GHz computers over the span of 5 months, one quantum computer would factor in less than 17 seconds. Numbers that would typically take billions of years to compute could only take a matter of hours or even minutes with a fully developed quantum computer.

In view of these facts, modern cryptography will have to look for computationally harder problems or devise completely new techniques of archiving the goals presently served by modern cryptography.

A cryptographic hash function is a hash function that is suitable for use in cryptography. It is a mathematical algorithm that maps data of arbitrary size (often called the 'message') to a bit string of a fixed size (the 'hash value', 'hash', or 'message digest') and is a one-way function, that is, a function which is practically infeasible to invert.^[1] Ideally, the only way to find a message that produces a given hash is to attempt a brute-force search of possible inputs to see if they produce a match, or use a rainbow table of matched hashes. Cryptographic hash functions are a basic tool of modern cryptography.^[2]

The ideal cryptographic hash function has five main properties:

it is deterministic, meaning that the same message always results in the same hash
it is quick to compute the hash value for any given message
it is practically infeasible to generate a message that yields a given hash value
a small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value
it is infeasible to find two different messages with the same hash value

Cryptographic hash functions have many information-security applications, notably in digital signatures, message authentication codes (MACs), and other forms of authentication. They can also be used as ordinary hash functions, to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption. Indeed, in information-security contexts, cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values, even though all these terms stand for more general functions with rather different properties and purposes.

Secure Hash Algorithms
Concepts
hash functions ·SHA·DSA
Main standards
SHA-0·SHA-1·SHA-2·SHA-3

A cryptographic hash function (specifically SHA-1) at work. A small change in the input (in the word 'over') drastically changes the output (digest). This is the so-called avalanche effect.

Properties

Most cryptographic hash functions are designed to take a string of any length as input and produce a fixed-length hash value.

A cryptographic hash function must be able to withstand all known types of cryptanalytic attack. In theoretical cryptography, the security level of a cryptographic hash function has been defined using the following properties:

Pre-image resistance
Given a hash value h it should be difficult to find any message m such that h = hash(m). This concept is related to that of a one-way function. Functions that lack this property are vulnerable to preimage attacks.
Second pre-image resistance
Given an input m₁, it should be difficult to find a different input m₂ such that hash(m₁) = hash(m₂). Functions that lack this property are vulnerable to second-preimage attacks.
Collision resistance
It should be difficult to find two different messages m₁ and m₂ such that hash(m₁) = hash(m₂). Such a pair is called a cryptographic hash collision. This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as that required for pre-image resistance; otherwise collisions may be found by a birthday attack.^[3]

Collision resistance implies second pre-image resistance, but does not imply pre-image resistance.^[4] The weaker assumption is always preferred in theoretical cryptography, but in practice, a hash-function which is only second pre-image resistant is considered insecure and is therefore not recommended for real applications.

Informally, these properties mean that a malicious adversary cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical. Second pre-image resistance prevents an attacker from crafting a document with the same hash as a document the attacker cannot control. Collision resistance prevents an attacker from creating two distinct documents with the same hash.

A function meeting these criteria may still have undesirable properties. Currently popular cryptographic hash functions are vulnerable to length-extension attacks: given hash(m) and len(m) but not m, by choosing a suitable m' an attacker can calculate hash(m || m') where || denotes concatenation.^[5] This property can be used to break naive authentication schemes based on hash functions. The HMAC construction works around these problems.

In practice, collision resistance is insufficient for many practical uses.In addition to collision resistance, it should be impossible for an adversary to find two messages with substantially similar digests; or to infer any useful information about the data, given only its digest. In particular, a hash function should behave as much as possible like a random function (often called a random oracle in proofs of security) while still being deterministic and efficiently computable. This rules out functions like the SWIFFT function, which can be rigorously proven to be collision resistant assuming that certain problems on ideal lattices are computationally difficult, but as a linear function, does not satisfy these additional properties.^[6]

Checksum algorithms, such as CRC32 and other cyclic redundancy checks, are designed to meet much weaker requirements, and are generally unsuitable as cryptographic hash functions. For example, a CRC was used for message integrity in the WEP encryption standard, but an attack was readily discovered which exploited the linearity of the checksum.

Degree of difficulty

In cryptographic practice, 'difficult' generally means 'almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important'. The meaning of the term is therefore somewhat dependent on the application since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually multiplies with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a few dozen bits to the latter.

For messages selected from a limited set of messages, for example passwords or other short messages, it can be feasible to invert a hash by trying all possible messages in the set. Because cryptographic hash functions are typically designed to be computed quickly, special key derivation functions that require greater computing resources have been developed that make such brute force attacks more difficult.

In some theoretical analyses 'difficult' has a specific mathematical meaning, such as 'not solvable in asymptoticpolynomial time'. Such interpretations of difficulty are important in the study of provably secure cryptographic hash functions but do not usually have a strong connection to practical security. For example, an exponential time algorithm can sometimes still be fast enough to make a feasible attack. Conversely, a polynomial time algorithm (e.g., one that requires n²⁰ steps for n-digit keys) may be too slow for any practical use.

Illustration

An illustration of the potential use of a cryptographic hash is as follows: Alice poses a tough math problem to Bob and claims she has solved it. Bob would like to try it himself, but would yet like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, computes its hash and tells Bob the hash value (whilst keeping the solution secret). Then, when Bob comes up with the solution himself a few days later, Alice can prove that she had the solution earlier by revealing it and having Bob hash it and check that it matches the hash value given to him before. (This is an example of a simple commitment scheme; in actual practice, Alice and Bob will often be computer programs, and the secret would be something less easily spoofed than a claimed puzzle solution).

Applications

Verifying the integrity of messages and files

An important application of secure hashes is verification of message integrity. Comparing message digests (hash digests over the message) calculated before, and after, transmission can determine whether any changes have been made to the message or file.

MD5, SHA1, or SHA2 hash digests are sometimes published on websites or forums to allow verification of integrity for downloaded files,^[7] including files retrieved using file sharing such as mirroring. This practice establishes a chain of trust so long as the hashes are posted on a site authenticated by HTTPS. Using a cryptographic hash and a chain of trust prevents malicious changes to the file to go undetected. Other error detecting codes such as cyclic redundancy checks only prevent against non-malicious alterations of the file.

Signature generation and verification

Almost all digital signature schemes require a cryptographic hash to be calculated over the message. This allows the signature calculation to be performed on the relatively small, statically sized hash digest. The message is considered authentic if the signature verification succeeds given the signature and recalculated hash digest over the message. So the message integrity property of the cryptographic hash is used to create secure and efficient digital signature schemes.

Password verification

Password verification commonly relies on cryptographic hashes. Storing all user passwords as cleartext can result in a massive security breach if the password file is compromised. One way to reduce this danger is to only store the hash digest of each password. To authenticate a user, the password presented by the user is hashed and compared with the stored hash. A password reset method is required when password hashing is performed; original passwords cannot be recalculated from the stored hash value.

Standard cryptographic hash functions are designed to be computed quickly, and, as a result, it is possible to try guessed passwords at high rates. Common graphics processing units can try billions of possible passwords each second. Password hash functions that perform Key stretching - such as PBKDF2, scrypt or Argon2 - commonly use repeated invocations of a cryptographic hash to increase the time (and in some cases computer memory) required to perform brute force attacks on stored password hash digests. A password hash requires the use of a large random, non-secret salt value which can be stored with the password hash. The salt randomizes the output of the password hash, making it impossible for an adversary to store tables of passwords and precomputed hash values to which the password hash digest can be compared.

The output of a password hash function can also be used as a cryptographic key. Password hashes are therefore also known as Password Based Key Derivation Functions (PBKDFs).

Proof-of-work

A proof-of-work system (or protocol, or function) is an economic measure to deter denial-of-service attacks and other service abuses such as spam on a network by requiring some work from the service requester, usually meaning processing time by a computer. A key feature of these schemes is their asymmetry: the work must be moderately hard (but feasible) on the requester side but easy to check for the service provider. One popular system – used in Bitcoin mining and Hashcash – uses partial hash inversions to prove that work was done, to unlock a mining reward in Bitcoin and as a good-will token to send an e-mail in Hashcash. The sender is required to find a message whose hash value begins with a number of zero bits. The average work that sender needs to perform in order to find a valid message is exponential in the number of zero bits required in the hash value, while the recipient can verify the validity of the message by executing a single hash function. For instance, in Hashcash, a sender is asked to generate a header whose 160 bit SHA-1 hash value has the first 20 bits as zeros. The sender will on average have to try 2¹⁹ times to find a valid header.

File or data identifier

A message digest can also serve as a means of reliably identifying a file; several source code management systems, including Git, Mercurial and Monotone, use the sha1sum of various types of content (file content, directory trees, ancestry information, etc.) to uniquely identify them. Hashes are used to identify files on peer-to-peerfilesharing networks. For example, in an ed2k link, an MD4-variant hash is combined with the file size, providing sufficient information for locating file sources, downloading the file and verifying its contents. Magnet links are another example. Such file hashes are often the top hash of a hash list or a hash tree which allows for additional benefits.

One of the main applications of a hash function is to allow the fast look-up of a data in a hash table. Being hash functions of a particular kind, cryptographic hash functions lend themselves well to this application too.

However, compared with standard hash functions, cryptographic hash functions tend to be much more expensive computationally. For this reason, they tend to be used in contexts where it is necessary for users to protect themselves against the possibility of forgery (the creation of data with the same digest as the expected data) by potentially malicious participants.

Hash functions based on block ciphers

There are several methods to use a block cipher to build a cryptographic hash function, specifically a one-way compression function.

The methods resemble the block cipher modes of operation usually used for encryption. Many well-known hash functions, including MD4, MD5, SHA-1 and SHA-2 are built from block-cipher-like components designed for the purpose, with feedback to ensure that the resulting function is not invertible. SHA-3 finalists included functions with block-cipher-like components (e.g., Skein, BLAKE) though the function finally selected, Keccak, was built on a cryptographic sponge instead.

A standard block cipher such as AES can be used in place of these custom block ciphers; that might be useful when an embedded system needs to implement both encryption and hashing with minimal code size or hardware area. However, that approach can have costs in efficiency and security. The ciphers in hash functions are built for hashing: they use large keys and blocks, can efficiently change keys every block, and have been designed and vetted for resistance to related-key attacks. General-purpose ciphers tend to have different design goals. In particular, AES has key and block sizes that make it nontrivial to use to generate long hash values; AES encryption becomes less efficient when the key changes each block; and related-key attacks make it potentially less secure for use in a hash function than for encryption.

Hash function design

Merkle–Damgård construction

The Merkle–Damgård hash construction.

A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks, and operating on them in sequence using a one-way compression function. The compression function can either be specially designed for hashing or be built from a block cipher. A hash function built with the Merkle–Damgård construction is as resistant to collisions as is its compression function; any collision for the full hash function can be traced back to a collision in the compression function.

The last block processed should also be unambiguously length padded; this is crucial to the security of this construction. This construction is called the Merkle–Damgård construction. Most common classical hash functions, including SHA-1 and MD5, take this form.

Wide pipe vs narrow pipe

A straightforward application of the Merkle–Damgård construction, where the size of hash output is equal to the internal state size (between each compression step), results in a narrow-pipe hash design. This design causes many inherent flaws, including length-extension, multicollisions,^[8] long message attacks,^[9] generate-and-paste attacks, and also cannot be parallelized. As a result, modern hash functions are built on wide-pipe constructions that have a larger internal state size — which range from tweaks of the Merkle–Damgård construction^[8] to new constructions such as the sponge construction and HAIFA construction.^[10] None of the entrants in the NIST hash function competition use a classical Merkle–Damgård construction.^[11]

Meanwhile, truncating the output of a longer hash, such as used in SHA-512/256, also defeats many of these attacks.^[12]

Use in building other cryptographic primitives

Hash functions can be used to build other cryptographic primitives. For these other primitives to be cryptographically secure, care must be taken to build them correctly.

Message authentication codes (MACs) (also called keyed hash functions) are often built from hash functions. HMAC is such a MAC.

Just as block ciphers can be used to build hash functions, hash functions can be used to build block ciphers. Luby-Rackoff constructions using hash functions can be provably secure if the underlying hash function is secure. Also, many hash functions (including SHA-1 and SHA-2) are built by using a special-purpose block cipher in a Davies–Meyer or other construction. That cipher can also be used in a conventional mode of operation, without the same security guarantees. See SHACAL, BEAR and LION.

Pseudorandom number generators (PRNGs) can be built using hash functions. This is done by combining a (secret) random seed with a counter and hashing it.

Some hash functions, such as Skein, Keccak, and RadioGatún output an arbitrarily long stream and can be used as a stream cipher, and stream ciphers can also be built from fixed-length digest hash functions. Often this is done by first building a cryptographically secure pseudorandom number generator and then using its stream of random bytes as keystream. SEAL is a stream cipher that uses SHA-1 to generate internal tables, which are then used in a keystream generator more or less unrelated to the hash algorithm. SEAL is not guaranteed to be as strong (or weak) as SHA-1. Similarly, the key expansion of the HC-128 and HC-256 stream ciphers makes heavy use of the SHA-256 hash function.

Concatenation

Concatenating outputs from multiple hash functions provides collision resistance as good as the strongest of the algorithms included in the concatenated result. For example, older versions of Transport Layer Security (TLS) and Secure Sockets Layer (SSL) use concatenated MD5 and SHA-1 sums.^[13]^[14]This ensures that a method to find collisions in one of the hash functions does not defeat data protected by both hash functions.

For Merkle–Damgård construction hash functions, the concatenated function is as collision-resistant as its strongest component, but not more collision-resistant. Antoine Joux observed that 2-collisions lead to n-collisions: if it is feasible for an attacker to find two messages with the same MD5 hash, the attacker can find as many messages as the attacker desires with identical MD5 hashes with no greater difficulty.^[15] Among the n messages with the same MD5 hash, there is likely to be a collision in SHA-1. The additional work needed to find the SHA-1 collision (beyond the exponential birthday search) requires only polynomial time.^[16]^[17]

Cryptographic hash algorithms

There are many cryptographic hash algorithms; this section lists a few algorithms that are referenced relatively often. A more extensive list can be found on the page containing a comparison of cryptographic hash functions.

MD5

MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4, and was specified in 1992 as RFC 1321. Collisions against MD5 can be calculated within seconds which makes the algorithm unsuitable for most use cases where a cryptographic hash is required. MD5 produces a digest of 128 bits (16 bytes).

SHA-1

SHA-1 was developed as part of the U.S. Government's Capstone project. The original specification - now commonly called SHA-0 - of the algorithm was published in 1993 under the title Secure Hash Standard, FIPS PUB 180, by U.S. government standards agency NIST (National Institute of Standards and Technology). It was withdrawn by the NSA shortly after publication and was superseded by the revised version, published in 1995 in FIPS PUB 180-1 and commonly designated SHA-1. Collisions against the full SHA-1 algorithm can be produced using the shattered attack and the hash function should be considered broken. SHA-1 produces a hash digest of 160 bits (20 bytes).

Documents may refer to SHA-1 as just 'SHA', even though this may conflict with the other Standard Hash Algorithms such as SHA-0, SHA-2 and SHA-3.

RIPEMD-160

RIPEMD (RACE Integrity Primitives Evaluation Message Digest) is a family of cryptographic hash functions developed in Leuven, Belgium, by Hans Dobbertin, Antoon Bosselaers and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven, and first published in 1996. RIPEMD was based upon the design principles used in MD4, and is similar in performance to the more popular SHA-1. RIPEMD-160 has however not been broken. As the name implies, RIPEMD-160 produces a hash digest of 160 bits (20 bytes).

bcrypt

bcrypt is a password hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher, and presented at USENIX in 1999. Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive function: over time, the iteration count can be increased to make it slower, so it remains resistant to brute-force search attacks even with increasing computation power.

Whirlpool

In computer science and cryptography, Whirlpool is a cryptographic hash function. It was designed by Vincent Rijmen and Paulo S. L. M. Barreto, who first described it in 2000. Whirlpool is based on a substantially modified version of the Advanced Encryption Standard (AES). Whirlpool produces a hash digest of 512 bits (64 bytes).

SHA-2

SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA), first published in 2001.[3] They are built using the Merkle–Damgård structure, from a one-way compression function itself built using the Davies–Meyer structure from a (classified) specialized block cipher.

SHA-2 basically consists of two hash algorithms: SHA-256 and SHA-512. SHA-224 is a variant of SHA-256 with different starting values and truncated output. SHA-384 and the lesser known SHA-512/224 and SHA-512/256 are all variants of SHA-512. SHA-512 is more secure than SHA-256 and is commonly faster than SHA-256 on 64 bit machines such as AMD64.

The output size in bits is given by the extension to the 'SHA' name, so SHA-224 has an output size of 224 bits (28 bytes), SHA-256 produces 32 bytes, SHA-384 produces 48 bytes and finally SHA-512 produces 64 bytes.

SHA-3

SHA-3 (Secure Hash Algorithm 3) was released by NIST on August 5, 2015. SHA-3 is a subset of the broader cryptographic primitive family Keccak. The Keccak algorithm is the work of Guido Bertoni, Joan Daemen, Michael Peeters, and Gilles Van Assche. Keccak is based on a sponge construction which can also be used to build other cryptographic primitives such as a stream cipher. SHA-3 provides the same output sizes as SHA-2: 224, 256, 384 and 512 bits.

Configurable output sizes can also be obtained using the SHAKE-128 and SHAKE-256 functions. Here the -128 and -256 extensions to the name imply the security strength of the function rather than the output size in bits.

BLAKE2

An improved version of BLAKE called BLAKE2 was announced in December 21, 2012. It was created by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein with the goal to replace widely used, but broken MD5 and SHA-1 algorithms. When run on 64-bit x64 and ARM architectures, BLAKE2b is faster than SHA-3, SHA-2, SHA-1, and MD5. Although BLAKE nor BLAKE2 have not been standardized as SHA-3 it has been used in many protocols including the Argon2 password hash for the high efficiency that it offers on modern CPUs. As BLAKE was a candidate for SHA-3, BLAKE and BLAKE2 both offer the same output sizes as SHA-3 - including a configurable output size.

Attacks on cryptographic hash algorithms

There is a long list of cryptographic hash functions but many have been found to be vulnerable and should not be used. For instance, NIST selected 51 hash functions^[18] as candidates for round 1 of the SHA-3 hash competition, of which 10 were considered broken and 16 showed significant weaknesses and therefore didn't make it to the next round; more information can be found on the main article about the NIST hash function competitions.

Even if a hash function has never been broken, a successful attack against a weakened variant may undermine the experts' confidence. For instance, in August 2004 collisions were found in several then-popular hash functions, including MD5.^[19] These weaknesses called into question the security of stronger algorithms derived from the weak hash functions—in particular, SHA-1 (a strengthened version of SHA-0), RIPEMD-128, and RIPEMD-160 (both strengthened versions of RIPEMD).

On 12 August 2004, Joux, Carribault, Lemuet, and Jalby announced a collision for the full SHA-0 algorithm. Joux et al. accomplished this using a generalization of the Chabaud and Joux attack. They found that the collision had complexity 2⁵¹ and took about 80,000 CPU hours on a supercomputer with 256 Itanium 2 processors—equivalent to 13 days of full-time use of the supercomputer.

In February 2005, an attack on SHA-1 was reported that would find collision in about 2⁶⁹ hashing operations, rather than the 2⁸⁰ expected for a 160-bit hash function. In August 2005, another attack on SHA-1 was reported that would find collisions in 2⁶³ operations. Other theoretical weaknesses of SHA-1 have been known:^[20]^[21] and in February 2017 Google announced a collision in SHA-1.^[22] Security researchers recommend that new applications can avoid these problems by using later members of the SHA family, such as SHA-2, or using techniques such as randomized hashing^[23]^[1] that do not require collision resistance.

A successful, practical attack broke MD5 used within certificates for Transport Layer Security in 2008.^[24]

Many cryptographic hashes are based on the Merkle–Damgård construction. All cryptographic hashes that directly use the full output of a Merkle–Damgård construction are vulnerable against length extension attacks. This makes the MD5, SHA-1, RIPEMD-160, Whirlpool and the SHA-256 / SHA-512 hash algorithms all vulnerable against this specific attack. SHA-3, BLAKE2 and the truncated SHA-2 variants are not vulnerable against this type of attack.

References

^ ^a^bShai Halevi and Hugo Krawczyk, Randomized Hashing and Digital Signatures
^Schneier, Bruce. 'Cryptanalysis of MD5 and SHA: Time for a New Standard'. Computerworld. Retrieved 2016-04-20. Much more than encryption algorithms, one-way hash functions are the workhorses of modern cryptography.
^Katz, Jonathan; Lindell, Yehuda (2008). Introduction to Modern Cryptography. Chapman & Hall/CRC.
^Rogaway & Shrimpton 2004, in Sec. 5. Implications.
^'Flickr's API Signature Forgery Vulnerability'. Thai Duong and Juliano Rizzo.
^Lyubashevsky, Vadim and Micciancio, Daniele and Peikert, Chris and Rosen, Alon (2008). 'SWIFFT: A Modest Proposal for FFT Hashing'. Fast Software Encryption. Lecture Notes in Computer Science. 5086. Springer. pp. 54–72. doi:10.1007/978-3-540-71039-4_4. ISBN 978-3-540-71038-7.CS1 maint: Multiple names: authors list (link)
^Perrin, Chad (December 5, 2007). 'Use MD5 hashes to verify software downloads'. TechRepublic. Retrieved March 2, 2013.
^ ^a^bLucks, Stefan (2004). 'Design Principles for Iterated Hash Functions' – via Cryptology ePrint Archive, Report 2004/253.
^Kelsey, John; Schneier, Bruce (2004). 'Second Preimages on n-bit Hash Functions for Much Less than 2^n Work' – via Cryptology ePrint Archive: Report 2004/304.
^Biham, Eli; Dunkelman, Orr (24 August 2006). A Framework for Iterative Hash Functions – HAIFA. Second NIST Cryptographic Hash Workshop – via Cryptology ePrint Archive: Report 2007/278.
^Nandi, Mridul; Paul, Souradyuti (2010). 'Speeding Up The Widepipe: Secure and Fast Hashing' – via Cryptology ePrint Archive: Report 2010/193.
^Dobraunig, Christoph; Eichlseder, Maria; Mendel, Florian (February 2015). 'Security Evaluation of SHA-224, SHA-512/224, and SHA-512/256'(PDF).
^Florian Mendel; Christian Rechberger; Martin Schläffer.'MD5 is Weaker than Weak: Attacks on Concatenated Combiners'.'Advances in Cryptology – ASIACRYPT 2009'.p. 145.quote: 'Concatenating ... is often used by implementors to 'hedge bets' on hash functions. A combiner of the form MD5||SHA-1 as used in SSL3.0/TLS1.0 ... is an example of such a strategy.'
^Danny Harnik; Joe Kilian; Moni Naor; Omer Reingold; Alon Rosen.'On Robust Combiners for Oblivious Transfer and Other Primitives'.'Advances in Cryptology – EUROCRYPT 2005'.quote: 'the concatenation of hash functions as suggested in the TLS... is guaranteed to be as secure as the candidate that remains secure.'p. 99.
^Antoine Joux. Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions. LNCS 3152/2004, pages 306–316 Full text.
^Finney, Hal (August 20, 2004). 'More Problems with Hash Functions'. The Cryptography Mailing List. Retrieved May 25, 2016.
^Hoch, Jonathan J.; Shamir, Adi (2008). 'On the Strength of the Concatenated Hash Combiner when All the Hash Functions Are Weak'(PDF). Retrieved May 25, 2016.
^Andrew Regenscheid, Ray Perlner, Shu-jen Chang, John Kelsey, Mridul Nandi, Souradyuti Paul, Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition
^XiaoyunWang, Dengguo Feng, Xuejia Lai, Hongbo Yu, Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD
^Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu, Finding Collisions in the Full SHA-1
^Bruce Schneier, Cryptanalysis of SHA-1 (summarizes Wang et al. results and their implications)
^Fox-Brewster, Thomas. 'Google Just 'Shattered' An Old Crypto Algorithm – Here's Why That's Big For Web Security'. Forbes. Retrieved 2017-02-24.
^Shai Halevi, Hugo Krawczyk, Update on Randomized Hashing
^Alexander Sotirov, Marc Stevens, Jacob Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, Benne de Weger, MD5 considered harmful today: Creating a rogue CA certificate, accessed March 29, 2009.

External links

Paar, Christof; Pelzl, Jan (2009). '11: Hash Functions'. Understanding Cryptography, A Textbook for Students and Practitioners. Springer. Archived from the original on 2012-12-08. (companion web site contains online cryptography course that covers hash functions)
'The ECRYPT Hash Function Website'.
Buldas, A. (2011). 'Series of mini-lectures about cryptographic hash functions'. Archived from the original on 2012-12-06.
Rogaway, P.; Shrimpton, T. (2004). 'Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance'. CiteSeerX10.1.1.3.6200.

BEAR and LION ciphers

The BEAR and LION block ciphers were invented by Ross Anderson and Eli Biham by combining a stream cipher and a cryptographic hash function. The algorithms use a very large variable block size, on the order of 213 to 223 bits or more. Both are 3-round generalized (alternating) Feistel ciphers, using the hash function and the stream cipher as round functions. BEAR uses the hash function twice with independent keys, and the stream cipher once. LION uses the stream cipher twice and the hash function once. The inventors proved that an attack on either BEAR or LION that recovers the key would break both the stream cipher and the hash.

HAS-160

HAS-160 is a cryptographic hash function designed for use with the Korean KCDSA digital signature algorithm. It is derived from SHA-1, with assorted changes intended to increase its security. It produces a 160-bit output.

HAS-160 is used in the same way as SHA-1. First it divides input in blocks of 512 bits each and pads the final block. A digest function updates the intermediate hash value by processing the input blocks in turn.

The message digest algorithm consists of 80 rounds.

HAVAL

HAVAL is a cryptographic hash function. Unlike MD5, but like most modern cryptographic hash functions, HAVAL can produce hashes of different lengths – 128 bits, 160 bits, 192 bits, 224 bits, and 256 bits. HAVAL also allows users to specify the number of rounds (3, 4, or 5) to be used to generate the hash. HAVAL was broken in 2004.HAVAL was invented by Yuliang Zheng, Josef Pieprzyk, and Jennifer Seberry in 1992.

HMAC

In cryptography, an HMAC (sometimes expanded as either keyed-hash message authentication code or hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. It may be used to simultaneously verify both the data integrity and the authentication of a message, as with any MAC. Any cryptographic hash function, such as SHA-256 or SHA-3, may be used in the calculation of an HMAC; the resulting MAC algorithm is termed HMAC-X, where X is the hash function used (e.g. HMAC-SHA256 or HMAC-SHA3). The cryptographic strength of the HMAC depends upon the cryptographic strength of the underlying hash function, the size of its hash output, and the size and quality of the key.

HMAC uses two passes of hash computation. The secret key is first used to derive two keys – inner and outer. The first pass of the algorithm produces an internal hash derived from the message and the inner key. The second pass produces the final HMAC code derived from the inner hash result and the outer key. Thus the algorithm provides better immunity against length extension attacks.

An iterative hash function breaks up a message into blocks of a fixed size and iterates over them with a compression function. For example, SHA-256 operates on 512-bit blocks. The size of the output of HMAC is the same as that of the underlying hash function (e.g., 256 and 1600 bits in the case of SHA-256 and SHA-3, respectively), although it can be truncated if desired.

HMAC does not encrypt the message. Instead, the message (encrypted or not) must be sent alongside the HMAC hash. Parties with the secret key will hash the message again themselves, and if it is authentic, the received and computed hashes will match.

The definition and analysis of the HMAC construction was first published in 1996 in a paper by Mihir Bellare, Ran Canetti, and Hugo Krawczyk, and they also wrote RFC 2104 in 1997. The 1996 paper also defined a variant called NMAC. FIPS PUB 198 generalizes and standardizes the use of HMACs. HMAC is used within the IPsec and TLS protocols and for JSON Web Tokens.

Hash chain

A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method to produce many one-time keys from a single key or password. For non-repudiation a hash function can be applied successively to additional pieces of data in order to record the chronology of data's existence.

JH (hash function)

JH is a cryptographic hash function submitted to the NIST hash function competition by Hongjun Wu. Though chosen as one of the five finalists of the competition, JH ultimately lost to NIST hash candidate Keccak. JH has a 1024-bit state, and works on 512-bit input blocks. Processing an input block consists of three steps:

XOR the input block into the left half of the state.

Apply a 42-round unkeyed permutation (encryption function) to the state. This consists of 42 repetitions of:

Break the input into 256 4-bit blocks, and map each through one of two 4-bit S-boxes, the choice being made by a 256-bit round-dependent key schedule. Equivalently, combine each input block with a key bit, and map the result through a 5→4 bit S-box.

Mix adjacent 4-bit blocks using a maximum distance separable code over GF(24).

Permute 4-bit blocks so that they will be adjacent to different blocks in following rounds.

XOR the input block into the right half of the state.The resulting digest is the first 224, 256, 384 or 512 bits from the 1024-bit final value.

It is well suited to a bit slicing implementation using the SSE2 instruction set, giving speeds of 16.8 cycles per byte.

Kupyna

Kupyna is a cryptographic hash function defined in the Ukrainian national standard DSTU 7564:2014. It was created to replace an obsolete GOST hash function defined in the old standard GOST 34.11-95, similar to Streebog hash function standardized in Russia.

In addition to the hash function, the standard also describes message authentication code generation using Kupyna with digest sizes 256, 384 and 512 bits.

MD2 (hash function)

The MD2 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1989. The algorithm is optimized for 8-bit computers. MD2 is specified in RFC 1319. Although MD2 is no longer considered secure, even as of 2014, it remains in use in public key infrastructures as part of certificates generated with MD2 and RSA. The 'MD' in MD2 stands for 'Message Digest'.

MD4

The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms. The initialism 'MD' stands for 'Message Digest.'

The security of MD4 has been severely compromised. The first full collision attack against MD4 was published in 1995 and several newer attacks have been published since then. As of 2007, an attack can generate collisions in less than 2 MD4 hash operations. A theoretical preimage attack also exists.

A variant of MD4 is used in the ed2k URI scheme to provide a unique identifier for a file in the popular eDonkey2000 / eMule P2P networks. MD4 was also used by the rsync protocol (prior to version 3.0.0.)

MD4 is used to compute NTLM password-derived key digests on Microsoft Windows NT, XP, Vista, 7, 8, and 10.

MD6

The MD6 Message-Digest Algorithm is a cryptographic hash function. It uses a Merkle tree-like structure to allow for immense parallel computation of hashes for very long inputs. Authors claim a performance of 28 cycles per byte for MD6-256 on an Intel Core 2 Duo and provable resistance against differential cryptanalysis. The source code of the reference implementation was released under MIT license.Speeds in excess of 1 GB/s have been reported to be possible for long messages on 16-core CPU architecture.In December 2008, Douglas Held of Fortify Software discovered a buffer overflow in the original MD6 hash algorithm's reference implementation. This error was later made public by Ron Rivest on 19 February 2009, with a release of a corrected reference implementation in advance of the Fortify Report.MD6 was submitted to the NIST SHA-3 competition. However, on July 1, 2009, Rivest posted a comment at NIST that MD6 is not yet ready to be a candidate for SHA-3 because of speed issues, a 'gap in the proof that the submitted version of MD6 is resistant to differential attacks', and an inability to supply such a proof for a faster reduced-round version, although Rivest also stated at the MD6 website that it is not withdrawn formally. MD6 did not advance to the second round of the SHA-3 competition. In September 2011, a paper presenting an improved proof that MD6 and faster reduced-round versions are resistant to differential attacks was posted to the MD6 website.

N-Hash

In cryptography, N-Hash is a cryptographic hash function based on the FEAL round function, and is now considered insecure. It was proposed in 1990 by Miyaguchi et al.; weaknesses were published the following year.

N-Hash has a 128-bit hash size. A message is divided into 128-bit blocks, and each block is combined with the hash value computed so far using the g compression function. g contains eight rounds, each of which uses an F function, similar to the one used by FEAL.

Eli Biham and Adi Shamir (1991) applied the technique of differential cryptanalysis to N-Hash, and showed that collisions could be generated faster than by a birthday attack for N-Hash variants with even up to 12 rounds.

Preimage attack

In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs).

In the context of attack, there are two types of preimage resistance:

preimage resistance: for essentially all pre-specified outputs, it is computationally infeasible to find any input that hashes to that output, i.e., given y, it is difficult to find an x such that h(x) = y.

second-preimage resistance: it is computationally infeasible to find any second input which has the same output as that of a specified input, i.e., given x, it is difficult to find a second preimage x′ ≠ x such that h(x) = h(x′).These can be compared with a collision resistance, in which it is computationally infeasible to find any two distinct inputs x, x′ that hash to the same output, i.e., such that h(x) = h(x′).Collision resistance implies second-preimage resistance, but does not guarantee preimage resistance. Conversely, a second-preimage attack implies a collision attack (trivially, since, in addition to x′, x is already known right from the start).

Public key fingerprint

In public-key cryptography, a public key fingerprint is a short sequence of bytes used to identify a longer public key. Fingerprints are created by applying a cryptographic hash function to a public key. Since fingerprints are shorter than the keys they refer to, they can be used to simplify certain key management tasks. In Microsoft software, 'thumbprint' is used instead of 'fingerprint'.

SM3 (hash function)

SM3 is a cryptographic hash function used in the Chinese National Standard. It was published by the State Cryptography Administration (Chinese: 国家密码管理局) on 2010-12-17 as 'GM/T 0004-2012: SM3 cryptographic hash algorithm'.SM3 is mainly used in digital signatures, message authentication codes, and pseudorandom number generators. The algorithm is public and is claimed by the China Internet Network Information Center to be like SHA-256 in security and efficiency.

Security of cryptographic hash functions

In cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, complexity theory and formal reduction. These functions are called Provably Secure Cryptographic Hash Functions. However this does not mean that such a function could not be broken. To construct them is very difficult and only a few examples were introduced. The practical use is limited.

In the second category are functions that are not based on mathematical problems but on an ad hoc basis, where the bits of the message are mixed to produce the hash. They are then believed to be hard to break, but no such formal proof is given. Almost all widely spread hash functions fall in this category. Some of these functions are already broken and are no longer in use.

Skein (hash function)

Skein is a cryptographic hash function and one of five finalists in the NIST hash function competition. Entered as a candidate to become the SHA-3 standard, the successor of SHA-1 and SHA-2, it ultimately lost to NIST hash candidate Keccak.The name Skein refers to how the Skein function intertwines the input, similar to a skein of yarn.

Snefru

Snefru is a cryptographic hash function invented by Ralph Merklein 1990 while working at Xerox PARC.The function supports 128-bit and 256-bit output. It was named after the Egyptian Pharaoh Sneferu, continuing the tradition of the Khufu and Khafre block ciphers.

The original design of Snefru was shown to be insecure by Eli Biham and Adi Shamir who were able to use differential cryptanalysis to find hash collisions. The design was then modified by increasing the number of iterations of the main pass of the algorithm from two to eight. Although differential cryptanalysis can break the revised version with less complexity than brute force search (a certificational weakness), the attack requires

{displaystyle 2^{88.5}}

operations and is thus not currently feasible in practice.

Tiger (hash function)

In cryptography, Tiger is a cryptographic hash function designed by Ross Anderson and Eli Biham in 1995 for efficiency on 64-bit platforms. The size of a Tiger hash value is 192 bits. Truncated versions (known as Tiger/128 and Tiger/160) can be used for compatibility with protocols assuming a particular hash size. Unlike the SHA-2 family, no distinguishing initialization values are defined; they are simply prefixes of the full Tiger/192 hash value.

Tiger2 is a variant where the message is padded by first appending a byte with the hexadecimal value of 0x80 as in MD4, MD5 and SHA, rather than with the hexadecimal value of 0x01 as in the case of Tiger. The two variants are otherwise identical.

Whirlpool (hash function)

In computer science and cryptography, Whirlpool (sometimes styled WHIRLPOOL) is a cryptographic hash function. It was designed by Vincent Rijmen (co-creator of the Advanced Encryption Standard) and Paulo S. L. M. Barreto, who first described it in 2000.

The hash has been recommended by the NESSIE project. It has also been adopted by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) as part of the joint ISO/IEC 10118-3 international standard.

Cryptographic hash function

Technology

Cryptographic hash function

Consensus mechanisms

Proof of work currencies

SHA-256-based
Ethash-based
Scrypt-based
Equihash-based
CryptoNote-based
X11-based
Lyra2-based
Other

Proof of stake currencies

ERC-20 tokens

Other currencies