private key;16-byte checksum;collisions;Rogier;Chauvand;Damgard;Merkle;iterative structure;Den Boer;Dobbert; pseudo-collisions">
MD2 [Kal92], MD4 [Riv91b] [Riv92b], and MD5 [Riv92c] are message-digest algorithms developed by Rivest. They are meant for digital signature applications where a large message has to be "compressed" in a secure manner before being signed with the private key. All three algorithms take a message of arbitrary length and produce a 128-bit message digest. While the structures of these algorithms are somewhat similar, the design of MD2 is quite different from that of MD4 and MD5 and MD2 was optimized for 8-bit machines, whereas MD4 and MD5 were aimed at 32-bit machines. Description and source code for the three algorithms can be found as Internet RFCs 1319 - 1321 [Kal92] [Riv92b] [Riv92c].
MD2 was developed by Rivest in 1989. The message is first padded so that its length in bytes is divisible by 16. A 16-byte checksum is then appended to the message, and the hash value is computed on this resulting message. Rogier and Chauvaud have found that collisions for MD2 can be constructed if the calculation of the checksum is omitted [RC95]. This is the only cryptanalytic result known for MD2.
MD4 was developed by Rivest in 1990. The message is padded to ensure that its length in bits plus 448 is divisible 512. A 64-bit binary representation of the original length of the message is then concatenated to the message. The message is processed in 512-bit blocks in the Damgård/Merkle iterative structure (see Question 97), and each block is processed in three distinct rounds. Attacks on versions of MD4 with either the first or the last rounds missing were developed very quickly by Den Boer and Bosselaers [DB92] and others. Dobbertin [Dob95] has shown how collisions for the full version of MD4 can be found in under a minute on a typical PC. Clearly, MD4 should now be considered broken.
MD5 was developed by Rivest in 1991. It is basically MD4 with "safety-belts" and while it is slightly slower than MD4, it is more secure. The algorithm consists of four distinct rounds, which have a slightly different design from that of MD4. Message-digest size, as well as padding requirements, remains the same. Den Boer and Bosselaers [DB94] have found pseudo-collisions for MD5 (see Question 98), but there are no other known cryptanalytic results.
Van Oorschot and Wiener [VW94] have considered a brute-force search for collisions (see Question 96) in hash functions, and they estimate that a collision search machine designed specifically for MD5 (costing $10 million in 1994) could find a collision for MD5 in 24 days on average. The general techniques can be applied to other hash functions.
More details on MD2, MD4, and MD5 can be found in [Pre93] and [Rob95c].