Java 로 만든 Fast MD5 - MessageDigest

2008. 11. 26. 16:04Java


Timothy W Macinta - Contract Software Development

Fast MD5 Implementation in Java(TM)


Java's Built-In MD5 Support

You don't need to use this Fast MD5 Implementation to get an MD5 hash in Java (though you are certainly welcome to). The standard edition of Java comes with MD5 support built in. You might want to use this Fast MD5 Implementation if one or more of the following applies:

  • Java's built-in MD5 support is a bottleneck for your program's performance and you want something faster.
  • You are using a version of Java which doesn't have MD5 support, such as J2ME MIDP/CLDC.
  • You want the extra convenience methods for hashing a file, hashing a string, converting the hash to a hex string, etc.
For those of you who ended up on this page searching for how to calculate an MD5 hash in Java, here's the gist of how to do it (without exception checking):
    MessageDigest digest = java.security.MessageDigest.getInstance("MD5");
digest.update(...your data here...);
byte[] hash = digest.digest();
You can then convert the hash into the familiar hex format (e.g., "d41d8cd98f00b204e9800998ecf8427e"), if you wish. If you don't know how to do that, the Fast MD5 Implementation can do it for you - download the code and then check out the tutorial.

How Fast Is It?

Short answer: Much faster than any other Java implementation that I have tested and (surprisingly) even faster than the native, non-Java MD5 implementation on some systems.

Long answer: First of all, it is important to note that the term "fast" is used here in relative terms. The implementation of the MD5 message digest algorithm available on this page is written in Java and is fast compared with other implementations written in Java, both because it is heavily optimized by itself and because there is an optional native method that makes it even faster when the platform supports it. How it compares to a sensible implementation written in a language, such as C, that is compiled directly to machine code, is heavily dependent upon how good of a job the JIT compiler in your JVM does in compiling the code or whether you are able to use the optional native method.

Here is a table detailing the amount of time it took to checksum one particular file of size 679,477,248 bytes on one particular Linux system using various permutations of different MD5 code from within the JDK 1.4.1 from Sun (where applicable):

Implementation system user elapsed sys+user
Default Java Impl - The default MD5 implementation that ships with the JDK (attainable through java.security.MessageDigest). 6.423333 78.70833 213.7667 85.13167
Default Java Impl Interpreted - The default MD5 implementation that ships with the JDK (attainable through java.security.MessageDigest). The JVM was forced to run in interpreted only mode. 7.756667 1669.12 2275.383 1676.877
Fast MD5, No Native - The Fast MD5 implementation on this web page. Native methods were disabled. 6.121667 43.40667 215.365 49.52833
Fast MD5 Interpreted, No Native - The Fast MD5 implementation on this web page. Native methods were disabled. The JVM was forced to run in interpreted only mode. 8.526667 826.3383 1060.125 834.865
Fast MD5 - The Fast MD5 implementation on this web page. Native methods were enabled. 6.336667 28.345 212.8033 34.68167
Fast MD5 Interpreted - The Fast MD5 implementation on this web page. Native methods were enabled. The JVM was forced to run in interpreted only mode. 6.43 27.895 213.1483 34.325
md5sum Binary - The Linux native "md5sum" program distributed in the textutils-2.0.14-2.i386.rpm with Red Hat Linux 7.2. 4.53 55.34667 211.2883 59.87667
Speed comparisons were made of the above MD5 message digest algorithm implementations by using the "time" program to roughly time how much time each implementation took. As can be seen from above, all the implementations took roughly the same amount of real time to execute, except when the JVM was run in interpreted mode (in which case it took much longer). This indicates that file I/O was likely the bottleneck. Therefore, the MD5 implementation that ships with the JDK will be adequate in a large number of cases. On the other hand, my fast implementation will be useful when the underlying I/O is very fast (e.g., if the data to be hashed is being created on-the-fly in memory), the CPU is inherently slow, CPU cycles are scarce (e.g., many other programs are running at the same time), the Java VM being used does not provide an adequately fast JIT, or in other cases where it is desirable to minimize CPU usage.

The very surprising (for me) thing to note is that the Fast MD5 implementation outperforms the native "md5sum" binary even when the native methods aren't used. Oddly enough, the same did not hold true on Windows where the native binary was faster in all cases (perhaps the underlying I/O implementation in the Linux JVM is more efficient than the same in the Windows JVM).

Making It Even Faster

Thanks to Benjamin "Quincy" Cabell V for sponsoring the addition of an optional native method to the MD5 package. Now that the native method has been added, the remaining optimizations that could be made to this package are no longer as dramatic as they used to be. If you try my optimized implementation and decide that you still need something even faster, try the following:

  • Make sure a JIT is being used (check your JAVA_COMPILER environment variable to see if JIT compilation may have been disabled). Using a JIT should make a huge difference over a plain interpreter.
  • Make sure the native method is being used. This will make a massive difference if you are stuck using a plain interpreter.
  • Performance on Windows could probably be optimized some more to the point that it approaches the native md5sum.exe binary. This would involve determining where the bottleneck is now - the first places to look would be in the underlying Java I/O (in which case, perhaps using the nio package would help) and in the specific compiler that was used to compile the DLL in the distribution (other compilers might produce faster code than the MinGW compiler).
  • The speed of an existing MD5 implementation can be achieved very easily by simply calling Runtime.getRuntime().exec("md5sum") and then using the Process streams to specify the data and retrieve the hash. However, this would add even more deployment and maintenance complexity than using a native method because (among other things) the licensing terms for each platform's "md5sum" implementation would then come into play. This would also not be as flexible as the native method solution because you would be limited to a single checksum at the end of each data stream whereas with the native method solution the checksum of all the data seen thus far can be calculated an arbitrary number of times at an arbitrary number of points in time.
  • If the use of native methods is not a viable option, there might be additional speed improvements that can be attained from the Java-only version of the code. The author of the MD5 implementation at www.rodage.org wrote me to say that he thinks his Java-only implementation may be faster, though I haven't run the numbers (and I don't believe that he had either at the time). I suspect that the speed difference may not be substantial, and it is unlikely that it would outperform the optional native method, but it could be worth a try. As of this writing, his implementation was in the public domain, so it is could probably be assimilated without a licensing conflict.

The Code

This implementation has been derived from code originally written by Santeri Paavolainen and was retrieved from his website at http://www.cs.hut.fi/~santtu/java/ . The initial changes that I (Tim Macinta) made were heavy optimization of the code, some bug fixes, and I replaced Mr. Paavolainen's test suite (which was under a separate license) with an expanded test suite of my own. I also moved the code into a Java package because it was originally distributed without its own package. I attempted to contact Mr. Paavolainen to see if he was interested in incorporating my changes into his distribution, but after about a week I had not heard back from him so I decided to make my changes available on my own website.

Browse the Javadoc documentation.

Highlights of the distribution contents:

  • MD5.java - The optimized implementation of the MD5 hashing algorithm. Please see the documentation for this file for instructions on where in your path to place the native library and instructions for how to disable the search for the native library if you are running the code from within an applet or within any other context with a restrictive SecurityManager.
  • MD5.c - The code for the optional native method that can be used to speed up the implementation even further. Compilation instructions for Linux, Mac OS X, and Windows can be found within the comments to the code at the beginning of the file.
  • MD5InputStream.java - Calculates the MD5 hash of data read through a specified InputStream. Also contains some test code for computing the MD5 hash of a file (see the main() method).
  • MD5OutputStream.java - Calculates the MD5 hash of data written to a specified OutputStream. Also contains some test code for computing the MD5 hash of a file (see the main() method).
  • MD5OutputStreamTest.java (in the "test" package) - Tests the MD5OutputStream class and the correctness of the results by feeding it a very large amount of random data and comparing the results to feeding the same data through the native "md5sum" binary (this requires an "md5sum" binary to be in your path).

Please Link To This Page

If you download the code and find it useful, please link to http://www.twmacinta.com/myjava/fast_md5.php to show your appreciation. This is not a requirement of the license, it is just a friendly request. This helps me because it helps spread the word about my contract software development services. It helps you because the more interest my MD5 library receives, the more it will be improved. Also, the more interest it receives, the more I will be encouraged to release other libraries I have sitting around. Linking to this page is a free, easy way to show your support.

Download Version 2.6.2

The code in the following download is provided under the GNU LGPL version 2.1, or (at your option) any later version. If you want to discuss licensing under different terms, contact me (distributing the code under a license that is not compatible with the GNU LGPL would require a full rewrite of the code because the code was derived from code under the GNU LGPL).

Have you read the GNU LGPL version 2.1, and do you agree to its terms?


Yes, I Agree
 
No, I Don't Agree
 

J2ME MIDP/CLDC Versions (Preliminary)

Note: you may be able to use the MD5 hashing support that is built into Java even in J2ME MIDP/CLDC, if your target platforms all come with The SATSA-CRYPTO Optional Package and the MD5 algorithm. Usage should be very close to the example above for Java's Standard Edition (the call to digest() will require parameters, in this case). If you cannot guarantee that all of your target platforms will have the necessary support, you can simply use the Fast MD5 Implementation instead.

There is no official J2ME MIDP/CLDC version of the Fast MD5 Implementation yet, but you can probably use the distribution in J2ME anyway, with a tiny bit of effort. Multiple people have been kind enough to contribute changes back to make the library work in J2ME, I just haven't personally tested them, or integrated them with the build process yet, and I am providing them as a convenience before official support is added. Alternatively, you can also use the regular distribution and strip it down to work in J2ME. The options in detail are:

  • Bem Jones-Bey has contributed a J2ME MIDP/CLDC version of the library and made it available at http://oplop.googlecode.com/svn/trunk/Java/src/com/twmacinta/util/
  • You can probably convert the regular distribution to one that works in J2ME MIDP/CLDC by removing the native method code, removing the test code, recompiling, and removing whatever else creates a compilation error (e.g., references to java.io.File). While the native method is optional, the mere presence of the signature would probably produce errors during the MIDP/CLDC preverification process.

Upcoming Versions

I generally release new versions of the Fast MD5 distribution either when I add a major new feature, when I have more than a few minor improvements accumulated, or when there are bugs found. You can receive notifications of new releases by subscribing to the project at freshmeat.net. Looking toward the future, here are some other things that I think might be nice to add to the distribution, but which I'll have to weigh against my free time and motivation (and you can always motivate me to add your own favorite feature by contacting me about contract development):
  • A J2ME version, primarily for MIDP/CLDC. If you want an unofficial version of this now, see the J2ME MIDP/CLDC section above.
  • Restructure code to also work as a Java Cryptographic Service Provider.
  • Code to also search for the native library in the directories specified by the system property java.library.path. For more details on the motivation behind this change as well as an interim hack that can give you support for this now on some platforms, see the original email requesting java.library.path support. The code was originally written for a version of the JDK that predates this property, which is why support for it wasn't included earlier.
  • Documentation as part of the download.
  • Maven support. Martin West has contributed the necessary files for Maven support, which you can use if you want support now. Before adding these files to the mainline distribution, I need to test them and also make sure that they can peacefully coexist with the regular Ant build file (they look like they can). The contributed files were created against version 2.6.1 but may very well work with other versions too.
  • Native method support for Solaris. A correspondent who requested to remain anonymous was kind enough to contribute a patch to enable native method support on Solaris. If you wish to use it, you can download MD5_solaris_contrib.c and use it to replace the file MD5.c which comes with the distribution (note that I think you'll still need to use the original MD5.c file on other platforms). I haven't tested this myself and probably won't fold it into the main distribution until I can repeatedly test it, so if you would like to see Solaris support added to the official distribution and are willing to send me a Solaris box, please contact me. I think that the remaining steps which are needed to put Solaris support on par with the other platforms are: 1) add appropriate ifdef statements so that the same MD5.c file can be used on all platforms, 2) add a Solaris specific path to where the library is searched for in MD5.java, 3) add compilation instructions to the comments at the top of MD5.c, and 4) run the test script for an extensive period of time.
  • Native method support for more platforms. If you want support for a particular platform and are willing to send me the necessary hardware, contact me.
  • Stand-alone executable Jar and/or binary for use from command line.
  • Stand-alone executable Jar with a graphical interface. (Note: this may already be complete - see MD5Gui.)
  • A discussion forum for this web page.
  • Add native method compilation support to the Ant build file.
  • Add support for more platforms to the Makefile.
  • Add cross-compilation support for native method to Ant build file and Makefile.

Quick Tutorial

Do you want to calculate the MD5 hash of a file from within your Java program? Here's a quick way to do it with the Fast MD5 Distribution (just set filename to the name of the file that you want to checksum):
    String hash = MD5.asHex(MD5.getHash(new File(filename)));

Do you want to calculate the MD5 hash of a string from within your Java program? Here's a quick way to do it with the Fast MD5 Distribution (just set myString to the string that you want to checksum):
    MD5 md5 = new MD5();
md5.Update(myString, null);
String hash = md5.asHex();

Note that the above code will convert the string to an array of bytes using the ISO-8859-1 character encoding method. You can specify a different method using the second parameter to Update() or you can leave out the second parameter entirely to use the target platform's default encoding method.

Software Using This Library

Want to add your software to this list? Contact me.

Acknowledgments

Thanks to all the contributors:
  • This implementation has been derived from code originally written by Santeri Paavolainen and was retrieved from his website at http://www.cs.hut.fi/~santtu/java/ .
  • Heavy optimizations, testing, and ongoing maintenance are provided by Tim Macinta.
  • Thanks to Pensamos Digital, Inc. for sponsoring the initial optimization of the code for use in their Magic Mirror Backup peer to peer backup software.
  • Thanks to Benjamin "Quincy" Cabell V for sponsoring the addition of optional native methods.
  • Thanks to Dannes Wessels for contributing an Ant build file.
  • Thanks to Marco for pointing out that the documentation was incorrect regarding how to disable the test for native methods.
  • Thanks to Roberto Aguilar for submitting a patch to add support for native methods to Mac OS X.
  • Thanks to Alec Bickerton for contributing a J2ME MIDP/CLDC version with an accompanying example Eclipse project.
  • Thanks to Martin West for finding, reporting, and fixing a bug in the Ant build file that caused the "dist" target to not work if the "docs" directory was not present. Also, thanks to Martin for contributing Maven support.
  • Thanks to Fabio Strozzi for suggesting java.library.path support and for sending some code to achieve that on some platforms.
  • Thanks to Bem Jones-Bey for contributing a J2ME MIDP/CLDC version.
  • Thanks to Christopher Larrieu for reporting the occurrence of stack overflows when the library was used with large buffers.
  • Thanks to everybody else that has submitted feedback.

Change Log

  • Version 2.6.2
    • Fixed stack overflow bug which occurred in native method when called with a large buffer. All users which update the hash's state using large buffers are encouraged to upgrade in order to guard against crashes and potential security implications.
    • Added native method support for OS X on x86 hardware.
  • Version 2.6.1
    • Martin West contributed a bug fix and some code refactoring to make all targets work out of the box in the Ant build file. Previously, the "dist" target did not work if the "docs" directory was not present.
  • Version 2.6
    • Added OS X support for native method.
    • Eked a little more speed out of the native method version by special-casing little endian architectures and skipping operations that would be redundant on them.
    • Added the ability to access the MD5 instance held by an MD5InputStream or MD5OutputStream. The primary motivation here was so that you can reset the MD5 instance mid-stream and thereby checksum blocks of a stream (this was to enable more efficient error correction in Magic Mirror Backup).
    • Added a utility method for quickly obtaining the MD5 hash of a file.
    • Fixed incorrect documentation describing how to bypass test for native methods.
    • Added Linux Makefile to distribution.
    • Added Ant build file to distribution.
    • Restructured directories to be Ant-friendly.
    • Fixed some typos.
  • Version 2.5
    • Added an optional native method for hash calculation for additional speed improvement.
    • Cleaned up the code a little to make it friendlier to Javadoc.
  • Version 2.1
    • Added a note to Update(String) explaining that the results of the method may vary across platforms. Thanks to Peter Speck for pointing this out.
    • Added an overloaded version of the Update() method which takes as parameters both a String and a character encoding for converting the String to a byte array.


Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

All Pages, Images, and Other Content Copyright © 1997 - 2008 Timothy W Macinta , except where noted. All Rights Reserved. The "Tim Macinta Now" button may be used on web pages that are external to this site to provide a link back to this page. For usage guidelines on KMFMS artwork please see http://www.kmfms.com/usage-guide.html.