Some security tools still stick to MD5 when identifying malware samples years after practical collisions were shown against the algorithm. This can be exploited by first showing these tools a harmless sample (Sheep) and then a malicious one (Wolf) that have the same MD5 hash. Please use this code to test if the security products in your reach use MD5 internally to fingerprint binaries and share your results by issuing a pull request updating the contents of results/!
Works-on-a-different-machine-than-mine version, feedback is welcome!
- 32-bit Windows (virtual) machine (64-bit breaks stuff)
- Visual Studio 2012 to compile the projects (Express will do)
- Fastcoll for collisions
- Optional: Cygwin+MinGW to compile Evilize
Extract Fastcoll to the fastcoll directory and name the executable fastcoll.exe.
Use shepherd.bat to generate wolf.exe and sheep.exe (in the VS Development Command Prompt):
> shepherd.bat YOURPASSWORD your_shellcode.raw
After this step you should have your two colliding binaries (sheep.exe and wolf.exe in the evilize directory).
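Before testing any product, it may be worth confirming that the generated pair really collides. The following is a minimal Python sketch, not part of this repository: the helper names `file_digest` and `is_md5_collision` are made up for illustration. It checks that two files share an MD5 digest while differing in content, using SHA-256 as the independent check.

```python
import hashlib


def file_digest(path, algorithm):
    """Return the hex digest of a file, reading in chunks to keep memory use flat."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_md5_collision(path_a, path_b):
    """True when both files share an MD5 digest but their SHA-256 digests differ."""
    same_md5 = file_digest(path_a, "md5") == file_digest(path_b, "md5")
    same_sha256 = file_digest(path_a, "sha256") == file_digest(path_b, "sha256")
    # Identical files trivially share an MD5 digest, so require differing content.
    return same_md5 and not same_sha256
```

Assuming the output layout described above, calling `is_md5_collision("evilize/sheep.exe", "evilize/wolf.exe")` should return `True` for a successfully generated pair.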
For more information see the tutorial of Peter Selinger, older revisions of this document or the source code...
- shepherd.bat executes shepherd.exe with the user supplied command line arguments
  - shepherd.exe generates a header file (sc.h) that contains the encrypted shellcode, the password and the CRC of the plain shellcode
- shepherd.bat executes the build process of sheep.exe
  - sheep.exe is built with sc.h included by Visual Studio
- shepherd.bat executes evilize.exe
  - evilize.exe calculates a special IV for the chunk of sheep.exe right before the block where the collision will happen
  - evilize.exe executes fastcoll.exe with the IV as a parameter
  - fastcoll.exe generates two 128 byte colliding blocks: a and b
  - evilize.exe replaces the original string buffers of sheep.exe so that they contain combinations of a and b
- The resulting files (evilize/wolf.exe and evilize/sheep.exe) have the same MD5 hashes but behave differently. The real code to be executed only appears in the memory of evilize/wolf.exe.
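The last steps can be illustrated with a conceptual Python sketch of how two files with the same MD5 hash can still behave differently. It is not taken from the project source and rests on assumptions: that the binary embeds two buffers, that the first sits at the collision position (holding a in one file and b in the other), and that the second holds a in both files. The function name `select_behaviour` and the placeholder blocks are hypothetical, and the placeholder blocks do not actually collide under MD5.

```python
def select_behaviour(buffer_one: bytes, buffer_two: bytes) -> str:
    """Return which branch a binary would take for a given pair of embedded buffers."""
    # Equal buffers select the harmless branch, differing buffers the payload branch.
    return "sheep" if buffer_one == buffer_two else "wolf"


# Placeholder 128-byte blocks for illustration only. They do NOT collide under
# MD5; real colliding blocks would come from fastcoll.exe.
block_a = bytes(128)
block_b = bytes([1]) + bytes(127)

# Assumed layout: only the first buffer (at the collision position) differs
# between the two files, so the bytes after it are identical in both.
sheep_buffers = (block_a, block_a)
wolf_buffers = (block_b, block_a)

print(select_behaviour(*sheep_buffers))  # prints "sheep"
print(select_behaviour(*wolf_buffers))   # prints "wolf"
```

With real colliding blocks, swapping a for b at the collision position leaves the MD5 state after that block unchanged. Because everything after it is identical in both files, the final hashes match even though the comparison inside the program comes out differently.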
To test the security products in your reach you should generate two pairs of samples (SHEEP1-WOLF1 and SHEEP2-WOLF2), preferably with the same payload. Since samples (or their fingerprints) are usually uploaded to central repositories (or "the cloud") precompiled samples are not included to avoid conflicts between independent testers.
After the samples are ready follow the methodology shown on the diagram below:
(*) If the product is not able to detect the first malicious sample, there are more serious problems to worry about than crypto-fu. In fact, the simple cryptography included in the provided boilerplate code poses a hard challenge for various products... Try to use more obvious samples!
(**) The product most probably uses some trivial method to detect the boilerplate instead of the actual payload. You can try to introduce simple changes to the code, such as removing debug strings.
Please don't forget to share your positive results by issuing a pull request to the RESULTS.md file!
- Poisonous MD5 - Wolves Among the Sheep
- Peter Selinger: MD5 Collision Demo
- How to make two binaries with same MD5
- Stop using MD5 now!
Licensed under the GNU GPL unless otherwise stated.
