Bug#760817: ssdeep: wrong scoring on two fuzzy hashes with same block sizes

Tsukasa #01 (Oi) li at livegrid.org
Mon Sep 8 06:13:30 UTC 2014


Package: ssdeep
Version: 2.7-2
Severity: important
Tags: patch

Dear Maintainer,

ssdeep (and the libfuzzy2 Debian package) before version 2.10 has a bug
that can produce a wrong score when comparing two fuzzy hashes with the
same block size. This makes clustering and comparing files unreliable.

This bug was fixed in 2.10 by Jesse Kornblum
<research at jessekornblum.com> but is still not fixed in the Debian
versions (sid, unstable and stable).
I encountered this bug while clustering about 10M files based on ssdeep
hashes, and I had to recluster all the files.

Sorry that I have no `natural' examples to reproduce it (I slightly
changed the parameter after building patched versions of
ssdeep/libfuzzy2 2.7-2, and it would take about 2 months * 20 CPU cores
to compare the clusters), but we can generate an `artificial' example by
truncating the second chunk of the fuzzy hashes.
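
For reference, the truncation could be scripted roughly as follows. This
is only a sketch: the helper name truncate_second_chunk and the sample
hash are made up for illustration and are not part of ssdeep, and it
assumes the hash is given in the plain blocksize:chunk1:chunk2 form
without the trailing filename field.

```shell
# Hypothetical helper (not part of ssdeep): keep only the first N
# characters of the second chunk of a fuzzy hash "blocksize:chunk1:chunk2".
truncate_second_chunk() {
    hash=$1   # fuzzy hash without the ,"filename" suffix
    keep=$2   # number of characters of the second chunk to keep
    printf '%s\n' "$hash" |
        awk -F: -v n="$keep" '{ print $1 ":" $2 ":" substr($3, 1, n) }'
}

# Example with a made-up hash (not taken from a real file):
truncate_second_chunk '24:abcdefghijkl:mnopqrstuvwx' 7
# prints: 24:abcdefghijkl:mnopqrs
```

The block size and the first chunk are passed through unchanged, so only
the second chunk gets shorter.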



[PROMPT_EXAMPLE_BEGIN]

$ # Generate artificial test cases
$ cat >test <<_END
ssdeep,1.1--blocksize:hash:hash,filename
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVEatXSHlY31x:E4uV9FX,"1"
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVENXSCYA1x:E4uV9FX,"2"
_END

$ # This is the expected result.
$ $SSDEEP_FIXED/ssdeep -k test -x test
test:1 matches test:2 (100)
test:1 matches test:2 (100)

test:2 matches test:1 (100)
test:2 matches test:1 (100)

test:1 matches test:2 (100)
test:1 matches test:2 (100)

test:2 matches test:1 (100)
test:2 matches test:1 (100)

$ # This is the result from Debian versions of ssdeep.
$ ssdeep -k test -x test
test:1 matches test:2 (94)
test:1 matches test:2 (94)

test:2 matches test:1 (94)
test:2 matches test:1 (94)

test:1 matches test:2 (94)
test:1 matches test:2 (94)

test:2 matches test:1 (94)
test:2 matches test:1 (94)

$
[PROMPT_EXAMPLE_END]



As you can see, the buggy ssdeep/libfuzzy2 returns a score of 94, but
the fixed version of ssdeep/libfuzzy2 returns a score of 100, for these
cases:

* file 1 and file 2
* file 1 and file 1 (matching itself)
* file 2 and file 2 (matching itself)
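
A check like the following could be used to tell whether a given build
is affected, using the test file above. It is a sketch only: the helper
name all_scores_are_100 is hypothetical, and it assumes the output
format shown above, where each match line ends with the score in
parentheses.

```shell
# Hypothetical helper: read `ssdeep -k FILE -x FILE` output on stdin and
# succeed only if at least one "matches" line was seen and every such
# line reports a score of 100.
all_scores_are_100() {
    awk '
        / matches / {
            seen = 1
            score = $NF              # last field, e.g. "(94)"
            gsub(/[()]/, "", score)  # strip the parentheses
            if (score + 0 != 100) bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}

# Usage with the test file from this report:
#   ssdeep -k test -x test | all_scores_are_100 && echo "looks fixed"
```

This is only meaningful for inputs such as the test file above, where
every pair is expected to score 100.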

The attached patch is an excerpt from Jesse Kornblum's actual patch
(applied in ssdeep 2.10), formatted for the Debian version 2.7-2.


By the way, I recommend UPGRADING THE UPSTREAM VERSION TO 2.10 on
`unstable' and `sid' instead of applying the patch, because ssdeep
version 2.10 fixes some other bugs (which I didn't encounter, but
someone else may).

Thanks and I hope this will be fixed before `Jessie' is frozen.

Tsukasa OI
http://a4lg.com/


-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/40 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages ssdeep depends on:
ii  libc6  2.13-38+deb7u4

ssdeep recommends no packages.

ssdeep suggests no packages.

-- no debconf information

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fuzzy-patch-2.10-by-Jesse-Kornblum.patch
Type: text/x-diff
Size: 452 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/forensics-devel/attachments/20140908/374c34cf/attachment.patch>

