Bill Allombert Bill.Allombert at math.u-bordeaux1.fr
Tue Feb 23 16:03:58 UTC 2010

On Sat, Feb 20, 2010 at 03:15:07PM +0100, Vincent Fourmond wrote:
> Package: popularity-contest
> Version: 1.48
> Severity: wishlist
>   Hello,
>   Often, I would be interested to know more than just "how many
> percents of the people have this package ?": it would be great if one
> could have more information, such as correlations "how many people
> have this and that packages ?" installed at the same time, or "how many
> people still use the buggy 1.0.1-2 version of this software ?".

Hello Vincent,

This has been discussed many times, but we cannot provide such data
because that would break popcon submitters' privacy expectations.

>   You definitely have the information somewhere (except for the
> version information, it seems, but it wouldn't be too difficult to
> get). The question then is how to store/disclose this information,
> without losing anonymity.
>   Maybe it would be interesting to publish the raw emails (without the
> mail envelope, of course), or would that be too big ? (around 100k *
> 90 000 submitters is one gigabyte, but I guess it should compress
> really well). Other formats could make it much more compact.
>   My guess is that using fully this data would enable us to know much
> more than just "which package is the most popular ?".

Certainly. For example, you could reach conclusions like 'every submitter
who uses foo, bar, and baz also uses wilma and fred'. Unfortunately this
is a major privacy issue: if you guess that a popcon submitter is using
foo, bar, and baz (because the submitter runs a web service that uses
foo, bar, and baz, because the submitter is the maintainer of foo, bar,
and baz, etc.), you can conclude that the submitter is also running wilma
and fred, which breaks the privacy expectation.
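
To make the inference concrete, here is a minimal sketch in Python. It
assumes a hypothetical dump of raw submissions, each reduced to a set of
package names; the data and the function name infer_extra_packages are
invented for illustration and do not reflect how popcon stores anything.

```python
# Hypothetical raw submissions: one set of installed packages per submitter.
# Invented data, for illustration only.
submissions = [
    {"foo", "bar", "baz", "wilma", "fred"},
    {"foo", "bar", "quux"},
    {"baz", "wilma"},
    {"foo", "bar", "baz", "wilma", "fred", "barney"},
]


def infer_extra_packages(submissions, known):
    """Return the packages present in every submission that contains `known`.

    `known` is what an observer already guesses about a target submitter.
    Anything returned is what the observer additionally learns.
    """
    known = set(known)
    # Keep only the submissions consistent with what is already known.
    matches = [s for s in submissions if known <= s]
    if not matches:
        return set()
    # Packages shared by all matching submissions, minus the known ones.
    return set.intersection(*matches) - known


leaked = infer_extra_packages(submissions, {"foo", "bar", "baz"})
print(sorted(leaked))  # ['fred', 'wilma']
```

The fewer submissions match the known packages, the more the
intersection reveals; with a single match, the whole package list of
that submitter is exposed.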

Bill. <ballombe at debian.org>

