[Popcon-developers] Bug#570650: popcon.debian.org: please provide more granularity on the information

Vincent Fourmond fourmond at debian.org
Thu Mar 18 21:48:46 UTC 2010


retitle 570650 popcon.debian.org: please provide raw popcon time data
thanks

  Hello,

Bill Allombert wrote:
>>   Often, I would be interested to know more than just "what
>> percentage of people have this package ?": it would be great if one
>> could have more information, such as correlations ("how many people
>> have this and that package installed at the same time ?") or "how many
>> people still use the buggy 1.0.1-2 version of this software ?".
> 
> Hello Vincent,
> 
> This has been discussed many times, but we cannot provide such data
> because that would break popcon submitters' privacy expectations.

  This is exactly what I was afraid you would answer ;-)...

>>   You definitely have the information somewhere (except for the
>> version information, it seems, but it wouldn't be too difficult to
>> get). The question then is how to store/disclose this information,
>> without losing anonymity.
>>
>>   Maybe it would be interesting to publish the raw emails (without the
>> mail envelope, of course), or would that be too big ? (around 100k *
>> 90 000 submitters is one gigabyte, but I guess it should compress
>> really well). Other formats could make it much more compact.
>>
>>   My guess is that fully using this data would enable us to know much
>> more than just "which package is the most popular ?".
> 
> Certainly. For example, you could reach conclusions like 'every submitter
> that uses foo, bar, and baz also uses wilma and fred'. Unfortunately this
> is a major privacy issue: if you guess that a popcon submitter is using
> foo, bar, and baz (because the submitter runs a web service that uses
> foo, bar, and baz, because the submitter is the maintainer of foo, bar, and baz,
> etc.) you can conclude that the submitter is also running wilma and fred,
> which breaks the privacy expectation.

  Although for the latter case, you're likely to pick up dependencies or
otherwise related packages, so it won't provide too much information in
the end.
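
  To make the concern concrete, here is a minimal sketch in Python of
the inference Bill describes. It assumes one had the raw submissions as
plain sets of package names; the function name and the sample data are
hypothetical and only serve as an illustration, they are not anything
popcon provides.

```python
def inferred_packages(submissions, known):
    """Return the packages that every submission containing all of
    `known` also contains (excluding `known` itself).

    `submissions` is an iterable of collections of package names;
    this layout is an assumption made for the sake of the example.
    """
    known = set(known)
    # Keep only the submissions that include every known package.
    matching = [set(s) for s in submissions if known <= set(s)]
    if not matching:
        return set()
    # Whatever all matching submissions share, beyond what was already
    # known, is what an observer could infer.
    return set.intersection(*matching) - known


# Made-up sample data, not real popcon submissions.
sample = [
    {"foo", "bar", "baz", "wilma", "fred"},
    {"foo", "bar", "baz", "wilma", "fred", "dino"},
    {"foo", "pebbles"},
]
leaked = inferred_packages(sample, {"foo", "bar", "baz"})
```

With few matching submitters, the inferred set gets close to one
person's full package list, which is where the privacy problem lies.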

  I think I have a proposal that would allow people like me to do
more quantitative analysis of popcon data while not requiring more than
what is publicly available now: could you please provide somewhere the
raw time data for the popcon graphs, i.e. the raw date/popcon series for
each package ? This would allow one to manipulate the data, compute
correlations between different packages (though with less certainty than
with the raw popcon data, which is also good for the privacy of our users),
correlate with periods of open bugs...
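
  As a rough idea of what I mean, here is a minimal sketch in Python,
assuming each package's series were available as a mapping from date
to install count. That layout, the function names and the numbers in
the usage example are my own assumptions, not an existing popcon
export format. It computes a plain Pearson correlation over the dates
both series share.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_packages(series_a, series_b):
    """Correlate two date -> count mappings over their common dates."""
    common = sorted(set(series_a) & set(series_b))
    return pearson([series_a[d] for d in common],
                   [series_b[d] for d in common])


# Made-up numbers, purely for illustration.
pkg_a = {"2010-01-01": 10, "2010-01-02": 20, "2010-01-03": 30}
pkg_b = {"2010-01-01": 1, "2010-01-02": 2, "2010-01-03": 3, "2010-01-04": 9}
r = correlate_packages(pkg_a, pkg_b)
```

Only aggregate counts per date are involved here, so nothing about an
individual submitter is needed.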

  What do you think about this ?

  Cheers,

	Vincent

-- 
Vincent Fourmond, Debian Developer
http://vince-debian.blogspot.com/

Some pirates achieved immortality by great deeds of cruelty
and derring-do. Some achieved immortality by amassing great
wealth. But the captain had long ago decided that he would,
on the whole, prefer to achieve immortality by not dying.
 -- Terry Pratchett, The Colour of Magic

Vincent, listening to I'm Still Remembering (The Cranberries)




