[Teammetrics-discuss] Comparison between the old code and the new code.

Sukhbir Singh sukhbir.in at gmail.com
Thu Sep 8 18:29:43 UTC 2011


Hi,

> I found only one author who might be missing in the old stats and others
> listed in your mail are just hidden from the top 10 because of different
> ranking numbers.

I didn't think of that :) But in some cases (I am sorry I don't
remember which), there were some significant postings that were
missing in the old code but there in the new one. Anyways, let's
continue.

> I somehow have the feeling that NNTP stat is lacking some mails which
> might be a bug in the gmane mail fetching algorithm or somewhere else.

Well, there should not be a bug because we are just fetching the
messages from Gmane in a range() loop. The only possible explanation
is that Gmane has deleted some messages, perhaps due to its own spam
filter implementation. But whether this results in a significant change
in the numbers is yet to be investigated (I doubt it though).
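
A minimal sketch of how gaps in such a range() fetch could be spotted,
assuming we keep the list of article numbers that actually came back.
The function name `find_missing` and the sample numbers are hypothetical
and not part of the NNTPstat code:

```python
def find_missing(first, last, fetched):
    """Return the article numbers in [first, last] that were never fetched."""
    # Every number the range() loop was supposed to visit.
    expected = set(range(first, last + 1))
    # Anything expected but not fetched is a gap (e.g. deleted upstream).
    return sorted(expected - set(fetched))


# Hypothetical example: articles 4 and 7 never came back from the server.
fetched = [1, 2, 3, 5, 6, 8, 9, 10]
print(find_missing(1, 10, fetched))  # prints [4, 7]
```

If this returns an empty list for a whole group, the fetch loop itself is
probably not where messages are being lost.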

However, the database does seem to be populated with all the messages,
and `SELECT COUNT(*)` returns a number that matches the number of
articles in Gmane. So I am not sure why this is happening, because the
end result from the database *matches* the article count from Gmane.
IIRC I have verified this, and I will do it again.
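
One thing a bare row count cannot show is whether every row is a distinct
message. A rough sketch of that check, using an in-memory SQLite database
as a stand-in for the real one; the table name `messages` and the columns
`article_no` and `message_id` are assumptions for illustration, not the
actual schema:

```python
import sqlite3


def count_rows(conn):
    """Return (total rows, distinct message IDs) for the messages table."""
    total = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(DISTINCT message_id) FROM messages"
    ).fetchone()[0]
    return total, distinct


# Hypothetical data: three rows, but two of them share a Message-ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (article_no INTEGER, message_id TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "<a@example.org>"), (2, "<b@example.org>"), (3, "<b@example.org>")],
)
print(count_rows(conn))  # prints (3, 2)
```

If the two numbers differ, the total can match Gmane's article count even
though some messages are counted twice and others are absent.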

> The situation is way better if we have real mailboxes from alioth.  While my
> offline data from old liststats code is lacking the information from August I
> can observe that the new code has either the same or more mails (and the
> plus of mails somehow fits what I would expect for one month).  So I think
> the mbox parsing code is perfectly fine and so my hope is that we finally

That's good :)

> will increase the quality of our observation once we get straight access to
> the mboxes.

Heh, when that will happen is anyone's guess!

How do you recommend I test NNTPstat to find out where the problem is?
Because the only thing that is bothering me is that if we _indeed_
miss messages, the database should reflect this, which it doesn't.
