[Debtags-devel] A first use of the bayesian tagger
Enrico Zini
enrico at enricozini.org
Wed Oct 26 21:13:46 UTC 2005
On Tue, Oct 25, 2005 at 05:15:23PM -0400, Benjamin Mesing wrote:
> Good we speak a common language: Perl....
Yup! :)
> You might consider reviewing the tags the tagger disagrees with too.
> There was a human being tagging in the first place, and he/she might
> have more insight into the package under review than the AI-tagger.
I have some updates. The outcome of the experiment wasn't too bad, but
the tagger was too slow for me to experiment much with its options:
testing took roughly 2 seconds per package.
While I was waiting, I had a look around and found a new entry called
"dbacl", which does (wow!) Bayesian categorization. So I gave that one
a try as well.
I committed the testing scripts I've made:
svn+ssh://svn.debian.org/svn/debtags/autodebtag/trunk/dbacl/mkpkgs
  Creates a directory with package information, one file per package.
  This can be used as a token cache to train dbacl faster.
svn+ssh://svn.debian.org/svn/debtags/autodebtag/trunk/dbacl/train
  Creates training data for dbacl for one tag.
svn+ssh://svn.debian.org/svn/debtags/autodebtag/trunk/dbacl/test
  Asks dbacl whether it thinks a given package belongs to that tag.
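To make the "token cache" idea concrete, here is a minimal sketch in
Python. It is not what mkpkgs actually does: the function names, the
tokenization rule and the one-token-per-line file layout are all
assumptions made for illustration. The point is only that tokenizing
each package once and storing the result in one file per package lets
later training runs skip that work.

```python
import re
from pathlib import Path


def tokenize(text):
    # Assumed tokenization: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def write_cache(packages, cache_dir):
    # packages: dict mapping package name -> description text (hypothetical input).
    # Writes one file per package, one token per line.
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    for name, description in packages.items():
        (cache / name).write_text("\n".join(tokenize(description)) + "\n")


def read_tokens(cache_dir, name):
    # Reads the cached tokens back without tokenizing again.
    return (Path(cache_dir) / name).read_text().split()
```

A training step could then loop over the files for the packages that
carry a tag and feed their tokens to the classifier.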
One could do more, like giving dbacl a set of categories (all tags,
even) and asking it which ones it thinks the package fits. (WOW)
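As a rough illustration of the "set of categories" idea, here is a small
naive Bayes sketch in Python. It is not dbacl's implementation and does
not call dbacl: the function names, the tokenizer, the add-one smoothing
and the sample data in the usage note are all assumptions. It trains one
word-count model per category and ranks the categories for a new
description, best fit first. Asking about a single tag is the special
case with two categories (has the tag, does not have it).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenization: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def train_categories(examples):
    # examples: iterable of (category, text) pairs.
    counts = defaultdict(Counter)  # per-category token counts
    docs = Counter()               # per-category number of training texts
    for category, text in examples:
        counts[category].update(tokenize(text))
        docs[category] += 1
    return counts, docs


def rank_categories(model, text):
    # Returns the category names sorted from most to least likely.
    counts, docs = model
    vocab = len({tok for c in counts.values() for tok in c})
    n_docs = sum(docs.values())
    tokens = tokenize(text)
    scores = {}
    for category, c in counts.items():
        total = sum(c.values())
        # Log prior from the share of training texts in this category.
        score = math.log(docs[category] / n_docs)
        # Multinomial likelihood with add-one (Laplace) smoothing.
        score += sum(math.log((c[t] + 1) / (total + vocab)) for t in tokens)
        scores[category] = score
    return sorted(scores, key=scores.get, reverse=True)
```

With made-up training data such as `("mail", "imap mail reader")` and
`("game", "puzzle game for kids")`, calling
`rank_categories(model, "imap mail client")` returns the category names
in order of fit, and the first few entries would be the candidate tags.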
Mornfall has done a few experiments on this front as well.
It's all quite exciting!
Ciao,
Enrico
--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at enricozini.org>