[Debtags-devel] A first use of the bayesian tagger

Enrico Zini enrico at enricozini.org
Wed Oct 26 21:13:46 UTC 2005


On Tue, Oct 25, 2005 at 05:15:23PM -0400, Benjamin Mesing wrote:

> Good we speak a common language: Perl....

Yup! :)

> You might consider reviewing the tags the tagger disagrees with too.
> There was a human being tagging in the first place, and he/she might
> have more insight into the package under review than the AI-tagger.

I have some updates.  The outcome of the experiment wasn't too bad, but
the tagger was too slow to experiment much with tweaking its options:
testing took roughly 2 seconds per package.

While I was waiting, I had a look around and found a new entry called
"dbacl", which does (wow!) bayesian categorization.  So I gave that one
a try as well.

I committed the testing scripts I've made:

 svn+ssh://svn.debian.org/svn/debtags/autodebtag/trunk/dbacl/mkpkgs
   Creates a directory with package information, one file per package.
   This can be used as a token cache to train dbacl faster.
   
 svn+ssh://svn.debian.org/svn/debtags/autodebtag/trunk/dbacl/train
   This creates training data for dbacl for one tag.

 svn+ssh://svn.debian.org/svn/debtags/autodebtag/trunk/dbacl/test
   This asks dbacl whether it thinks a given package belongs to a given tag.
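
To give an idea of what the train/test pair boils down to, here is a
rough Python sketch of a yes/no decision for a single tag.  It is not
the code of the scripts above and it does not call dbacl: it is a plain
naive Bayes stand-in with add-one smoothing, and the function names and
sample descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a package description into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(descriptions):
    """Count how often each token occurs in a list of descriptions."""
    counts = Counter()
    for description in descriptions:
        counts.update(tokenize(description))
    return counts


def log_likelihood(counts, vocab_size, text):
    """Log-probability of the text under a smoothed unigram model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[token] + 1) / (total + vocab_size))
        for token in tokenize(text)
    )


def belongs_to_tag(tagged, untagged, description):
    """Return True if the description looks more like the tagged packages."""
    with_tag = train(tagged)
    without_tag = train(untagged)
    vocab_size = len(set(with_tag) | set(without_tag))
    n = len(tagged) + len(untagged)
    score_with = math.log(len(tagged) / n) + log_likelihood(
        with_tag, vocab_size, description)
    score_without = math.log(len(untagged) / n) + log_likelihood(
        without_tag, vocab_size, description)
    return score_with > score_without


# Hypothetical training data: descriptions of packages with and without the tag.
tagged = ["perl module for parsing xml", "perl library for date handling"]
untagged = ["graphical image viewer for gnome", "command line mp3 player"]

print(belongs_to_tag(tagged, untagged, "perl module for http requests"))
print(belongs_to_tag(tagged, untagged, "graphical mp3 player"))
```

A real run would train on the per-package files produced by mkpkgs
rather than on four toy strings, and dbacl's own model is more refined
than this.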

One could do more things, like giving dbacl a set of categories (all
tags, even) and asking it which ones it thinks the package fits.  (WOW)
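
As a rough illustration of the multi-category idea, the following Python
sketch keeps one token model per tag and ranks the tags by how well each
model explains a package description.  Again this is only a naive Bayes
stand-in and not dbacl itself; it assumes a uniform prior over tags, and
the tag names and descriptions are invented examples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a package description into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_models(training):
    """Build one token-count model per tag from {tag: [descriptions]}."""
    return {
        tag: Counter(tok for doc in docs for tok in tokenize(doc))
        for tag, docs in training.items()
    }


def rank_tags(models, description):
    """Return the tags sorted from best to worst fit for the description."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    tokens = tokenize(description)
    scores = {}
    for tag, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen tokens from zeroing out a tag.
        scores[tag] = sum(
            math.log((counts[tok] + 1) / (total + len(vocab)))
            for tok in tokens
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical training data, keyed by illustrative tag names.
training = {
    "devel::lang:perl": ["perl module for parsing xml",
                         "perl library for date handling"],
    "use::viewing": ["graphical image viewer for gnome",
                     "pdf document viewer"],
    "sound::player": ["command line mp3 player",
                      "music player with playlist support"],
}

models = build_models(training)
print(rank_tags(models, "lightweight music player"))
```

Since a package usually carries several tags, one would probably keep
the top few entries of the ranking, or apply a score threshold, rather
than only the first one.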

Mornfall did a few experiments on this front as well.

It's all quite exciting!


Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at enricozini.org>