[Debtags-devel] A first use of the bayesian tagger

Benjamin Mesing bensmail at gmx.net
Wed Oct 26 23:43:38 UTC 2005


Hello,

dbacl seems really nice. It is also pretty fast. It is probably better
to stay with this tool instead of using my AI engine, especially
because the person writing it probably knows what he is doing. However, I
think the Tokenizer I wrote could be reused to create the
<package>.token files. It is most likely smarter than simply taking the
output of "apt-cache dumpavail". Even though I hate all those "gut
feelings", I don't have the time to run a real test to check whether
that is actually so. Hanna, whom I hoped might have some time to do this
(and some more knowledge), did not deign to respond to any of the mails
I sent to the list or directly to her.
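
To make the <package>.token idea concrete, here is a minimal sketch in
Python. It is not the Tokenizer mentioned above, whose rules are not
described in this mail. The stop-word list, the function names and the
one-token-per-line file layout are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical stop-word list; the real Tokenizer's rules are not given here.
STOP_WORDS = {"a", "an", "and", "the", "for", "of", "to", "is", "in", "with"}


def tokenize(description: str) -> list[str]:
    """Lowercase the text, split on non-alphanumerics, drop noise words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return [w for w in words if len(w) > 1 and w not in STOP_WORDS]


def write_token_file(package: str, description: str, out_dir: str) -> Path:
    """Write the tokens for one package to <out_dir>/<package>.token."""
    path = Path(out_dir) / f"{package}.token"
    # Assumed layout: one token per line.
    path.write_text("\n".join(tokenize(description)) + "\n")
    return path
```

The point of filtering before writing the file is that the classifier
never sees the field names and filler words that raw "apt-cache
dumpavail" output would contain.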


> I have some updates.  The outcome of the experiment wasn't too bad, but
> the tagger was a bit too slow to try it heavily tweaking the options.
> Testing took roughly about 2 seconds per package.
Well, that is if you launch the script once for each package. If you use
the Perl classes directly, without rereading the package descriptions for
each file, and also hold the classes for the tags in memory, this
should not be that bad. Probably not as fast as dbacl, though.
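
The "load once, tag many" idea can be sketched as follows. This is
Python rather than the Perl classes mentioned above, and it only shows
the structure: the stanza parser is a simplification of the
"apt-cache dumpavail" format, and `models` and `classify` are
hypothetical stand-ins for whatever per-tag classifier is in use.

```python
from typing import Callable


def parse_available(text: str) -> dict[str, str]:
    """Parse dumpavail-style stanzas into {package: description}, once."""
    packages = {}
    for stanza in text.strip().split("\n\n"):
        name, desc_lines, in_desc = None, [], False
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                in_desc = False
            elif line.startswith("Description:"):
                desc_lines = [line.split(":", 1)[1].strip()]
                in_desc = True
            elif in_desc and line.startswith(" "):
                body = line.strip()
                if body != ".":  # " ." marks an empty line in a description
                    desc_lines.append(body)
            else:
                in_desc = False
        if name:
            packages[name] = " ".join(desc_lines)
    return packages


def tag_all(packages: dict[str, str], models: dict,
            classify: Callable[[object, str], bool]) -> dict[str, list[str]]:
    """Tag every package in one process; models stay in memory throughout."""
    return {
        name: [tag for tag, model in models.items() if classify(model, desc)]
        for name, desc in packages.items()
    }
```

With this layout the descriptions are parsed once and the per-tag models
are loaded once, so the per-package cost is only the classification
itself.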

> One could do more things, like giving dbacl a set of categories (all
> tags, even) and ask it in which ones it thinks the package fits.  (WOW)
As dbacl seems much more mature and sophisticated than my AI engine, I
suggest concentrating on this tool.
However, some of the lessons I've learned with the AI tagger might come
in handy for dbacl too. One is that a smart tokenizer increases the
precision. Another is that a training ratio of 2:1 (bad:good) might
increase the precision too. But that may also depend on the underlying
Bayesian implementation.
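
A small sketch of how a 2:1 (bad:good) training set could be assembled
for one tag. It assumes "good" means packages that carry the tag and
"bad" means packages that do not; the function name, the fixed seed and
the random sampling of the bad examples are illustrative choices, not
something taken from the AI tagger.

```python
import random


def build_training_set(good: list[str], bad: list[str],
                       ratio: int = 2, seed: int = 0):
    """Return (good, sampled_bad) with at most ratio * len(good) bad examples."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wanted = min(len(bad), ratio * len(good))
    return list(good), rng.sample(bad, wanted)
```

For example, with 3 good and 10 bad packages this keeps all 3 good ones
and draws 6 of the bad ones.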
Perhaps the framework of the bayesian-tagger Perl script can also be used
to launch dbacl, but Enrico is probably faster at hacking together some
bash scripts...
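
For illustration, launching dbacl from a script could look roughly like
the sketch below (Python here, not the Perl framework). It relies on
dbacl's -l (learn a category), -c (classify against a category) and -v
(print the best category) options; the helper names and the file names
in the usage note are made up, and the exit-status remark is from memory
of the man page, so treat it as an assumption.

```python
import subprocess


def learn_command(category: str, token_files: list[str]) -> list[str]:
    """Build the command that learns one category from training files."""
    return ["dbacl", "-l", category, *token_files]


def classify_command(categories: list[str], token_file: str) -> list[str]:
    """Build the command that picks the best of several categories."""
    cmd = ["dbacl", "-v"]
    for category in categories:
        cmd += ["-c", category]
    return cmd + [token_file]


def best_category(categories: list[str], token_file: str) -> str:
    """Run dbacl and return the category name it prints."""
    # No check=True: in -c mode dbacl's exit status is believed to encode
    # the winning category rather than success or failure.
    result = subprocess.run(classify_command(categories, token_file),
                            capture_output=True, text=True)
    return result.stdout.strip()
```

A hypothetical call would be best_category(["tag-good", "tag-bad"],
"foo.token") after both categories have been learned with
learn_command.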
/me was impressed by what Enrico did with bash together with pipes and
all those neat little tools. Perhaps I should learn bash and some more
Linux commands after 5 years of excessively using Linux.

> Mornfall did a bit of experiments on this front as well.
What did you do?

> It's all quite exciting!
That it is indeed!

Greetings Ben




