[Debtags-devel] A first use of the bayesian tagger
enrico at enricozini.org
Wed Oct 26 21:13:46 UTC 2005
On Tue, Oct 25, 2005 at 05:15:23PM -0400, Benjamin Mesing wrote:
> Good we speak a common language: Perl....
> You might consider reviewing the tags the tagger disagrees with too.
> There was a human being tagging in the first place, and he/she might
> have more insight into the package under review than the AI-tagger.
I have some updates. The outcome of the experiment wasn't too bad, but
the tagger was too slow for me to experiment much with tweaking the
options: testing took roughly 2 seconds per package.
While I was waiting, I had a look around and found a new entry called
"dbacl", which does (wow!) Bayesian categorization. So I also gave that
one a try.
I've committed the testing scripts I wrote:
Creates a directory with package information, one file per package.
This can be used as a token cache to train dbacl faster.
This creates training data for dbacl for one tag.
This asks dbacl whether it thinks a given package belongs to that tag.
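
To make the three steps concrete, here is a minimal sketch in plain Python.
It is not one of the committed scripts and it does not call dbacl: a tiny
naive Bayes classifier stands in for it, and the sample packages, the
function names (write_cache, training_data, belongs) and the "mail" tag are
all hypothetical, chosen only for illustration.

```python
import math
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical sample data: package name -> (description, tags).
PACKAGES = {
    "mutt": ("text based mail reader with imap and pop support", {"mail"}),
    "fetchmail": ("fetch mail from pop and imap servers", {"mail"}),
    "gimp": ("image manipulation program for photo editing", {"graphics"}),
    "inkscape": ("vector drawing and image editing program", {"graphics"}),
}

def write_cache(cache_dir, packages):
    """Step 1: write one file per package holding its description text."""
    for name, (description, _tags) in packages.items():
        (Path(cache_dir) / name).write_text(description)

def training_data(cache_dir, packages, tag):
    """Step 2: count tokens separately for packages with and without the tag."""
    counts = {True: Counter(), False: Counter()}
    for name, (_description, tags) in packages.items():
        tokens = (Path(cache_dir) / name).read_text().split()
        counts[tag in tags].update(tokens)
    return counts

def belongs(counts, text):
    """Step 3: naive Bayes decision with add-one smoothing and equal priors."""
    vocabulary = set(counts[True]) | set(counts[False])
    scores = {}
    for label, counter in counts.items():
        total = sum(counter.values()) + len(vocabulary)
        scores[label] = sum(
            math.log((counter[token] + 1) / total) for token in text.split()
        )
    return scores[True] > scores[False]

with tempfile.TemporaryDirectory() as cache_dir:
    write_cache(cache_dir, PACKAGES)
    mail_counts = training_data(cache_dir, PACKAGES, "mail")

print(belongs(mail_counts, "imap mail client"))
print(belongs(mail_counts, "photo editing program"))
```

With this toy data the first query is accepted for "mail" and the second is
rejected. Reading the tokens back from the per-package files is what makes
the directory usable as a cache: the package data only has to be extracted
once, however many tags are trained afterwards.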
One could do more things, like giving dbacl a set of categories (all
tags, even) and asking it which ones it thinks the package fits in. (WOW)
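
The multi-category idea can be sketched the same way. Again this is plain
Python standing in for dbacl, not dbacl itself: one token model is trained
per tag and the tags are ranked by how well each model explains the package
text. The training texts and the names train and rank_tags are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training texts, one per tag.
TRAINING = {
    "mail": "text based mail reader fetch mail from pop and imap servers",
    "graphics": "image manipulation program vector drawing and image editing",
    "devel": "compiler and debugger for c source code development",
}

def train(training):
    """Build one token-count model per tag."""
    return {tag: Counter(text.split()) for tag, text in training.items()}

def rank_tags(models, text):
    """Return tags sorted from best to worst log-likelihood for the text."""
    vocabulary = set().union(*models.values())
    scores = {}
    for tag, counter in models.items():
        # Add-one smoothing so unseen tokens do not zero out a tag.
        total = sum(counter.values()) + len(vocabulary)
        scores[tag] = sum(
            math.log((counter[token] + 1) / total) for token in text.split()
        )
    return sorted(scores, key=scores.get, reverse=True)

models = train(TRAINING)
print(rank_tags(models, "imap mail fetch tool")[0])
```

A ranking like this only says which tags fit best; deciding how many of the
top tags to actually suggest would still need some threshold, which this
sketch leaves out.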
Mornfall did a few experiments on this front as well.
It's all quite exciting!
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at enricozini.org>