Trove vocabulary

era+debian@iki.fi
09 Dec 2003 08:29:17 +0200


On Tue, 9 Dec 2003 00:25:54 +0100, Erich Schubert <erich@debian.org>
posted to the deb-usability list:
 > The biggest point i learned:
 > you cannot easily add new tags later on. You have to do the vocabulary
 > correct at the start, or any change will reduce the quality, not improve
 > it, because the changes will take a long time until they have been
 > applied to all data.

My experience from a quite different domain (tagging of linguistic
data) is that you are +going+ to want to adapt the tag set as your
domain knowledge grows. Having an infrastructure to support that is
challenging, though. What I've tried (but never on a scale where I
would regard it as proven to work) is to keep enough metadata around
to have an audit trail for each entry you have tagged: who tagged it,
when, and according to what standard. Then at least you will know
which entries still need to be reviewed when you are in the middle of
a major transition.
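
To make the audit-trail idea concrete, here is a rough sketch in
Python. It is only an illustration under my own assumptions, not
anything Debian or Trove actually provides: the TagRecord fields, the
"v1"/"v2" vocabulary labels, the tagger names and the needs_review
helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TagRecord:
    """One tagging decision plus its audit trail (hypothetical layout)."""
    package: str      # the entry that was tagged
    tag: str          # the tag that was assigned
    tagger: str       # who tagged it
    tagged_on: date   # when it was tagged
    vocabulary: str   # which version of the standard was followed (assumed scheme)


def needs_review(records, current_vocabulary):
    """Return packages with at least one tag assigned under another vocabulary."""
    stale = {r.package for r in records if r.vocabulary != current_vocabulary}
    return sorted(stale)


# Made-up sample data, only to show the query during a transition to "v2".
records = [
    TagRecord("lftp", "ftp clients", "alice", date(2003, 11, 2), "v1"),
    TagRecord("cups", "printer utilities", "bob", date(2003, 12, 8), "v2"),
    TagRecord("ncftp", "ftp", "alice", date(2003, 10, 15), "v1"),
]

print(needs_review(records, "v2"))  # ['lftp', 'ncftp']
```

The point of recording the vocabulary version with every entry is that
a transition becomes a query over the records rather than something
you have to remember.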

What's harder is when transitions are minor and happen over time, in
your head. Slightly unclear or ambiguous specifications will lead to
bad quality in the tagging. The cases of "printing" and "ftp" which
were brought up here are good examples -- your average tagging droid
is not going to look at the spec all the time, just go by what the
category labels seem to imply. So you absolutely need the labels to be
very clear.

I don't think for these two examples that this is very challenging --
just try "ftp clients" and "printer utilities", maybe with spaces
replaced with Computer Punctuation if that's more convenient :-) --
but things get interesting when you have a category which in itself
is controversial or just obscure. Stuff like "ham" and "x.25" and
"k12" comes to mind; I know vaguely what they mean, but only enough
not to look in those directories on an ftp server or a software
catalog for +anything+ that would be of any use to me :-)

Paradoxically, if you manage to tag the Debian archive by more
stringent standards than any other Trove directory, then in practice
you are not Trove-compatible anymore ;^)

Anyway, using Trove as a baseline and striving to improve Trove as a
whole when it's wrong or blunt or inflexible sounds like the way to
go.

Have you been exploring the various semantic web efforts out there?
I can't say I understand what that stuff is about but it would seem to
have a bearing on what you are doing.

/* era */

Am I the only one who thinks "usability" is completely the wrong term
for what you are doing? Sure, it might improve usability a bit, but
it's not exactly like this is the biggest usability problem in Debian,
or that the focus of the discussion would be anywhere near what
usability folks are usually talking about.

-- 
formail -s procmail <http://www.iki.fi/era/spam/ >http://www.euro.cauce.org/
cat | more | cat<http://www.iki.fi/era/unix/award.html>http://www.debian.org/