vocabulary structure [coarseness]

Wed Jun 28 11:21:28 UTC 2006

On Tue, Jun 27, 2006 at 09:57:39AM +0200, Peter Rockai wrote:

Reply to the second issue I found.

> As for rest of made-of, there's only data:*, so made-of::data and format::
> (File Format) facet would be probably a good idea again. There are tags like
> role::content:data and i would vote for
> works-with::{audio,video,text,image,database,archive,font,...} all of which
> would hint the package could be tagged with a format:: tag as well (not
> always, but it would often make sense).

Here, as I understand it, the problem you raise is the level of
coarseness of tags in a facet: some facets have coarse tags, some facets
have very detailed tags, some facets have a mix of both.

Example:
  Enrico wants an image viewer, and there are as many as 6 different
  tags that are relevant to such a simple question:
    works-with::image
    works-with::image:raster
    works-with::image:raster:jpg
    works-with::image:raster:png
    works-with::image:vector
    works-with::image:vector:svg
  6 is close to the cognitive limit of 7 +/- 2, and Enrico gets confused.

This is a nasty but important point.  Should tags in a facet be
homogeneous with regard to the level of coarseness?  If yes, how do we
separate fine-grained from coarse-grained tags?  And how do we handle
the in-between cases?

I think we're reasoning interface-wise rather than classification-wise:
once a tag is well-defined, it doesn't matter too much where it is filed.
Of course, filing a tag under the right facet contributes to the
well-definedness of the tag.

The current approach is to use grouping to represent different levels of
detail.  If I run:

  debtags tagsearch works-with | grep -v '[a-z]:[a-z]'

Then I get a coarse classification, while if I run:

  debtags tagsearch works-with::image

then I get a more detailed group of tags related to images.
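
The same distinction can be made on the tag names alone, without grep.
What follows is only a rough sketch, not anything debtags itself does:
it assumes that the number of single colons after the facet name gives
the level of detail.  The tag list is just the sample from the example
above plus works-with::audio, and the function names are made up.

```python
# Sample tag names only; a real list would come from the vocabulary.
TAGS = [
    "works-with::image",
    "works-with::image:raster",
    "works-with::image:raster:jpg",
    "works-with::image:raster:png",
    "works-with::image:vector",
    "works-with::image:vector:svg",
    "works-with::audio",
]


def detail_level(tag):
    """Return 0 for a coarse tag, 1 for one level of grouping, and so on."""
    # Split "facet::name" once, then count the single colons inside the name.
    _facet, _sep, name = tag.partition("::")
    return name.count(":")


def coarse_tags(tags):
    """Keep only the tags with no grouping below the facet."""
    return [t for t in tags if detail_level(t) == 0]


def refine(tags, parent):
    """Return the tags exactly one level of detail below `parent`."""
    prefix = parent + ":"
    wanted = detail_level(parent) + 1
    return [t for t in tags if t.startswith(prefix) and detail_level(t) == wanted]


if __name__ == "__main__":
    print(coarse_tags(TAGS))
    print(refine(TAGS, "works-with::image"))
```

With something like this an interface could show the coarse tags first
and offer the next level only when one of them is picked.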

I could see ways of making this distinction automatically at an
interface level using the current vocabulary structure.  I also wouldn't
mind restructuring the vocabulary using a different approach to
coarseness like:

  Facet: works-with
  Coarseness: broad

  Facet: kind-of-image-format
  Coarseness: detailed

and then come up with algorithms to hide tags from more detailed facets
unless they become relevant to the current search.
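
To give an idea of what such an algorithm might look like, here is a
minimal sketch.  It assumes one possible meaning of "relevant": a
detailed tag becomes relevant once a search is under way and the tag
appears on at least one matching package.  The package names, the
kind-of-image-format::* tags and the function names are all invented
for illustration.

```python
# Hypothetical coarseness per facet, mirroring the stanzas above.
COARSENESS = {
    "works-with": "broad",
    "kind-of-image-format": "detailed",
}

# Invented package -> tags data, for illustration only.
PACKAGES = {
    "viewer-a": {"works-with::image", "kind-of-image-format::jpg",
                 "kind-of-image-format::png"},
    "editor-b": {"works-with::image", "kind-of-image-format::svg"},
    "player-c": {"works-with::audio"},
}


def facet_of(tag):
    """Return the facet part of a "facet::name" tag."""
    return tag.split("::", 1)[0]


def visible_tags(selected):
    """Return the tags to offer next, given the set of tags already selected."""
    # Packages matching the current search: they carry every selected tag.
    matches = [tags for tags in PACKAGES.values() if selected <= tags]
    candidates = set().union(*matches) - selected
    if selected:
        # A search is under way: detailed tags seen on the matches are relevant.
        return sorted(candidates)
    # No search yet: hide everything that is not in a broad facet.
    return sorted(t for t in candidates
                  if COARSENESS.get(facet_of(t)) == "broad")


if __name__ == "__main__":
    print(visible_tags(set()))
    print(visible_tags({"works-with::image"}))
```

In this sketch nothing from kind-of-image-format is shown until
works-with::image has been chosen, which keeps the first screen short.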

I don't know which of the two ways is easier, though :)

Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at debian.org>