vocabulary structure [user interfaces]

Wed Jun 28 14:05:13 UTC 2006

On Wed, Jun 28, 2006 at 03:33:30PM +0200, Benjamin Mesing wrote:
> So it seems like the two big challenges for the "debtags team" are:
>      1. Develop a good vocabulary.
>      2. Find approaches to reduce the complexity in user interfaces.
Both are being worked on. The first one right here. To sum up the second,
there are ways to algorithmically compute a 'fairly good' candidate set of
tags to be presented to the user. A more detailed explanation of two
approaches for doing that follows below.

> I think we should always remember this and keep those two things
> separate in our minds.
Yes, this is a very important point.

> However, we also need to keep the vocabulary at
> a reasonable complexity.
And another important point, which I agree with.

When we have a vocabulary of reasonable complexity which is expressive enough
for our needs (a state we are approaching, I believe), we can work a bit on
the user interface side. It is not too hard to algorithmically pick the tags
that are relevant for the current search. You can obviously leave out every
tag that does not occur in the current result set. You can also leave out
tags that would have only a negligible effect on the result set.

Basically, for the current search, the importance of a tag is related to how
close the share of packages in the result set that carry it is to 50%.
Probably anything between 20% and 80% is a good candidate. That means you can
omit any tag that is present on only very few packages or that is on "almost
all" packages (because it probably doesn't refine the search result any
further).

This approach has good potential to cut whole facets out of the interface
when they are not relevant for a given search. A good start, if you ask me.
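
As an illustration, the filter described above could look roughly like this
in Python. This is only a sketch under assumptions: the function name
candidate_tags, the dict-of-sets representation of a result set and the
sample package/tag assignments are made up for the example and are not part
of any debtags tool.

```python
from collections import Counter


def candidate_tags(result_set, low=0.2, high=0.8):
    """Return tags whose share of packages in result_set lies in [low, high].

    result_set maps a package name to the set of tags it carries
    (an assumed, illustrative representation).
    Tags are sorted by how close their share is to 50%.
    """
    total = len(result_set)
    if total == 0:
        return []
    # Count on how many packages of the result set each tag occurs.
    counts = Counter(tag for tags in result_set.values() for tag in tags)
    shares = {tag: n / total for tag, n in counts.items()}
    keep = [tag for tag, share in shares.items() if low <= share <= high]
    # Closest to 50% first; the tag name breaks ties deterministically.
    return sorted(keep, key=lambda tag: (abs(shares[tag] - 0.5), tag))


# Hypothetical result set for a search on text editors.
packages = {
    "vim": {"use::editing", "interface::text-mode", "role::program"},
    "emacs": {"use::editing", "interface::x11", "role::program"},
    "gedit": {"use::editing", "interface::x11", "role::program"},
    "nano": {"use::editing", "interface::text-mode", "role::program"},
    "ed": {"use::editing", "interface::commandline", "role::program"},
}
print(candidate_tags(packages))
```

In this sample, use::editing and role::program sit on every package, so they
drop out, and with them the whole use and role facets; only the interface
tags remain as candidates.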

A slightly different approach (though based on the same idea) is to find sets
of tags that partition the archive well. This probably boils down to finding
the right facets, which cover most of the archive and also split packages into
distinct groups. The use, role, works-with and interface facets come to mind.

The second solution is harder to turn into an algorithm, since we want to be
able to find these facets automatically for any result set, so that we can
always offer a relevant set of facets to pick from.
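
One possible way to score facets for an arbitrary result set is sketched
below in Python. The scoring rule is an assumption chosen for illustration,
not something settled in this thread: coverage (the share of packages
carrying any tag of the facet) multiplied by the normalised entropy of the
facet's tag counts, as a stand-in for "splits packages into distinct groups".
The name facet_scores and the sample data are hypothetical as well.

```python
import math
from collections import Counter, defaultdict


def facet_scores(result_set):
    """Score each facet by coverage times evenness of its split.

    result_set maps a package name to a set of 'facet::tag' strings.
    Coverage is the share of packages carrying any tag of the facet.
    Evenness is the entropy of the facet's tag counts, scaled to [0, 1].
    """
    total = len(result_set)
    covered = Counter()                # facet -> packages with any of its tags
    tag_counts = defaultdict(Counter)  # facet -> tag -> number of packages
    for tags in result_set.values():
        for tag in tags:
            tag_counts[tag.partition("::")[0]][tag] += 1
        covered.update({tag.partition("::")[0] for tag in tags})

    scores = {}
    for facet, counts in tag_counts.items():
        if len(counts) < 2:
            scores[facet] = 0.0  # a single tag does not split anything
            continue
        n = sum(counts.values())
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        evenness = entropy / math.log2(len(counts))
        scores[facet] = covered[facet] / total * evenness
    return scores


# Hypothetical result set for a search on text editors.
packages = {
    "vim": {"use::editing", "interface::text-mode", "role::program"},
    "emacs": {"use::editing", "interface::x11", "role::program"},
    "gedit": {"use::editing", "interface::x11", "role::program"},
    "nano": {"use::editing", "interface::text-mode", "role::program"},
    "ed": {"use::editing", "interface::commandline", "role::program"},
}
scores = facet_scores(packages)
for facet in sorted(scores, key=scores.get, reverse=True):
    print(facet, round(scores[facet], 2))
```

With this sample the interface facet ranks first, while use and role score
zero because each of them has only one tag in the result set. Other measures
(for example penalising packages that carry several tags of one facet) would
fit the same frame.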

Since the first solution works on tags and the second on facets, they can be
combined fairly well. The second can be used to find good candidate facets to
present, after which the first can be used to cut down the number of tags in
each of the presented facets to an acceptable amount.
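
A self-contained sketch of such a combination follows, again in Python and
again under assumptions: the facet score (coverage times normalised entropy),
the 20% to 80% window as defaults, the max_facets parameter and the name
suggest_tags are all illustrative choices, and the sample data is made up.

```python
import math
from collections import Counter, defaultdict


def suggest_tags(result_set, max_facets=2, low=0.2, high=0.8):
    """Pick up to max_facets facets, then trim the tags inside each one.

    result_set maps a package name to a set of 'facet::tag' strings.
    Returns {facet: [tags]}, keeping only tags whose share of packages
    lies in [low, high].
    """
    total = len(result_set)
    covered = Counter()                # facet -> packages with any of its tags
    tag_counts = defaultdict(Counter)  # facet -> tag -> number of packages
    for tags in result_set.values():
        for tag in tags:
            tag_counts[tag.partition("::")[0]][tag] += 1
        covered.update({tag.partition("::")[0] for tag in tags})

    def score(facet):
        # Assumed heuristic: coverage times normalised entropy of the split.
        counts = tag_counts[facet]
        if len(counts) < 2:
            return 0.0
        n = sum(counts.values())
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        return covered[facet] / total * entropy / math.log2(len(counts))

    # Facet step: best-scoring facets first, names break ties.
    ranked = sorted(tag_counts, key=lambda f: (-score(f), f))
    suggestion = {}
    for facet in ranked[:max_facets]:
        if score(facet) == 0.0:
            continue  # this facet does not split the result set at all
        counts = tag_counts[facet]
        # Tag step: keep tags whose share falls inside the window.
        keep = [t for t, c in counts.items() if low <= c / total <= high]
        keep.sort(key=lambda t: (abs(counts[t] / total - 0.5), t))
        suggestion[facet] = keep
    return suggestion


# Hypothetical result set for a search on text editors.
packages = {
    "vim": {"use::editing", "interface::text-mode", "role::program"},
    "emacs": {"use::editing", "interface::x11", "role::program"},
    "gedit": {"use::editing", "interface::x11", "role::program"},
    "nano": {"use::editing", "interface::text-mode", "role::program"},
    "ed": {"use::editing", "interface::commandline", "role::program"},
}
print(suggest_tags(packages))
```

For this sample only the interface facet is offered, with its tags ordered by
how close each is to covering half of the result set.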

Yours, Peter.

PS: I am not addressing the tagging part of the interface right now. I'll try
to find some time to write up another mail on that, since I have some ideas
there too.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com | +421907533216 
   http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation