[Debtags-commits] [svn] r1354 - tagcoll/trunk

Enrico Zini enrico at costa.debian.org
Thu Sep 15 09:26:09 UTC 2005


Author: enrico
Date: Thu Sep 15 09:26:08 2005
New Revision: 1354

Modified:
   tagcoll/trunk/   (props changed)
   tagcoll/trunk/README
Log:
 r5335 at viaza:  enrico | 2005-09-15 11:25:58 +0200
 Added new idea to the README


Modified: tagcoll/trunk/README
==============================================================================
--- tagcoll/trunk/README	(original)
+++ tagcoll/trunk/README	Thu Sep 15 09:26:08 2005
@@ -160,6 +160,33 @@
 
 These are the TODO-list items currently being worked on::
 
+ - New grouping algorithm:
+    - Define a group cardinality minimum threshold (around 7) and a
+      maximum threshold (around 14)
+    - Identify all tagsets with cardinality > maximum threshold, and consider
+      them immutable
+    - Identify all tagsets with cardinality < minimum threshold, and merge them
+      with the nearest tagsets so that the cardinality of the resulting set is
+      still < maximum threshold.  Merging could happen only among tagsets at
+      distance 1, or one could have hints to give weight to tags, and compute a
+      weighted distance that considers the relevance to the user of the various
+      different tags (for some users, a change in implemented-in::* could mean
+      next to nothing).
+    - Use a collection of Hints to express a preference among the tags of the
+      resulting set (for example, implemented-in::* could be less important
+      than use::*) or just use the merge of all tags in the merged tagsets as
+      the resulting tagset, or use the intersection, or handle merged tagsets
+      specially.  The intersection is probably better, especially if weighted
+      distance is used first, since the tags cut out by the intersection
+      would then already be the less relevant ones.
+    - If small groups remain that can't be merged because all the nearby
+      groups are big, merge those smaller than an extra minimum (say 2 or 3)
+      anyway, using the nearest set regardless of its cardinality
+    - Try to run a smart hierarchy on the results
+    - Hints could be a map of Expression -> weight, so that multiple tags can
+      be assigned the same weight (like use::* -> 10, implemented-in::* -> 1)
+   This should normalise the 'special' items somehow.
+
  - Merge ItemGrouper and TDBIndexer
    
  - Add example code
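
The grouping idea added to the README in this commit could be sketched
roughly as follows.  This is only an illustration in Python, not tagcoll
code, and it rests on several assumptions: the "cardinality" of a tagset is
taken to be the number of items carrying exactly that tagset, distance is the
weighted size of the symmetric difference, a normal merge is accepted only
while the combined cardinality stays below the maximum threshold, the merged
tagset is the intersection, and a forced merge of a very small group keeps
the target's tagset unchanged.  The names HINTS, tag_weight, distance and
group are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical hint table: glob pattern -> weight, first match wins.
HINTS = [("use::*", 10), ("implemented-in::*", 1)]


def tag_weight(tag, hints=HINTS, default=5):
    """Weight of a single tag; 'default' is an arbitrary fallback."""
    for pattern, weight in hints:
        if fnmatchcase(tag, pattern):
            return weight
    return default


def distance(a, b, hints=HINTS):
    """Weighted size of the symmetric difference between two tagsets."""
    return sum(tag_weight(tag, hints) for tag in a ^ b)


def group(coll, min_card=7, max_card=14, extra_min=3, hints=HINTS):
    """Merge small tagsets; coll maps frozenset(tags) -> set(items)."""
    groups = {tagset: set(items) for tagset, items in coll.items()}

    def merge(src, dst, keep_dst):
        items = groups.pop(src) | groups.pop(dst)
        # Normal merge: keep only the tags shared by both tagsets.
        # Forced merge: leave the target tagset as it is.
        merged = dst if keep_dst else src & dst
        groups.setdefault(merged, set()).update(items)

    changed = True
    while changed:
        changed = False
        # Visit the smallest groups first; sorted tags break ties.
        order = sorted(groups, key=lambda ts: (len(groups[ts]), sorted(ts)))
        for src in order:
            size = len(groups[src])
            if size >= min_card:
                break  # every remaining group is large enough
            others = [ts for ts in groups if ts != src]
            fits = [ts for ts in others if size + len(groups[ts]) < max_card]
            # Fallback: very small groups merge regardless of cardinality.
            forced = not fits and size < extra_min
            if forced:
                fits = others
            if fits:
                dst = min(fits,
                          key=lambda ts: (distance(src, ts, hints), sorted(ts)))
                merge(src, dst, keep_dst=forced)
                changed = True
                break  # groups changed, so recompute the order
    return groups
```

With the weights above, two tagsets that differ only in implemented-in::*
are at distance 2, while two that differ only in use::* are at distance 20,
so the former are merged first.  Each merge removes at least one group, so
the loop terminates.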


