Announcing Debian Package Tags

Javier Fernández-Sanguino Peña jfs@computer.org
Tue, 29 Apr 2003 12:45:12 +0200



On Tue, Apr 29, 2003 at 12:38:18PM +0200, Enrico Zini wrote:
> On Mon, Apr 28, 2003 at 03:23:24PM +0200, Javier Fernández-Sanguino Peña wrote:
>
> > [1] One of the difficult things in the future might be to generate new tags
> > or associate new packages with tags that already exist. Automating
> > this would be useful, and TFIDF (and similar AI-related techniques) helps
> > with this quite a lot.
>
> I've never tried the tools you are suggesting me, but I definitely will.
> I have some immediate additions to make to tagcoll and debtags based on
> the many suggestions I have received.
>

Note that rainbow (libbow) is no longer actively developed upstream.
The library works and the tool works, but some documentation is still
lacking.


> You're showing me a whole new world to explore, and I'll be sure to do it
> asap.

Glad to help.

Just FYI, TFIDF is a very simple "technology" (in fact, it's just an
equation) used to determine the 'weight' of words in a free-form text.
It's useful for document clustering, because you can determine that
documents belong to the same 'group' if they have similar word weights.
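To make the "just an equation" part concrete, here is a minimal Python sketch of one common TFIDF variant, weight(t, d) = tf(t, d) * log(N / df(t)), where tf is how often word t occurs in description d and df is how many of the N descriptions contain t. It is an illustration under those assumptions, not rainbow's or dpkg-iasearch's code; the function names, the naive tokenizer and the toy descriptions are invented for the example. Cosine similarity between two weight vectors is the kind of measure a clustering step could use to decide that two descriptions fall in the same 'group'.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {word: weight} dict per document.

    weight = tf(t, d) * log(N / df(t)): tf is the raw count of t in d,
    df(t) is the number of documents containing t, N the number of documents.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for words in tokenized:
        df.update(set(words))  # count each word at most once per document
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(words).items()}
        for words in tokenized
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy descriptions, invented for the example.
vecs = tfidf_vectors([
    "mail client with IMAP support",
    "IMAP mail server",
    "image viewer for X11",
])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

A word that occurs in every description gets weight log(N / N) = 0, and very common words get close to it, which is why TFIDF tends to ignore filler words on a large enough collection without needing a stop-word list.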

The application in Debian that I found worth testing (and which prompted me
to develop the hack that 'dpkg-iasearch' is) is to use document clustering
and TFIDF to find packages.
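A hypothetical sketch of TFIDF-based package search in Python, assuming the descriptions are available as a {package: description} dict: each package is scored by the summed TFIDF weight of the query words in its description, so rare, specific words count for more than common ones. This is not how dpkg-iasearch works internally; the scoring rule, the function name search and the package names below are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(descriptions, query, limit=5):
    """Rank packages by the TFIDF weight of the query words in their descriptions.

    descriptions: {package_name: description}.
    Returns [(package, score), ...], best first, packages with score 0 dropped.
    """
    docs = {pkg: Counter(tokenize(text)) for pkg, text in descriptions.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())  # document frequency of each word
    terms = set(tokenize(query))
    scores = []
    for pkg, counts in docs.items():
        score = sum(counts[t] * math.log(n / df[t]) for t in terms if t in counts)
        if score > 0:
            scores.append((pkg, score))
    # Highest score first; ties broken by package name for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:limit]
```

With the invented descriptions {"mailreader": "text based mail client", "mailserver": "mail transport agent", "imageviewer": "image viewer for X"}, the query "mail client" ranks mailreader first (it matches both words, and 'client' is rarer than 'mail'), then mailserver, and leaves imageviewer out.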

If you have a set of package descriptions (say, 4000), you can parse all the
words in the descriptions, compare the words across all the descriptions and
determine which words are 'appropriate' to describe a given package.
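Picking the words that are 'appropriate' for one package can be sketched as taking the highest-weighted TFIDF words of its description. The following is a minimal Python version, assuming a {package: description} dict and a naive tokenizer; top_keywords and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(descriptions, k=3):
    """Return {package: up to k words with the highest TFIDF weight}.

    weight = tf * log(N / df); words found in every description weigh 0
    and are never returned. Ties are broken alphabetically.
    """
    docs = {pkg: Counter(tokenize(text)) for pkg, text in descriptions.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    result = {}
    for pkg, counts in docs.items():
        weights = {t: c * math.log(n / df[t]) for t, c in counts.items()}
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        result[pkg] = [t for t in ranked[:k] if weights[t] > 0]
    return result
```

For the invented descriptions {"mailreader": "a mail client for the console", "mailserver": "a mail server for the network", "imageviewer": "a viewer for image files, shows each image"}, the keywords for mailreader come out as ["client", "console", "mail"]: 'a' and 'for' occur everywhere and drop to zero, and 'mail' ranks below the words unique to that package.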

Naturally, these same words are also keywords that can describe a set of
related packages, so this approach could be useful to automatically tag new
packages when they get into Debian.
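One way (among several) to turn this into automatic tagging is a k-nearest-neighbour vote: compare a new package's description with already-tagged ones using TFIDF cosine similarity, and suggest the tags carried by the most similar packages. The nearest-neighbour vote is an assumption of this sketch, not a method the message specifies; the function suggest_tags, the tag names and the data layout ({package: (description, set_of_tags)}) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tags(tagged, new_description, k=2):
    """Suggest tags for an untagged package from its k most similar tagged ones.

    tagged: {package: (description, set_of_tags)}. TFIDF is computed over the
    tagged descriptions plus the new one. Neighbours with similarity 0 do not
    vote. Returns tags ordered by vote count, then alphabetically.
    """
    texts = [desc for desc, _ in tagged.values()] + [new_description]
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    vecs = [{t: f * math.log(n / df[t]) for t, f in c.items()} for c in counts]

    new_vec = vecs[-1]
    scored = sorted(
        ((cosine(new_vec, vec), pkg) for pkg, vec in zip(tagged, vecs[:-1])),
        key=lambda s: -s[0],
    )
    votes = Counter()
    for score, pkg in scored[:k]:
        if score > 0:
            votes.update(tagged[pkg][1])
    return [tag for tag, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]
```

With the toy data {"mailreader": ("text mail client for the console", {"mail", "client", "console"}), "mailserver": ("mail transport agent", {"mail", "server"}), "imageviewer": ("image viewer for X", {"graphics", "x11"})}, the description "graphical mail client" with k=2 yields ["mail", "client", "console", "server"]: 'mail' wins because both neighbours carry it, while 'server' slips in from the weaker neighbour. That is why such suggestions would probably need a vote threshold or a human review before being applied.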

Just my 2c.

Regards

Javi

