Announcing Debian Package Tags
Javier Fernández-Sanguino Peña
jfs@computer.org
Tue, 29 Apr 2003 12:45:12 +0200
On Tue, Apr 29, 2003 at 12:38:18PM +0200, Enrico Zini wrote:
> On Mon, Apr 28, 2003 at 03:23:24PM +0200, Javier Fernández-Sanguino Peña wrote:
>
> > [1] One of the difficult things in the future might be to generate new tags
> > or associate new packages to tags already available. Automatising (sp?)
> > this would be useful and TFIDF (and similar IA-related techniques) help
> > with this quite a lot.
>
> I've never tried the tools you are suggesting me, but I definitely will.
> I have some immediate additions to make to tagcoll and debtags based on
> the many suggestions I have received.
>
Note that rainbow (libbow) is no longer actively maintained upstream.
The library works, the tool works, but some documentation is still
lacking.
> You're showing me a whole new world to explore, and I'll be sure to do it
> asap.
Glad to help.
Just FYI, TFIDF is a very simple "technology" (as a matter of fact it's just
an equation) used to determine the 'weight' of words in free text.
It's useful for document clustering (because you can decide that documents
belong to the same 'group' if they have similar word weights).
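
For reference, the usual textbook form of that equation looks like this
(there are several variants, and I'm not claiming this is the exact one
rainbow implements):

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}
```

Here tf_{t,d} is the number of times word t appears in document d, N is the
number of documents, and df_t is the number of documents that contain t. A
word that shows up in every document gets weight 0, so it says nothing about
any one of them.
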
The application I found worth testing in Debian (which prompted me to
develop the hack that 'dpkg-iasearch' is) is to use document clustering and
TFIDF to find packages.
If you have a set of package descriptions (let's say 4000) you can parse
all the words in the descriptions, compare them across all the
descriptions and determine which words are 'appropriate' to describe a given
package.
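
A rough sketch of that step in Python, just to make the idea concrete. It is
not taken from dpkg-iasearch or rainbow: the four descriptions are made-up
sample data, the function names are hypothetical, and the weight is the plain
tf * log(N / df) variant.

```python
import math
import re
from collections import Counter

# Hypothetical sample data; real input would be the Description fields
# of the few thousand packages in the archive.
DESCRIPTIONS = {
    "mutt": "text based mail reader with mime and pgp support",
    "pine": "text based mail reader and news reader",
    "gnupg": "pgp compatible encryption and signature tool",
    "apache": "versatile http server for web pages",
}


def tokenize(text):
    """Split a description into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_weights(descriptions):
    """Return {package: {word: weight}} with weight = tf * log(N / df)."""
    counts = {pkg: Counter(tokenize(text)) for pkg, text in descriptions.items()}
    n_docs = len(counts)
    # Document frequency: in how many descriptions does each word appear?
    doc_freq = Counter()
    for words in counts.values():
        doc_freq.update(words.keys())
    return {
        pkg: {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in words.items()}
        for pkg, words in counts.items()
    }


def top_keywords(weights, pkg, k=3):
    """Return the k highest-weighted words of a package (ties: alphabetical)."""
    ranked = sorted(weights[pkg].items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


weights = tfidf_weights(DESCRIPTIONS)
for pkg in DESCRIPTIONS:
    print(pkg, top_keywords(weights, pkg))
```

With only four descriptions, filler words that happen to appear once (like
"for") still score high; with thousands of descriptions they appear almost
everywhere and their weight drops towards zero.
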
As a matter of fact, these same words are naturally keywords that can
describe a set of related packages, and thus this approach could be useful
to automatically tag new packages when they get into Debian.
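
Something along these lines could do the tagging, as a minimal sketch only.
The tag names, the sample descriptions and the function names are all
assumptions made for illustration. Each tag is treated as one "document"
made of the descriptions of its packages, words are weighted with
tf * log(N / df), and a new description is scored by the weights of the
words it shares with each tag.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical already-tagged packages: (description, tags).
TAGGED = [
    ("text based mail reader with mime support", {"mail"}),
    ("graphical mail client with imap support", {"mail"}),
    ("fast http server for static web pages", {"web"}),
    ("web browser with http and ftp support", {"web"}),
]


def tokenize(text):
    """Split a description into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_profiles(tagged):
    """Return {tag: {word: weight}}, pooling the descriptions of each tag
    into one document and weighting words with tf * log(N / df),
    where N is the number of tags."""
    counts = defaultdict(Counter)
    for text, tags in tagged:
        for tag in tags:
            counts[tag].update(tokenize(text))
    n_tags = len(counts)
    doc_freq = Counter()
    for words in counts.values():
        doc_freq.update(words.keys())
    return {
        tag: {w: tf * math.log(n_tags / doc_freq[w]) for w, tf in words.items()}
        for tag, words in counts.items()
    }


def suggest_tags(profiles, description):
    """Rank tags by the summed weight of the words the description shares
    with each tag profile; tags with a zero score are dropped."""
    words = set(tokenize(description))
    scores = {
        tag: sum(weight for w, weight in profile.items() if w in words)
        for tag, profile in profiles.items()
    }
    return sorted((t for t in scores if scores[t] > 0), key=lambda t: -scores[t])


profiles = tag_profiles(TAGGED)
print(suggest_tags(profiles, "console mail reader with pgp support"))
```

Words shared by every tag ("with", "support" in this sample) end up with
weight 0 and do not influence the suggestion, which is the point of using
TFIDF rather than raw word counts. A human would still want to review the
suggested tags.
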
Just my 2c.
Regards
Javi