Bug#660304: RFP: r-cran-tm -- GNU R package for text mining applications

Rogério Brito rbrito at ime.usp.br
Sat Feb 18 02:04:24 UTC 2012


Package: wnpp
Severity: wishlist

* Package name    : r-cran-tm
  Version         : 0.5-7.1
  Upstream Author : Ingo Feinerer <feinerer at logic.at>
* URL             : http://tm.r-forge.r-project.org/
* License         : GPL-3+
  Programming Lang: R
  Description     : GNU R package for text mining

 The tm package offers functionality for managing text documents, abstracts
 the process of document manipulation, and eases the use of heterogeneous
 text formats in R. The package has integrated database backend support to
 minimize memory demands. Advanced metadata management is implemented for
 collections of text documents to ease the use of large, metadata-enriched
 document sets.
 .
 The package ships with native support for handling the Reuters-21578 data
 set, Gmane RSS feeds, e-mails, and several classic file formats (e.g. plain
 text, CSV text, or PDFs).
 .
 The data structures and algorithms can be extended to fit custom demands,
 since the package is designed in a modular way to enable easy integration of
 new file formats, readers, transformations and filter operations.
 .
 tm provides easy access to preprocessing and manipulation mechanisms such as
 whitespace removal, stemming, or conversion between file formats.
 Furthermore, a generic filter architecture is available to filter documents
 by certain criteria or to perform full-text search. The package supports
 exporting document collections to term-document matrices, and string
 kernels can be easily constructed from text documents.

---

I am in the process of reviewing O'Reilly's book "Machine Learning for
Email".

With the recent uploads of ggplot2 and plyr, this is the last package needed
for all packages used by the book to be officially available in Debian (and,
I hope, shortly in popular derivatives like Ubuntu and Linux Mint).


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

More information about the debian-science-maintainers mailing list