[R] [R-pkgs] tm 0.1 uploaded to CRAN

From: Ingo Feinerer <h0125130_at_wu-wien.ac.at>
Date: Thu 11 Jan 2007 - 10:52:23 GMT

Dear useRs,

a first version of tm has just been released on CRAN.

tm provides a sophisticated framework for text mining applications within R.

It offers functionality for managing text documents, abstracts the process of document manipulation, and eases the use of heterogeneous text formats in R. Advanced metadata management is implemented for collections of text documents to make it easier to work with large, metadata-enriched document sets.

The package ships with native support for handling

*) the Reuters 21578 dataset,
*) the Reuters Corpus Volume 1 dataset,
*) Gmane RSS feeds,
*) e-mails, and
*) several classic file formats (e.g. plain text or CSV text).

tm provides easy access to preprocessing and manipulation mechanisms, like

*) whitespace removal,
*) stemming, or
*) conversion between file formats (e.g., Reuters21578 to plain text).
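
To make the first two preprocessing steps concrete, here is a minimal sketch in plain Python. It does not use tm's R interface: the function names `strip_whitespace` and `toy_stem` are hypothetical, and the suffix stripper is only a crude stand-in for a real stemming algorithm such as Porter's.

```python
import re

def strip_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

def toy_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when a stem of at least three characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

raw = "  Oil   prices\n\trising   again  "
clean = strip_whitespace(raw)
stems = [toy_stem(w.lower()) for w in clean.split()]
```

Applying the whitespace step first keeps tokenization simple, because a plain `split()` then yields one token per word.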

Furthermore, a generic filter architecture is available to

*) filter documents by certain criteria, or
*) perform full-text search.
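
One way to picture a generic filter architecture is as predicates applied to documents that carry metadata. The sketch below is plain Python under that assumption, not tm's actual design: `Document`, `filter_docs` and `fulltext` are hypothetical names, and the sample documents are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    meta: dict = field(default_factory=dict)

def filter_docs(docs, predicate):
    """Keep the documents for which predicate(doc) is true."""
    return [d for d in docs if predicate(d)]

def fulltext(pattern):
    """Build a predicate matching documents whose text contains pattern (case-insensitive)."""
    needle = pattern.lower()
    return lambda d: needle in d.text.lower()

# Illustrative documents only; not taken from any real dataset.
docs = [
    Document("Crude oil prices rose sharply.", {"topic": "crude"}),
    Document("Wheat exports fell.", {"topic": "grain"}),
    Document("OPEC discussed oil output.", {"topic": "crude"}),
]

by_topic = filter_docs(docs, lambda d: d.meta.get("topic") == "grain")
by_text = filter_docs(docs, fulltext("oil"))
```

Because both criteria are ordinary predicates, a metadata filter and a full-text search go through the same `filter_docs` call, which is the sense in which such an architecture is generic.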

The package supports the export of document collections to term-document matrices, as frequently used in the text mining literature. This allows the straightforward integration of existing methods for classification, clustering, visualization, etc.
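
For readers unfamiliar with the term, a term-document matrix has one row per term and one column per document, with each cell holding the number of times the term occurs in that document. The following is a minimal sketch in plain Python, assuming whitespace tokenization and lowercasing. It is not tm's export routine, and `term_document_matrix` is a hypothetical name.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in docs[j]."""
    # One bag-of-words counter per document.
    counts = [Counter(d.lower().split()) for d in docs]
    # Sorted vocabulary across all documents gives a stable row order.
    terms = sorted(set().union(*counts))
    # A Counter returns 0 for terms a document does not contain.
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

docs = ["oil prices rise", "oil output falls", "wheat prices fall"]
terms, tdm = term_document_matrix(docs)
```

Each row of `tdm` can then be passed to a clustering or classification method as a feature vector over documents, or the matrix can be transposed to get one feature vector per document.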

The package is designed in a modular way to enable easy integration of new file formats, parsers, transformations and filter operations.

Best regards,

Ingo Feinerer

R-packages mailing list

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri Jan 12 01:41:04 2007

