[Rd] tm and e1071 question

From: Jeszenszky Peter <jeszenszky.peter_at_inf.unideb.hu>
Date: Fri, 04 Dec 2009 13:24:43 +0100

Dear Developers,

I would like to use the svm function of the e1071 package for text classification tasks. Preprocessing can be carried out by using the excellent tm text mining package.

TermDocumentMatrix and DocumentTermMatrix objects of the package tm are currently implemented based on the sparse matrix data structures provided by the slam package.

Unfortunately, the svm function of the e1071 package accepts only sparse matrices of class Matrix provided by the Matrix package, or of class matrix.csr as provided by the package SparseM.

To train an SVM on a DocumentTermMatrix object, the object must first be converted to a matrix.csr sparse matrix. However, none of the packages publicly available on CRAN provides such a conversion function. Writing one is quite straightforward, but it would be much more convenient to pass slam sparse matrix objects directly to the svm function.
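For illustration, such a conversion might look roughly like the sketch below. The function name stm_to_csr is hypothetical and not part of any package. The sketch assumes the input is a slam simple_triplet_matrix (which DocumentTermMatrix objects inherit from) with at least one nonzero entry and no duplicated (i, j) pairs. It relies only on the triplet fields i, j, v, nrow and ncol from slam and on the ra, ja, ia and dimension slots of the SparseM matrix.csr class.

```r
library(slam)     # simple_triplet_matrix (triplet / coordinate storage)
library(SparseM)  # matrix.csr (compressed sparse row storage)

# Hypothetical helper: convert a slam simple_triplet_matrix to a
# SparseM matrix.csr object. Assumes at least one nonzero entry and
# no duplicated (i, j) pairs.
stm_to_csr <- function(x) {
  stopifnot(inherits(x, "simple_triplet_matrix"), length(x$v) > 0L)

  # CSR storage expects entries sorted by row, then by column.
  ord <- order(x$i, x$j)

  # Row pointers: ia[k] is the position of the first entry of row k,
  # and ia[nrow + 1] is one past the last entry.
  counts <- tabulate(x$i, nbins = x$nrow)
  ia <- c(1L, cumsum(counts) + 1L)

  new("matrix.csr",
      ra = as.numeric(x$v[ord]),   # nonzero values
      ja = as.integer(x$j[ord]),   # column index of each value
      ia = as.integer(ia),         # row pointers
      dimension = as.integer(c(x$nrow, x$ncol)))
}
```

With a DocumentTermMatrix dtm and a factor of class labels (both placeholder names here), the result could then be passed on as svm(stm_to_csr(dtm), labels).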

Do you plan to add slam sparse matrix support to the e1071 package?

Best regards,

Peter Jeszenszky



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 04 Dec 2009 - 12:35:20 GMT
