Re: [Rd] Reply: Re: tm and e1071 question

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Wed, 09 Dec 2009 09:40:41 +0100

>>>>> "JP" == Jeszenszky Peter <jeszenszky.peter_at_inf.unideb.hu> >>>>> on Mon, 7 Dec 2009 22:12:43 +0100 writes:

    JP> Hello,
    JP> Thank you for your reply. The suggested conversion trick with a slight
    JP> modification does the job.

    JP> I hope the svm function of the e1071 package will support slam sparse
    JP> matrices directly. I think that this would be quite a reasonable feature.

I strongly disagree.

'Matrix' is a recommended package and the most feature-complete one for sparse matrices and their arithmetic. While it is known that parts of its functionality could, and maybe should, be made to work more efficiently,
it is very reasonable and sensible that other sparse matrix formats --- if needed at all (they may make sense in a limited context) --- have utilities to convert from and to the "sparseMatrix" (sub)classes in 'Matrix'.

Ingo provided code to do exactly that.
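Ingo's code covers the slam-to-Matrix direction. For the reverse direction, a minimal sketch might look like the following. The helper name `sparse_to_stm` is hypothetical (it is not part of slam or Matrix), and the sketch assumes a general, real-valued sparse matrix (e.g. a "dgCMatrix") without duplicated triplets. It relies on Matrix triplet slots being 0-based while slam indices are 1-based.

```r
library(Matrix)
library(slam)

# Hypothetical helper (not part of slam or Matrix): convert a general,
# real-valued sparse Matrix object to a slam simple_triplet_matrix.
sparse_to_stm <- function(m) {
  tm <- as(m, "TsparseMatrix")      # triplet form with slots i, j, x
  dn <- dimnames(tm)
  # Matrix uses list(NULL, NULL) for "no names"; slam's default is NULL
  if (is.null(dn[[1]]) && is.null(dn[[2]])) dn <- NULL
  simple_triplet_matrix(i = tm@i + 1L,  # Matrix triplets are 0-based,
                        j = tm@j + 1L,  # slam's are 1-based
                        v = tm@x,
                        nrow = nrow(tm), ncol = ncol(tm),
                        dimnames = dn)
}
```

Symmetric or triangular Matrix classes store only part of the entries, so they would need to be converted to a general class first.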

Regards,
Martin Maechler, ETH Zurich

    JP> Furthermore, there are developers who participate in the development of
    JP> both the slam and the e1071 packages.

    JP> Best regards,

    JP> Peter Jeszenszky

    JP> -----Ingo Feinerer <feinerer_at_logic.at> wrote: -----

    JP> To: r-devel_at_r-project.org
    JP> From: Ingo Feinerer <feinerer_at_logic.at>
    JP> Date: 2009/12/05 10:43 AM
    JP> Cc: Jeszenszky Peter <jeszenszky.peter_at_inf.unideb.hu>
    JP> Subject: Re: tm and e1071 question

    JP> On Fri, Dec 04, 2009 at 02:21:52PM +0100, Achim Zeileis wrote:
>> I would like to use the svm function of the e1071 package for text
>> classification tasks. Preprocessing can be carried out by using the
>> excellent tm text mining package.

    JP> :-)

>> TermDocumentMatrix and DocumentTermMatrix objects of the package tm
>> are currently implemented based on the sparse matrix data structures
>> provided by the slam package.
>>
>> Unfortunately, the svm function of the e1071 package accepts only sparse
>> matrices of class Matrix provided by the Matrix package, or of class
>> matrix.csr as provided by the package SparseM.
>>
>> In order to train an SVM with a DocumentTermMatrix object, the latter
>> must be converted to a matrix.csr sparse matrix structure. However, none
>> of the publicly available packages on CRAN provides such a conversion
>> function. It is quite straightforward to write the conversion function,
>> but it would be much more comfortable to pass slam sparse matrix objects
>> directly to the svm function.
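The direct conversion to SparseM's "matrix.csr" format mentioned in the question could be sketched as follows. The helper name `stm_to_matrix_csr` is hypothetical. The sketch assumes the "matrix.csr" slots ra (values), ja (column indices), ia (row pointers) and dimension, all with SparseM's 1-based indexing, so unlike the Matrix conversions no index shift is needed.

```r
library(slam)
library(SparseM)

# Hypothetical helper: convert a slam simple_triplet_matrix to SparseM's
# compressed sparse row (CSR) format. Both use 1-based indices.
stm_to_matrix_csr <- function(from) {
  # CSR stores entries row by row, with columns sorted within each row
  ind <- order(from$i, from$j)
  new("matrix.csr",
      ra = as.double(from$v[ind]),
      ja = as.integer(from$j[ind]),
      # row pointers: start of each row's entries, plus one past the last entry
      ia = as.integer(c(1L, 1L + cumsum(tabulate(from$i, from$nrow)))),
      dimension = as.integer(c(from$nrow, from$ncol)))
}
```

Since the thread states that svm() in e1071 accepts "matrix.csr" objects, the result should be usable as its x argument.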

    JP> You are right. If you have small matrices, as(as.matrix(m), "Matrix")
    JP> will work. In addition, there is some unpublished, experimental code in
    JP> the slam package for conversion to the Matrix format (located in
    JP> slam/work/Matrix.R):

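    JP> library(Matrix)
    JP> setAs("simple_triplet_matrix", "dgTMatrix",
    JP>       function(from) {
    JP>           new("dgTMatrix",
    JP>               i = as.integer(from$i - 1L),  # Matrix indices are 0-based
    JP>               j = as.integer(from$j - 1L),
    JP>               x = as.double(from$v),
    JP>               Dim = as.integer(c(from$nrow, from$ncol)),
    JP>               # the Dimnames slot must be a list, never NULL
    JP>               Dimnames = if (is.null(from$dimnames)) list(NULL, NULL)
    JP>                          else from$dimnames)
    JP>       })
    JP> setAs("simple_triplet_matrix", "dgCMatrix",
    JP>       function(from) {
    JP>           # column-compressed layout: sort by column, then by row
    JP>           ind <- order(from$j, from$i)
    JP>           new("dgCMatrix",
    JP>               i = as.integer(from$i[ind] - 1L),
    JP>               # column pointers: cumulative entry counts per column
    JP>               p = c(0L, cumsum(tabulate(from$j[ind], from$ncol))),
    JP>               x = as.double(from$v[ind]),
    JP>               Dim = as.integer(c(from$nrow, from$ncol)),
    JP>               Dimnames = if (is.null(from$dimnames)) list(NULL, NULL)
    JP>                          else from$dimnames)
    JP>       })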

    JP> which then allows:

    JP> class(m) <- "simple_triplet_matrix"
    JP> as(m, "dgTMatrix")
    JP> as(m, "dgCMatrix")
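As an alternative to filling the "dgCMatrix" slots by hand, Matrix's own sparseMatrix() constructor accepts 1-based (i, j, x) triplets directly and handles sorting and compression itself (duplicated entries are summed). A minimal sketch of that approach; the helper name `stm_to_dgC` is hypothetical:

```r
library(Matrix)

# Hypothetical helper: build a dgCMatrix from a slam simple_triplet_matrix
# via Matrix::sparseMatrix(), which takes 1-based triplets by default.
stm_to_dgC <- function(from) {
  dn <- from$dimnames
  if (is.null(dn)) dn <- list(NULL, NULL)  # Matrix expects a length-2 list
  sparseMatrix(i = from$i, j = from$j, x = as.double(from$v),
               dims = c(from$nrow, from$ncol), dimnames = dn)
}
```

Because it only reads the list components of its argument, this helper does not depend on S4 dispatch, so the class of a DocumentTermMatrix need not be reset first.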

>> Do you plan to add slam sparse matrix support to the e1071 package?

    JP> I cannot answer this since I am directly involved in neither the e1071
    JP> nor the slam package.

    JP> Best regards, Ingo Feinerer

    JP> --
    JP> Ingo Feinerer
    JP> Vienna University of Technology
    JP> http://www.dbai.tuwien.ac.at/staff/feinerer

    JP> ______________________________________________
    JP> R-devel_at_r-project.org mailing list
    JP> https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Wed 09 Dec 2009 - 08:44:08 GMT
