RE: [R] TD Matrix

From: Huntsinger, Reid <reid_huntsinger_at_merck.com>
Date: Fri 18 Mar 2005 - 12:11:59 EST


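Do you mean when you encounter a new term? I would think document *length* wouldn't matter, since each document contributes one column whose length is the number of terms, not the number of words in the document. Presumably you have a list of terms already. If so, you could treat each document as a vector of integer term codes, then use "tabulate" (with nbins set to the number of terms) to get the column for that document.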
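A minimal sketch of the "tabulate" idea, assuming the list of terms is known in advance. The vocabulary, the documents, and the helper name term_column are made up for illustration; only match(), tabulate(), and vapply() are base R.

```r
# Hypothetical vocabulary and tokenized documents, for illustration only.
vocab <- c("apple", "banana", "cherry", "date")
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date", "cherry", "cherry"),
  d3 = c("banana")
)

# Build one column: map each token to its integer term code, then count codes.
# (term_column is an illustrative helper, not part of any package.)
term_column <- function(tokens, vocab) {
  codes <- match(tokens, vocab)        # position of each token in vocab
  codes <- codes[!is.na(codes)]        # drop tokens that are not in vocab
  tabulate(codes, nbins = length(vocab))
}

# Every column has length(vocab) entries, regardless of document length.
tdm <- vapply(docs, term_column, integer(length(vocab)), vocab = vocab)
rownames(tdm) <- vocab
print(tdm)
```

Because every column has the same length, adding a document is a plain cbind of term_column(new_tokens, vocab) with no resizing. Raw counts are used here; any term-frequency weighting would be applied to the columns afterwards.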

If you're using all terms that appear in any document, and you don't want to compile a list of terms first, then you might want to think about building a sparse representation, as in the SparseM package, and using the sparse linear algebra routines there. Just an idea, though.
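A rough sketch of the sparse approach, growing the term list as documents arrive. Note the swap: this uses sparseMatrix() from the Matrix package rather than SparseM, only because its triplet constructor is compact; the same (row, column, value) triplets could be handed to SparseM instead. The documents and variable names are assumptions for illustration.

```r
library(Matrix)  # swapped in for SparseM in this sketch

# Hypothetical tokenized documents, for illustration only.
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date"),
  d3 = c("banana", "elderberry")
)

vocab <- character(0)  # term list, extended as new terms appear
i <- integer(0)        # row index (term) of each nonzero cell
j <- integer(0)        # column index (document) of each nonzero cell
x <- numeric(0)        # count stored in each nonzero cell

for (d in seq_along(docs)) {
  counts <- table(docs[[d]])                # term counts for this document
  terms  <- names(counts)
  vocab  <- c(vocab, setdiff(terms, vocab)) # append unseen terms
  i <- c(i, match(terms, vocab))
  j <- c(j, rep(d, length(terms)))
  x <- c(x, as.numeric(counts))
}

# Assemble the sparse term-document matrix once all triplets are collected.
tdm <- sparseMatrix(i = i, j = j, x = x,
                    dims = c(length(vocab), length(docs)),
                    dimnames = list(vocab, names(docs)))
print(tdm)
```

Only the nonzero cells are stored, so a new term costs one extra vocabulary entry rather than a resize of a dense matrix.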

Reid Huntsinger

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Ryan Steckel
Sent: Thursday, March 17, 2005 6:01 PM
To: r-help@stat.math.ethz.ch
Subject: [R] TD Matrix

I'm trying to create a term-document matrix where the columns are the documents, the rows are the terms in the documents, and the cells hold a term-frequency weight for each term in each document. My problem is that the documents are all different lengths. So when I add a new document, if its length is greater than the maximum document length in the matrix, I have to resize the matrix before doing a cbind operation.

Does anyone know of an easier way?



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Mar 18 12:19:06 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:30:51 EST