RE: [R] TD Matrix

From: Huntsinger, Reid
Date: Fri 18 Mar 2005 - 12:11:59 EST

Do you mean when you encounter a new term? I would think document *length* wouldn't matter, since the rows are indexed by terms rather than by position in the document; presumably you have a list of terms already. If so, you could treat each document as a vector of term codes, then use "tabulate" to get the column for that document.
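A minimal sketch of the "tabulate" idea, assuming the term list is known up front. The vocabulary, the documents, and the helper name term_column are made up for illustration; terms not in the list are simply dropped here, and raw counts stand in for whatever weighting you want.

```r
# Hypothetical term list and tokenized documents, for illustration only
vocab <- c("apple", "banana", "cherry", "date")
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date", "date"),
  d3 = c("banana")
)

# Code each token as its position in vocab, then count the codes.
# nbins fixes the column length, so document length does not matter.
term_column <- function(tokens, vocab) {
  codes <- match(tokens, vocab)            # NA for terms not in vocab
  tabulate(codes[!is.na(codes)], nbins = length(vocab))
}

# One column per document, one row per term
td <- sapply(docs, term_column, vocab = vocab)
rownames(td) <- vocab
td
```

Every column has length(vocab) entries regardless of how long the document is, so the matrix never needs resizing when a document is added; a new document is just one more call to term_column and one cbind.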

If you're using all terms that appear in any document, and you don't want to compile a list of terms first, then you might want to think of creating a sparse representation as in the SparseM package and using the sparse linear algebra routines there. Just an idea, though.
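A rough sketch of the sparse idea in base R only, assuming you build the term list as you go. It stores (row, column, value) triplets, which is the coordinate form sparse matrix packages generally work from; handing the triplets to SparseM is left out here. The documents and variable names are made up for illustration.

```r
# Hypothetical tokenized documents, for illustration only
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date", "date")
)

vocab <- character(0)                      # grows as new terms appear
i <- integer(0)                            # row index (term)
j <- integer(0)                            # column index (document)
x <- integer(0)                            # nonzero count

for (d in seq_along(docs)) {
  counts <- table(docs[[d]])               # term frequencies for this document
  vocab <- c(vocab, setdiff(names(counts), vocab))  # new term = new row index
  i <- c(i, match(names(counts), vocab))
  j <- c(j, rep(d, length(counts)))
  x <- c(x, as.integer(counts))
}

# Dense expansion, only to check the triplets on this toy example
dense <- matrix(0L, nrow = length(vocab), ncol = length(docs),
                dimnames = list(vocab, names(docs)))
dense[cbind(i, j)] <- x
dense
```

Only the nonzero cells are stored, so a new term or a new document appends a few triplets instead of forcing a resize of a full matrix.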

Reid Huntsinger

-----Original Message-----
On Behalf Of Ryan Steckel
Sent: Thursday, March 17, 2005 6:01 PM
Subject: [R] TD Matrix

I'm trying to create a term-document matrix where the columns are the documents, the rows are the terms in the documents, and the cells are a weight of term frequency in the document. My problem is that the documents are all different lengths. So when I add a new document, if the document length is greater than the max document length in the matrix, I have to resize the matrix and do a cbind operation.

Does anyone know of an easier way?

Received on Fri Mar 18 12:19:06 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:30:51 EST