Re: [R] SVD Memory Issue

From: Paul Hiemstra <paul.hiemstra_at_knmi.nl>
Date: Wed, 14 Sep 2011 07:31:38 +0000

Hi,

An SVD of a 771 x 5677 matrix should be fine; it took about 30 seconds and very little memory on my workstation. The problem most likely occurs when you convert tdm2 to a matrix: tdm2 is stored as a sparse triplet structure, and both the dense tdm_matrix and the workspace that svd() allocates on top of it take far more memory than the sparse representation. Without a reproducible example we cannot help you very well. Also, I have no experience with the tm package, so I don't know what needs to be extracted from tdm2 as input for the SVD.
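If the dense conversion really is the bottleneck, a truncated SVD computed directly on the sparse representation may sidestep it. A minimal sketch, assuming the irlba and Matrix packages are installed; the conversion route from the simple_triplet_matrix slots and the choice of k = 100 singular vectors are my assumptions, not something I have tested against tm:

```r
# Sketch: truncated SVD without ever building the dense 771 x 5677 matrix.
# Assumes the 'Matrix' and 'irlba' packages are available.
library(Matrix)
library(irlba)

# tdm2 is a simple_triplet_matrix with i, j, v slots (see the str()
# output below), so it can be rebuilt directly as a sparse dgCMatrix.
tdm_sparse <- sparseMatrix(i = tdm2$i, j = tdm2$j, x = tdm2$v,
                           dims = c(tdm2$nrow, tdm2$ncol))

# For LSA you rarely need all min(771, 5677) singular values;
# k = 100 is just a common starting point -- tune it for your corpus.
svd_out <- irlba(tdm_sparse, nv = 100)

# svd_out$u, svd_out$d, svd_out$v are the truncated factors.
```

With base svd() you can also limit the returned singular vectors via the nu and nv arguments, but that does not reduce the intermediate workspace the way a sparse iterative method does.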

Good luck,
Paul

On 09/13/2011 10:24 AM, vioravis wrote:
> I am trying to perform Singular Value Decomposition (SVD) on a Term Document
> Matrix I created using the 'tm' package. Eventually I want to do a Latent
> Semantic Analysis (LSA).
>
> There are 5677 documents with 771 terms (the DTM is 771 x 5677). When I try
> to do the SVD, it runs out of memory. I am using a 12GB Dual core Machine
> with Windows XP and don't think I can increase the memory anymore. Are there
> any other memory efficient methods to find the SVD?
>
> The term document is obtained using:
>
> tdm2 <-
> TermDocumentMatrix(tr1,control=list(weighting=weightTf,minWordLength=3))
> str(tdm2)
>
> List of 6
> $ i : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ...
> $ j : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ...
> $ v : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ...
> $ nrow : int 771
> $ ncol : int 5677
> $ dimnames:List of 2
> ..$ Terms: chr [1:771] "access" "accessori" "accumul" "acoust" ...
> ..$ Docs : chr [1:5677] "1" "2" "3" "4" ...
> - attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
> - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"
>
> SVD is calculated using:
>
>> tdm_matrix <- as.matrix(tdm2)
>> svd_out<-svd(tdm_matrix)
> Error: cannot allocate vector of size 767.7 Mb
> In addition: Warning messages:
> 1: In matrix(0, n, np) :
> Reached total allocation of 3583Mb: see help(memory.size)
> 2: In matrix(0, n, np) :
> Reached total allocation of 3583Mb: see help(memory.size)
> 3: In matrix(0, n, np) :
> Reached total allocation of 3583Mb: see help(memory.size)
> 4: In matrix(0, n, np) :
> Reached total allocation of 3583Mb: see help(memory.size)
>
>
> Thank you.
>
> Ravi
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/SVD-Memory-Issue-tp3809667p3809667.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770

Received on Wed 14 Sep 2011 - 07:35:14 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 14 Sep 2011 - 08:40:21 GMT.
