[R] Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package

From: Alex McKenzie <ahmckenzie_at_gmail.com>
Date: Tue, 25 Mar 2008 12:50:42 -0400

In case someone else runs into this: I found the problem. It was caused by some zero-length text files in the directory. Make sure every data file you load into the document-term matrix is valid (non-empty).
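For anyone who wants to check a directory before calling textmatrix(), here is a minimal sketch of one way to list zero-length files. The helper name find_empty_files and the temporary demo directory are made up for illustration and are not part of the lsa package. The check only looks at file size, so a file that contains nothing but whitespace would not be flagged.

```r
# Hypothetical helper (not part of lsa): return the files in `path`
# whose size on disk is 0 bytes.
find_empty_files <- function(path) {
  files <- list.files(path, full.names = TRUE)
  info  <- file.info(files)
  files <- files[!info$isdir]            # ignore subdirectories
  files[file.info(files)$size == 0]      # keep only zero-length files
}

# Self-contained demonstration in a temporary directory (assumed names).
demo_dir <- file.path(tempdir(), "lsa_empty_check")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("some audit text", file.path(demo_dir, "doc1.txt"))
file.create(file.path(demo_dir, "doc2.txt"))   # creates a zero-length file

empty <- find_empty_files(demo_dir)
print(basename(empty))
```

In the situation described here, the call would presumably be find_empty_files(SnippetsPath), and any file it returns could be removed or filled in before building the matrix.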



I'm trying to use the "lsa" (latent semantic analysis) package and am running into a problem that seems to be related to the number of documents being processed. Here's the code I'm running (after loading the lsa and Rstem packages), followed by the error message:

> SnippetsPath <- "c:\\OED\\AuditExplain\\" # path where to find text
> data(stopwords_en)
> tdm <- textmatrix(SnippetsPath, stopwords=stopwords_en)

I get this error message with about 280 documents: "Error in sort(unique.default(x), na.last = TRUE) : 'x' must be atomic"

The error doesn't occur if I reduce the number of documents (to 220, for instance). I'm not clear whether this is a memory/capacity issue or something else.
A traceback returns the following, but interpreting it is out of my league ;-) Any idea what the problem could be? I greatly appreciate your advice.

> traceback()
10: stop("'x' must be atomic")
9: sort(unique.default(x), na.last = TRUE)
8: factor(a, exclude = exclude)
7: table(txt)
6: inherits(x, "factor")
5: is.factor(x)
4: sort(table(txt), decreasing = TRUE)
3: FUN(X[[238]], ...)
2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language,
       minWordLength, minDocFreq, stopwords, vocabulary)
1: textmatrix(SnippetsPath, stopwords = stopwords_en)
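A note on reading the traceback above: frame 3, FUN(X[[238]], ...), indicates that lapply() was working on element 238 of the vector returned by dir(mydir, full.names = TRUE) in frame 2. Repeating that dir() call and indexing the result is one way to see which file was being processed. The sketch below shows the idea on a small temporary directory; the directory name, file names, and the index 2 are stand-ins chosen for the demonstration.

```r
# Demonstration of mapping lapply's X[[i]] index back to a file name.
# The directory and file names are assumptions for this example only.
demo_dir <- file.path(tempdir(), "lsa_index_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (name in c("a.txt", "b.txt", "c.txt")) {
  writeLines("text", file.path(demo_dir, name))
}

# Same kind of call as in frame 2 of the traceback: dir() returns the
# file names in alphabetical order, and lapply() walks them by position.
files <- dir(demo_dir, full.names = TRUE)

idx <- 2                      # stand-in for the 238 seen in the traceback
suspect <- files[idx]
print(basename(suspect))
print(file.info(suspect)$size)   # size in bytes of the file at that position
```

For the session shown here, the equivalent would presumably be dir(SnippetsPath, full.names = TRUE)[238], assuming the contents of the directory have not changed since the failing call.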



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Tue 25 Mar 2008 - 18:19:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 25 Mar 2008 - 18:30:23 GMT.
