[R] Help with cleaning a corpus

From: vintersorg123 <pablo.loyola.h_at_gmail.com>
Date: Mon, 18 Apr 2011 07:00:56 -0700 (PDT)


I created a corpus and started cleaning it with this piece of code:

txt <- tm_map(txt, removeWords, stopwords("spanish"))
txt <- tm_map(txt, stripWhitespace)
txt <- tm_map(txt, tolower)
txt <- tm_map(txt, removeNumbers)
txt <- tm_map(txt, removePunctuation)

But something happened: some of the documents in the corpus became empty, which is a problem when I try to build a document-term matrix with tf-idf weighting. Is there any way to automatically remove a document if it becomes empty?

Or, to do it manually, how could I get the length of every document?
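
To make the manual check concrete, here is a minimal base-R sketch of what is being asked for. It assumes the cleaned documents have already been pulled out of the corpus into a plain character vector; the vector `docs` and its contents are hypothetical stand-ins, and this is not tm's own mechanism for dropping documents.

```r
# Hypothetical stand-in for the cleaned corpus: one string per document.
docs <- c(doc1 = "casa grande", doc2 = "   ", doc3 = "", doc4 = "perro")

# Length of every document in characters, ignoring leading/trailing
# whitespace so that whitespace-only documents count as empty.
doc_lengths <- nchar(trimws(docs))

# Keep only the documents that still contain some text.
non_empty <- docs[doc_lengths > 0]

print(doc_lengths)
print(non_empty)
```

The same logical index (`doc_lengths > 0`) could then be used to subset whatever object holds the documents, provided it keeps them in the same order.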

Hope you can help me! Thanks a lot.



Received on Mon 18 Apr 2011 - 14:12:34 GMT
