[R] DocumentTermMatrix error

From: Matevž Pavlič <matevz.pavlic_at_gi-zrmk.si>
Date: Sat, 21 May 2011 13:26:40 +0200


Hi all,

I tried to create a DocumentTermMatrix with the tm package, but I get this error:

Error in tolower(txt) :
  invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'

I followed the approach shown in

http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),

using this R code:

library(tm)
setwd("C:/Users/mpavlic/Desktop/temp")
tekst <- Corpus(DirSource("."))
> Warning message:
> In readLines(y, encoding = x$Encoding) :
>   incomplete final line found on './test.txt'
 

meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
> Available meta data pairs are:
>   Author       :
>   DateTimeStamp: 2011-05-21 11:25:21
>   Description  :
>   Heading      : test
>   ID           : test.txt
>   Language     : en
>   Origin       :

test <- TermDocumentMatrix(tekst)
> Error in tolower(txt) :
>   invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
   

I have attached the small sample file (test.txt) that I worked with.

Any help would be appreciated,

m



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat 21 May 2011 - 11:32:39 GMT

