Re: [R] text mining

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Mon, 30 May 2011 08:28:04 -0400

On 30/05/2011 6:17 AM, rgui wrote:
> Hi,
>
> I have a problem when indexing the corpus. I used the following syntax:
>
> > Setwd ("c :/....")
> > Library (tm)
> > Txt = Corpus (DirSource ("."); readerControl = list (language = "frensh"))
>
Capitalization is important in R, so when asking a question, please cut and paste what you actually did. In this case, it doesn't matter.
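
A small base-R illustration of the case sensitivity mentioned above (a sketch only; the capitalized names are the ones that appear in the quoted post):

```r
# R looks names up case-sensitively: library() and setwd() exist in base R,
# while the capitalized spellings from the quoted post do not.
lower_found <- exists("library") && exists("setwd")
upper_found <- exists("Library") || exists("Setwd")
print(c(lower = lower_found, upper = upper_found))
```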

> an error message comes:
>
> >>> Warning messages:
> 1: In readLines(y, encoding = x$Encoding) :
> incomplete final line found on './n3.txt'
> 2: In readLines(y, encoding = x$Encoding) :
> incomplete final line found on './n32.

Those are warnings, not errors. readLines gives those warnings when the last line of the file stops abruptly, rather than having an end of line marker. On Unix systems this usually signals a problem with the file. Windows is more tolerant, so many editors don't bother to add the final marker.
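
The behaviour can be reproduced with base R alone. The following is a minimal sketch, assuming a temporary file stands in for one of the corpus files; the file contents and variable names are made up for illustration:

```r
# Create a small file whose last line has no end-of-line marker.
f <- tempfile(fileext = ".txt")
cat("first line\nsecond line", file = f)

# readLines() still returns every line; it only warns about the final one.
got_warning <- FALSE
lines <- withCallingHandlers(
  readLines(f),
  warning = function(w) {
    got_warning <<- TRUE            # remember that a warning was raised
    invokeRestart("muffleWarning")  # carry on, as R would by default
  }
)

# Rewriting the file with writeLines() adds the missing end-of-line marker,
# so reading it again raises no warning.
writeLines(lines, f)
lines_again <- readLines(f)
```

If the files cannot be changed, calling readLines(f, warn = FALSE) suppresses the warning when reading them directly.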

> another question:
> how can I read different document types (.pdf, .html, ...) using the
> package "tm"?

I think you need to convert them to text first (by some tool outside of R), but I might be wrong.
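
One way the "convert outside of R" route might look for PDF files is sketched below. It assumes the external pdftotext program is installed and on the PATH; txt_name() and pdfs_to_text() are hypothetical helper names, not part of tm or base R.

```r
# Hypothetical helper: map "x.pdf" to "x.txt" (case-insensitive extension).
txt_name <- function(pdf) sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)

# Hypothetical helper: convert every PDF in `dir` with the external
# pdftotext program (assumed to be installed and on the PATH).
pdfs_to_text <- function(dir = ".") {
  if (!nzchar(Sys.which("pdftotext"))) {
    warning("pdftotext not found on the PATH; nothing converted")
    return(invisible(character(0)))
  }
  pdfs <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  for (pdf in pdfs) {
    # Command-line form: pdftotext PDF-file text-file
    system2("pdftotext", args = shQuote(c(pdf, txt_name(pdf))))
  }
  # Return the paths of the text files, invisibly.
  invisible(txt_name(pdfs))
}
```

The resulting .txt files could then be read from the directory in the same way as the plain-text files in the original question.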

Duncan Murdoch

> Thanks very much for your help
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3560367p3560367.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Received on Mon 30 May 2011 - 12:32:37 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 30 May 2011 - 13:00:10 GMT.
