Re: [R] reading data from a pdf

From: Jean Eid <jeaneid_at_chass.utoronto.ca>
Date: Tue 25 Oct 2005 - 01:04:07 EST

Hi,

In my experience pdftotext does not do a very good job at this because it screws up the formatting of tables. How badly depends, of course, on which program the PDF document was originally created with. What I have found works best is to cut and paste the text into XEmacs or Emacs and then run M-x canonically-space-region on it, which removes the extra spaces. However, if the PDF document was produced by scanning and you have to run it through a character recognition program, then all bets are off and the tables have to be reformatted by hand.
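For anyone without Emacs at hand, the space-removal step can be roughly approximated in a few lines of Python. This is only a sketch under assumptions, not what canonically-space-region does internally: it collapses every run of spaces or tabs to a single space, and the helper names (normalize_spaces, to_rows) and the sample table are made up for illustration. It also assumes no table cell contains a space of its own.

```python
import re

def normalize_spaces(text):
    """Collapse runs of spaces/tabs to one space and trim each line.

    Rough stand-in for the Emacs step; blank lines are dropped.
    """
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # skip lines that held only whitespace
            cleaned.append(line)
    return cleaned

def to_rows(text):
    """Split each cleaned line into fields on the single remaining space."""
    return [line.split(" ") for line in normalize_spaces(text)]

# Hypothetical text as it might look after pasting a table out of a PDF viewer.
pasted = """
 id    height     weight
 1      170.5      65
 2      182        80.2
"""

rows = to_rows(pasted)
for row in rows:
    print("\t".join(row))  # tab-separated, one table row per line
```

The tab-separated output can be saved to a file and read into R with read.table(). Cells that contain spaces (names, dates with spaces, and so on) would be split apart by this approach and still need fixing by hand.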

Jean

rambam@bigpond.net.au wrote:

>>Hi, I'm trying to read data from a PDF file. Is it possible to do it
>>with R? Thanks, Marco
>>
>>
>
>If cut and paste to a text file fails, try this:
>
>pdftotext (from the xpdf project)
>
>or
>
>http://pdftohtml.sourceforge.net
>pdftohtml is a utility which converts PDF files into HTML and
>XML formats
>
>In addition, pdftk, the command line pdf toolkit may be useful
>http://www.accesspdf.com/pdftk/

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Oct 25 01:13:53 2005
