Re: [R] Reading a web page in pdf format

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Wed, 09 May 2007 13:46:54 -0400

Modify this to suit. After grepping out the relevant lines, we use strapply to find and emit the character sequences that follow a "(" and do not contain a ")". The argument back = -1 says to emit only the backreferences and not the entire matched expression (which would have included the leading "("):

URL <- "http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf"
Lines.raw <- readLines(URL)
Lines <- grep("Industriale|Termoelettrico", Lines.raw, value = TRUE)
library(gsubfn)
strapply(Lines, "[(]([^)]*)", back = -1, simplify = rbind)

which gives a character matrix whose first column is the label and second column is the number in character form. You can then manipulate it as desired.
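
As one possible way to carry on from that matrix, here is a minimal sketch in base R only. It reproduces the same "text after ( up to )" extraction with regmatches and gregexpr, then builds a small table. The sample lines, the object names (fields, m, balance) and the comma decimal separator are assumptions made for illustration; they are not taken from the actual PDF, whose layout and number format may differ.

```r
# Hypothetical sample lines standing in for the grepped PDF text;
# the real file's layout and number format may differ.
Lines <- c("BT /F1 9 Tf (Industriale) Tj 100 0 Td (45,3) Tj ET",
           "BT /F1 9 Tf (Termoelettrico) Tj 100 0 Td (78,1) Tj ET")

# Base-R counterpart of the strapply() call: match "(" followed by
# any run of characters other than ")", then drop the leading "(".
hits <- regmatches(Lines, gregexpr("[(][^)]*", Lines))
fields <- lapply(hits, function(x) substring(x, 2))

# Stack one row per line: label in column 1, figure in column 2.
m <- do.call(rbind, fields)

# Build the table; assumes a comma is used as the decimal separator
# and that no thousands separators are present.
balance <- data.frame(
  date   = Sys.Date(),
  sector = m[, 1],
  value  = as.numeric(sub(",", ".", m[, 2], fixed = TRUE)),
  stringsAsFactors = FALSE
)
print(balance)
```

For a daily run, the resulting one-day table could be appended to a file, for instance with write.table(..., append = TRUE). If the figures turn out to use thousands separators, the conversion step would need adjusting before as.numeric is applied.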

On 5/9/07, Vittorio <vdemart1_at_tin.it> wrote:
> Each day the daily balance in the following link
>
> http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf
>
> is updated.
>
> I would like to set up an R procedure to be run daily in a
> server able to read the figures in a couple of lines only
> ("Industriale" and "Termoelettrico", towards the end of the balance)
> and put the data in a table.
>
> Is that possible? If yes, what R-packages should I use?
>
> Ciao
> Vittorio
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Wed 09 May 2007 - 17:50:35 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 09 May 2007 - 19:31:41 GMT.
