Re: [R] How to extract following data

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Wed, 05 Nov 2008 07:32:03 -0500

Here is another solution, made slightly shorter by using strapply twice (with gsubfn and zoo loaded and Lines defined as in my earlier message below):

z <- zoo(strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]],
         strapply(Lines, "....-..-..", as.Date)[[1]])

or to create a data frame:

DF <- data.frame(date = strapply(Lines, "....-..-..", as.Date)[[1]],
                 price = strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]])
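For anyone who would rather not install gsubfn, here is a minimal base-R sketch of the same two-pattern idea. It swaps strapply for regmatches() with gregexpr(), which is a different tool from the one used above. It repeats the Lines string from the earlier message so it runs on its own; extract_all is a hypothetical helper name, and the date pattern is a stricter [0-9]{4}-[0-9]{2}-[0-9]{2} in place of ....-..-..

```r
# The example records from the original question
Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
<Date>2005-01-18T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1291.69995</PriceClose>
</Temp>
- <Temp diffgr:id="Temp16" msdata:rowOrder="15">
<Date>2005-01-19T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.19995</PriceClose>
</Temp>'

# Hypothetical helper: every match of 'pattern' in the single string 'x'
# (a base-R stand-in for strapply(x, pattern)[[1]])
extract_all <- function(x, pattern) regmatches(x, gregexpr(pattern, x))[[1]]

DF <- data.frame(
  # digits in YYYY-MM-DD positions; the time and offset after 'T' are ignored
  date  = as.Date(extract_all(Lines, "[0-9]{4}-[0-9]{2}-[0-9]{2}")),
  # only the prices contain a decimal point (SecurityID is an integer)
  price = as.numeric(extract_all(Lines, "[0-9]+[.][0-9]+"))
)
print(DF)
```

Spelling out the digits keeps the date pattern from matching other text that merely has hyphens in the right places.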

On Wed, Nov 5, 2008 at 6:22 AM, Gabor Grothendieck <ggrothendieck_at_gmail.com> wrote:
> As others have pointed out, it's close to XML but not quite
> there; however, you could use strapply in gsubfn to extract
> the data. It pulls out every substring matching the regular
> expression, giving a vector, vec, of the form: date price date price ...
> Pulling out the odd and even elements separately and
> converting them to Date and numeric, respectively, gives the
> resulting data.frame.
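A small illustration of the odd/even split just described, using the example values typed in by hand so it runs without gsubfn. It also shows that the split is the same as filling a two-column matrix row by row. Either way it assumes every date is immediately followed by its price; a record missing one field would shift every later pair.

```r
# Interleaved vector of the kind strapply() returns for this data:
# date, price, date, price, ... (values typed in from the example)
vec <- c("2005-01-17", "1288.40002",
         "2005-01-18", "1291.69995",
         "2005-01-19", "1288.19995")

# TRUE at odd positions (1, 3, 5) = dates; FALSE at even positions = prices
ix <- seq_along(vec) %% 2 == 1
DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))

# Equivalent view: fill a two-column matrix row by row, so column 1 holds
# the odd elements (dates) and column 2 the even elements (prices)
m <- matrix(vec, ncol = 2, byrow = TRUE)
DF2 <- data.frame(date = as.Date(m[, 1]), price = as.numeric(m[, 2]))
```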
>
> See
> http://gsubfn.googlecode.com
> for more on the gsubfn package and
> the three zoo vignettes in the zoo package for more on it.
>
> Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
> <Date>2005-01-17T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1288.40002</PriceClose>
> </Temp>
> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
> <Date>2005-01-18T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1291.69995</PriceClose>
> </Temp>
> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
> <Date>2005-01-19T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1288.19995</PriceClose>
> </Temp>'
>
> library(gsubfn)
> vec <- strapply(Lines, "....-..-..|[0-9]+[.][0-9]+")[[1]]
> ix <- seq_along(vec) %% 2 == 1
> DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))
>
> # or, instead of the last line, you could convert it to a zoo object so
> # that it's in a more convenient form for time series manipulation:
>
> library(zoo)
> z <- zoo(as.numeric(vec[!ix]), as.Date(vec[ix]))
>
>
>
> On Wed, Nov 5, 2008 at 1:22 AM, RON70 <ron_michael70_at_yahoo.com> wrote:
>>
>> Hi everyone,
>>
>> I have this kind of raw dataset:
>>
>> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>> <Date>2005-01-17T00:00:00+05:30</Date>
>> <SecurityID>10149</SecurityID>
>> <PriceClose>1288.40002</PriceClose>
>> </Temp>
>> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>> <Date>2005-01-18T00:00:00+05:30</Date>
>> <SecurityID>10149</SecurityID>
>> <PriceClose>1291.69995</PriceClose>
>> </Temp>
>> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>> <Date>2005-01-19T00:00:00+05:30</Date>
>> <SecurityID>10149</SecurityID>
>> <PriceClose>1288.19995</PriceClose>
>> </Temp>
>>
>> I was looking for some R procedure to extract data from this into
>> the following format:
>>
>> 2005-01-17 1288.40002
>> 2005-01-18 1291.69995
>> 2005-01-19 1288.19995
>>
>> Can R help me to do this?
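Since each field sits on its own line, another base-R sketch keys on the tag names rather than on the shape of the values, then prints the result in exactly the layout asked for. This is an alternative to the gsubfn approach in the replies, and it assumes each record has one <Date> line and one <PriceClose> line, in the same order.

```r
# The example records from the question
Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
<Date>2005-01-18T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1291.69995</PriceClose>
</Temp>
- <Temp diffgr:id="Temp16" msdata:rowOrder="15">
<Date>2005-01-19T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.19995</PriceClose>
</Temp>'

# One element per line of the input
txt <- strsplit(Lines, "\n", fixed = TRUE)[[1]]

# Keep only the lines holding the two fields of interest, selected by tag name
# (assumes one <Date> and one <PriceClose> line per record, in the same order)
date_lines  <- grep("<Date>", txt, fixed = TRUE, value = TRUE)
price_lines <- grep("<PriceClose>", txt, fixed = TRUE, value = TRUE)

# Drop everything up to the opening tag, then keep the YYYY-MM-DD part
dates <- as.Date(substr(sub(".*<Date>", "", date_lines), 1, 10))
# Capture the text between the PriceClose tags
prices <- as.numeric(sub(".*<PriceClose>(.*)</PriceClose>.*", "\\1", price_lines))

DF <- data.frame(date = dates, price = prices)

# Print as "date price" with no quotes, row names or header
write.table(DF, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Keying on tag names means the SecurityID values can never be mistaken for prices, whatever their format.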
>>
>> --
>> View this message in context: http://www.nabble.com/How-to-extract-following-data-tp20336690p20336690.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



Received on Wed 05 Nov 2008 - 12:35:17 GMT

