Re: [R] XML parameters to Column Headers for importing into a dataset

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Thu, 12 Jun 2008 09:07:45 -0700

Hi Ajay --

"ajay ohri" <ohri2007_at_gmail.com> writes:

> Dear List,
>
> Do you know any way I can convert XML parameters into column headers. My

In R, the XML package will help you...

> data is in a csv file with each row containing a xml form of data , and
> multiple parameters (
>
> <param1> data_val1 </param2> , <param2> data_val2 </param2> )

I guess that first closing tag is param1...

> I want to convert it so each row caters to one record and each parameter
> becomes a different column.
>
> param1 param2
> Row1 data_val1 data_val2
>
> What is the most efficient way for doing this. Apologize for the duplicate

Personally I like to use the xpath query language; the following relies a little on your data being regular (e.g., all rows having entries for all column values), but for some file 'fl' (perhaps accessible via a url)

    library(XML)
    xml = xmlTreeParse(fl, useInternalNodes=TRUE)
    data.frame(
        param1 = unlist(xpathApply(xml, "//param1", xmlValue)),
        param2 = unlist(xpathApply(xml, "//param2", xmlValue)))

does the trick. These are string values; you can convert them to numeric in the usual R way (as.numeric(unlist(...))) or at the xpath level (along the lines of xpathApply(xml, "number(//param1)")).
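
To make the numeric conversion concrete, here is a small self-contained sketch. The XML text, the <rows> and <row> wrapper elements and the values are made up for illustration; only xmlTreeParse, xpathApply and xmlValue come from the XML package. One caveat, as far as I understand XPath 1.0: number() applied to a node set converts only the first node, so the as.numeric() route is probably the safer one when you want a whole column.

```r
library(XML)

# Hypothetical input: two records with numeric values, wrapped in a made-up
# <rows> root element so the text parses as a single document.
txt <- "<rows>
  <row><param1>1.5</param1><param2>10</param2></row>
  <row><param1>2.5</param1><param2>20</param2></row>
</rows>"

# asText=TRUE parses the string itself rather than treating it as a file name.
xml <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# xmlValue() returns character strings; convert each column with as.numeric().
df <- data.frame(
    param1 = as.numeric(unlist(xpathApply(xml, "//param1", xmlValue))),
    param2 = as.numeric(unlist(xpathApply(xml, "//param2", xmlValue))))

str(df)
```

If a value cannot be parsed as a number, as.numeric() gives NA with a warning, which is usually easy to spot afterwards.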

xpath help is available at http://www.w3.org/TR/xpath, especially

http://www.w3.org/TR/xpath#path-abbrev
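
If the rows are not regular, or each row of the csv file is a bare fragment rather than a full document, one possible approach is to parse row by row. This is only a sketch under assumptions: the 'lines' vector stands in for whatever readLines() gives you, the <row> wrapper and the getParam helper are invented here, and a missing parameter is reported as NA. It uses the abbreviated path "/row/param1", which is short for "/child::row/child::param1".

```r
library(XML)

# Hypothetical lines as they might come from readLines() on the csv file;
# the second row deliberately lacks <param2>.
lines <- c("<param1> a1 </param1> , <param2> b1 </param2>",
           "<param1> a2 </param1>")

# Hypothetical helper: extract one parameter from one row, or NA if absent.
getParam <- function(line, name) {
    # Wrap the fragment in a made-up <row> root so it is well-formed XML.
    doc <- xmlTreeParse(paste("<row>", line, "</row>", sep = ""),
                        asText = TRUE, useInternalNodes = TRUE)
    # Abbreviated location path: children of the root named 'name'.
    val <- unlist(xpathApply(doc, paste("/row/", name, sep = ""), xmlValue))
    # Strip the surrounding blanks; keep the first match if there are several.
    if (length(val) == 0) NA_character_ else gsub("^\\s+|\\s+$", "", val[1])
}

params <- c("param1", "param2")
cols <- lapply(params, function(p)
    vapply(lines, getParam, character(1), name = p, USE.NAMES = FALSE))
names(cols) <- params
df <- data.frame(cols, stringsAsFactors = FALSE)
df
```

Parsing each row separately is slower than a single "//param1" query over one document, but it keeps values aligned with their rows when some entries are missing.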

The above is with R 2.7.0 and XML 1.95-2

Martin

> email , but this is an emergency with loads of files for me !!!
>
> Regards,
>
> Ajay
>
> www.decisionstats.com
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

Received on Thu 12 Jun 2008 - 16:31:34 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 12 Jun 2008 - 17:30:42 GMT.
