[Fwd: [R] Extract just some fields from XML]

From: Gregor GORJANC <gregor.gorjanc_at_gmail.com>
Date: Wed 11 May 2005 - 07:46:41 EST

Duncan, you are a king!

Thanks a lot for this cookie. It really helped me. Thanks for the code as well as the detailed explanation at the end.

>Hi Gregor.
>
>Here is a function that will collect all of the nodes in the
>XML document whose names are in the vector elementNames
>
>getElements =
>function(elementNames)
>{
>  els = list()
>
>  startElement = function(node, ...) {
>
>    if(xmlName(node) %in% elementNames)
>      els[[length(els) + 1]] <<- node
>
>    node
>  }
>
>  list(startElement = startElement, els = function() els)
>}
>
>So you can use it as
>
> myHandlers = getElements("PubDate")
> xmlTreeParse(URL, handlers = myHandlers)
>
>And then
> myHandlers$els()
>
>returns a list of the three PubDate elements in the document.
>
>If you wanted both PubDate and PubMedPubDate elements,
>you could use
>
> myHandlers = getElements(c("PubDate", "PubMedPubDate"))
>
>[Note that XML is case-sensitive and pubdate won't work.]
>
>The xmlEventParse is quite a bit more work as it is for
>very low-level parsing, working at the parser level
>of opening and closing XML elements.
>
>The xmlTreeParse is a hybrid parser. It works at the higher
>level of nodes, but provides an opportunity to process
>nodes when they are "created" and before their parent
>nodes have been processed. So it works bottom up
>(in one of its modes).
>
>You can also use xmlDOMApply() to iterate over all the
>nodes of a parsed XML tree. You give xmlDOMApply() a
>function and it can do whatever it wants, including
>checking the name of the node to see if you want it
>and then storing it somewhere. That's where you'll
>need closures (simply put, the "functions within functions" part) again,
>as in my example above.
>
>But here is a simple example
> doc = xmlRoot(xmlTreeParse(URL))
> xmlDOMApply(doc, function(node, ...)
>     if(xmlName(node) == "PubDate")
>         print(node)
> )
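
For anyone following along, here is a minimal sketch of the closure pattern that getElements relies on, written so it runs without the XML package. Everything in it is an assumption made for illustration: makeCollector, handler, and the mock "nodes" (plain lists with a name and a value) are hypothetical stand-ins, not part of the XML package and not Duncan's code.

```r
# Hypothetical stand-in for getElements(): collects "nodes" by name.
# Real XML nodes are replaced by plain lists so this runs in base R.
makeCollector <- function(elementNames) {
  els <- list()  # private state, visible only to the functions defined below

  handler <- function(node, ...) {
    if (node$name %in% elementNames)
      els[[length(els) + 1]] <<- node  # <<- modifies 'els' in the enclosing function
    node
  }

  # Return the handler plus an accessor for the collected nodes
  list(handler = handler, els = function() els)
}

# Mock nodes (assumed structure: a list with 'name' and 'value')
nodes <- list(
  list(name = "PubDate", value = "2002"),
  list(name = "Title",   value = "Some title"),
  list(name = "pubdate", value = "2001"),  # wrong case, so it is not collected
  list(name = "PubDate", value = "2001")
)

collector <- makeCollector("PubDate")
for (n in nodes) collector$handler(n)

collected <- collector$els()
length(collected)  # 2
```

The point of the design is that els lives in the environment created by the call to makeCollector, so the handler can keep appending to it across calls and the accessor can read it afterwards, without any global variable.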

Gorjanc Gregor wrote:
> Hello!
>
> I am trying to get specific fields from an XML document and I am totally
> puzzled. I hope someone can help me.
>
> # URL
> URL<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11877539,11822933,11871444&retmode=xml&rettype=citation"
> # download an XML file
> tmp <- xmlTreeParse(URL, isURL = TRUE)
> tmp <- xmlRoot(tmp)
>
> Now I want to extract only the node 'pubdate' and its children, but I don't
> know how to do that without digging into the structure of the XML
> file. The problem is that the structure can differ, and then a hardcoded set
> of list indices, i.e. tmp[[i]][[j]]..., doesn't help me.
>
> I've read the help for xmlEventParse, but I don't understand the handlers
> part well enough to get anything usable from it. Here is something not
> very usable ;)
>
> PubDate <- function(x, ...)
> {
>   print(x)
> }
> xmlEventParse(URL, isURL = TRUE,
>               handlers=list(PubDate=PubDate),
>               addContext = FALSE)
>
> Thanks in advance!
>
> Lep pozdrav / With regards,
> Gregor Gorjanc
>
> ----------------------------------------------------------------------
> University of Ljubljana
> Biotechnical Faculty URI: http://www.bfro.uni-lj.si/MR/ggorjan
> Zootechnical Department mail: gregor.gorjanc <at> bfro.uni-lj.si
> Groblje 3 tel: +386 (0)1 72 17 861
> SI-1230 Domzale fax: +386 (0)1 72 17 888
> Slovenia, Europe
> ----------------------------------------------------------------------
> "One must learn by doing the thing; for though you think you know it,
> you have no certainty until you try." Sophocles ~ 450 B.C.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Duncan Temple Lang                duncan@wald.ucdavis.edu
Department of Statistics          work:  (530) 752-4782
371 Kerr Hall                     fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA

--
Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
----------------------------------------------------------------------


______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed May 11 07:52:23 2005
