[R] Extract just some fields from XML

From: Gorjanc Gregor <Gregor.Gorjanc_at_bfro.uni-lj.si>
Date: Mon 09 May 2005 - 02:29:25 EST


Hello!

I am trying to get specific fields from an XML document and I am totally puzzled. I hope someone can help me.

library(XML)

# URL
URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11877539,11822933,11871444&retmode=xml&rettype=citation"

# Download and parse the XML file
tmp <- xmlTreeParse(URL, isURL = TRUE)
tmp <- xmlRoot(tmp)

Now I want to extract only the 'PubDate' node and its children, but I don't know how to do that without digging into the structure of the XML file. The problem is that the structure can differ, so a hardcoded set of list indices, i.e. tmp[[i]][[j]]..., doesn't help me.

I've read about xmlEventParse, but I don't understand the handlers part well enough to get anything usable from it. Here is something not very usable ;)

  # Handler called for each 'PubDate' element; for now it just prints its argument
  PubDate <- function(x, ...)
  {
    print(x)
  }
  xmlEventParse(URL, isURL = TRUE,
                handlers = list(PubDate = PubDate),
                addContext = FALSE)
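
A tree-based alternative that avoids hardcoded indices might be an XPath query. What follows is only a minimal sketch under stated assumptions, not a verified answer for the real efetch output: it assumes the XML package's internal-node parser (xmlParse) together with getNodeSet, and it uses a small inline document whose layout is invented for illustration in place of the URL above. The names xml_text, pub_dates and dates are hypothetical.

```r
library(XML)

# Invented stand-in for the efetch response; the real nesting may differ.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2002</Year><Month>Mar</Month></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2001</Year><Month>Dec</Month><Day>15</Day></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Parse into internal (C-level) nodes so that XPath queries can be used.
doc <- xmlParse(xml_text, asText = TRUE)

# "//PubDate" matches every PubDate element, however deeply it is nested.
pub_dates <- getNodeSet(doc, "//PubDate")

# For each PubDate, collect the text of its children (Year, Month, ...)
# as a named character vector.
dates <- lapply(pub_dates, function(node) sapply(xmlChildren(node), xmlValue))

print(dates)
```

With the real data, xmlParse(URL, isURL = TRUE) would presumably take the place of the inline text. Because the query matches on the element name rather than on its position, it should not matter if the surrounding structure differs between records.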

Thanks in advance!

Lep pozdrav / With regards,

    Gregor Gorjanc



University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe

"One must learn by doing the thing; for though you think you know it,  you have no certainty until you try." Sophocles ~ 450 B.C.

Received on Mon May 09 02:37:10 2005