Re: [R] Analyzing Publications from Pubmed via XML

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Tue, 18 Dec 2007 00:53:09 +0000 (UTC)

"Armin Goralczyk" <agoralczyk_at_gmail.com> wrote in news:a695fbee0712171238g4995040x579e58f52f83376e_at_mail.gmail.com:

> On Dec 15, 2007 6:31 PM, David Winsemius <dwinsemius@comcast.net>
> wrote:

>> > pm.srch<- function (){
>>    srch.stem <-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
>>    query <-as.character(scan(file="",what="character"))
>>    doc <-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE,
>>          useInternalNodes = TRUE)
>>    sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
>>      }
>> > pm.srch()
>> 1: "laryngeal neoplasms[mh]"
>> 2:
>> Read 1 item
>>       //Id
>>  [1,] "18042931"

snipped list of IDs
>>
>>

> I tried the above function with simple search terms and it worked fine
> for me (also more output thanks to Martin's post) but when I use
> search terms attributed to certain fields, i.e. with [au] or [ta], I
> get the following error message:
>> pm.srch()

> 1: "laryngeal neoplasms[mh]"
> 2:
> Read 1 item
> Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
> as.logical(ignoreBlanks), :
> error in creating parser for
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=laryngeal neoplasms[mh]
> I/O warning : failed to load external entity
> "http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&term=laryngeal%20neoplasms%5Bmh%5D"
>>
> What's wrong?

I'm not sure. You included my simple example rather than the search string of your own that provoked the error. Here is an example search from NCBI's how-to page for literature searches with esearch:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3
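If you would rather assemble such a URL in R than type it by hand, something along these lines might do. This is only a sketch: esearch.url is a name I made up for illustration, and the only parameters used are the ones visible in NCBI's example URL (db, term, retstart, retmax, tool). It does no escaping of the term, so the term has to be in URL-ready form already.

```r
# Hypothetical helper (not part of any package): assemble an esearch URL
# from a search term plus optional name=value parameters.
esearch.url <- function(term, ...,
                        base = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi") {
  # db and term come first, followed by whatever extra parameters were passed
  params <- c(list(db = "pubmed", term = term), list(...))
  # turn each parameter into "name=value"
  pairs <- paste(names(params), unlist(params), sep = "=")
  # join the pairs with "&" and attach them to the base URL with "?"
  paste(base, paste(pairs, collapse = "&"), sep = "?")
}

# Rebuilds the example URL shown above
esearch.url("PNAS[ta]+AND+97[vi]", retstart = 6, retmax = 6, tool = "biomed3")
```

The resulting string could then be handed to xmlTreeParse(..., isURL = TRUE) in the same way pm.srch does.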

I wonder whether you used spaces rather than "+" signs between the search terms. If so, you may want your function to do some gsub() processing of the input string before it is pasted onto the URL.
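A minimal sketch of that kind of gsub() processing, assuming the only problem is whitespace inside the query. The names pm.encode and pm.url are hypothetical, invented here for illustration, and I have not checked how esearch treats other special characters.

```r
# Hypothetical helper: make a free-text query safe to paste onto the
# esearch URL by replacing whitespace with "+".
pm.encode <- function(query) {
  query <- gsub("^[[:space:]]+|[[:space:]]+$", "", query)  # trim both ends
  gsub("[[:space:]]+", "+", query)                         # runs of whitespace -> "+"
}

# Hypothetical helper: paste the cleaned query onto the search stem
# used in pm.srch.
pm.url <- function(query,
                   stem = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=") {
  paste(stem, pm.encode(query), sep = "")
}

pm.url("laryngeal neoplasms[mh]")
# ends in "...&term=laryngeal+neoplasms[mh]"
```

Inside pm.srch the same idea would amount to running the scanned query through pm.encode() before the paste() call.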

When I use the search terms in NCBI's example I get:

> library(XML)
> pm.srch<- function (){
+    srch.stem<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
+    query<-as.character(scan(file="",what="character"))
+    doc<-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE, useInternalNodes = TRUE)
+    sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
+ }

> doc.xml<-pm.srch()

1: "PNAS[ta]+AND+97[vi]"
2:
Read 1 item
> doc.xml
      //Id
 [1,] "16578858"
 [2,] "11186225"
 [3,] "11121081"
 [4,] "11121080"
 [5,] "11121079"
 [6,] "11121078"
 [7,] "11121077"
 [8,] "11121076"
 [9,] "11121075"
[10,] "11121074"
[11,] "11121073"
[12,] "11121072"
[13,] "11121071"
[14,] "11121070"
[15,] "11121069"
[16,] "11121068"
[17,] "11121067"
[18,] "11121066"
[19,] "11121065"
[20,] "11121064"
-- 
David Winsemius, MD



> Thanks for any help
> --
> Armin Goralczyk, M.D.
______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 18 Dec 2007 - 00:59:35 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 18 Dec 2007 - 05:30:19 GMT.