Re: [R] Analyzing Publications from Pubmed via XML

From: Duncan Temple Lang <duncan_at_wald.ucdavis.edu>
Date: Sat, 15 Dec 2007 11:07:21 +1300

Farrel Buchinsky wrote:

>> The problem is that the RSS feed you linked to does not contain the
>> year of the article in an easily accessible XML element. Rather, you
>> have to process the HTML content of the description element, which
>> is something R could do, but you'd be using the wrong tool for the job.
>>

>
> Yes. I have noticed that there are two sorts of XML that PubMed will
> provide. The kind I had hooked into was an RSS feed, which provides a
> lot of the information simply as a formatted table for viewing in an
> RSS reader. There is another way to get the XML to come out with more
> tags. However, I found the best way to do this is probably through the
> Bioconductor annotate package:
>
> x <- pubmed("18046565", "17978930", "17975511")
> a <- xmlRoot(x)
> numAbst <- length(xmlChildren(a))
> absts <- list()
> for (i in 1:numAbst) {
> absts[[i]] <- buildPubMedAbst(a[[i]])
> }

You can simplify the final 5 lines to

   absts = xmlApply(a, buildPubMedAbst)

which is shorter, fractionally faster, and correctly handles the case
where there are no abstracts (the for loop over 1:numAbst does not).
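
To see why the empty case matters, here is a minimal base R sketch. It uses a plain empty list as a stand-in for the children of the XML root (an assumption for illustration, not real PubMed output) and lapply() in place of xmlApply(), on the understanding that xmlApply() behaves like an lapply() over the node's children.

```r
# Hypothetical stand-in for xmlChildren(a) when no articles come back.
kids <- list()
numAbst <- length(kids)

# 1:numAbst counts down when numAbst is 0, so a for loop over it
# would still run, once with i = 1 and once with i = 0.
loop_idx <- 1:numAbst          # c(1, 0)

# seq_len() gives an empty index vector, so a loop over it is skipped.
safe_idx <- seq_len(numAbst)   # integer(0)

# An apply-style call returns an empty list with no special casing.
absts <- lapply(kids, function(node) node)
length(absts)                  # 0
```

If you do want to keep an explicit loop, iterating over seq_len(numAbst) rather than 1:numAbst avoids the same problem.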

>
> I am now trying to work through that approach to see what I can come up with.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 14 Dec 2007 - 22:10:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 14 Dec 2007 - 23:30:19 GMT.
