Re: [R] Relational Databases or XML?

From: Doran, Harold <HDoran_at_air.org>
Date: Fri, 11 Apr 2008 08:23:37 -0400


This does seem excellent; I should check CRAN more often than I do. OTOH, I really like ElementTree in Python. One benefit there (for those who care) is that you can use py2exe to build an executable that others can run even if they don't have Python on their machine. This is like sprinkling some "fairy dust" on the Python program to make it seem like a "real" program. In my org, that means I can write the program, make sure it works, and then hand the workload to someone else, who can parse the XML files for me without any knowledge of Python.
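
For anyone who wants to see the shape of that workflow, here is a minimal sketch, not the actual program. The XML layout, the element and attribute names (results, subject, obs, id, block), and the column names are all made up for illustration; a real file from another statistical program will look different.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for the output of another program.
SAMPLE_XML = """\
<results>
  <subject id="1"><obs block="-1">0.25</obs><obs block="1">1.50</obs></subject>
  <subject id="2"><obs block="-1">-0.75</obs><obs block="1">0.10</obs></subject>
</results>
"""


def xml_to_rows(xml_text):
    """Flatten each <obs> element into a (subject id, block, value) row."""
    root = ET.fromstring(xml_text)
    rows = []
    for subject in root.iter("subject"):
        for obs in subject.iter("obs"):
            rows.append((subject.get("id"), obs.get("block"), obs.text))
    return rows


def rows_to_tsv(rows):
    """Render the rows as tab-delimited text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Subj", "block", "obs"])  # assumed column names
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Print the table; redirect to a file to hand it over to R.
    print(rows_to_tsv(xml_to_rows(SAMPLE_XML)), end="")
```

If the output is saved to a text file, something like read.delim() on the R side should pick it up as a data frame with the header row intact.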

> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan_at_fhcrc.org]
> Sent: Thursday, April 10, 2008 5:31 PM
> To: Doran, Harold
> Cc: Keith Alan Chamberlain; r-help_at_r-project.org
> Subject: Re: [R] Relational Databases or XML?
>
> Harold -- you'll really want to check out the XML package.
> xmlTreeParse + xpathApply provides a very flexible solution. As a
> recent example, parsing 189 XML files to extract 4 attributes from
> deeply nested elements into a data frame:
>
> library(XML)
>
> fls <- list.files('~/runBrowser', pattern=".*xml", full.names=TRUE)
> f <- function(fl) {
>     ## return the values matched by XPath query q as a vector
>     xq <- function(xml, q)
>         unlist(xpathApply(xml, q, xmlValue, namespaces="xsi"))
>     xml <- xmlTreeParse(fl, useInternalNodes=TRUE)
>     ## each=4 repeats every tile attribute four times to line up
>     ## with the image and sgnInt values
>     data.frame(idx=rep(as.numeric(xq(xml, "//xsi:tile/@idx")), each=4),
>                lane=rep(as.numeric(xq(xml, "//xsi:tile/@lane")), each=4),
>                base=xq(xml, '//xsi:image/@base'),
>                medSigInt=as.numeric(xq(xml, "//xsi:sgnInt/@median")))
> }
> res <- do.call('rbind', lapply(fls, f))
>
> 'res' has 54800 rows and 4 columns. The XML stays in C, so this is
> fast. The data can be effectively (your mileage may vary) visualized
> with lattice, e.g.,
>
> xyplot(log(medSigInt)~idx|lane*base, res, strip=FALSE, pch=".", cex=2)
>
> Martin
>
> Doran, Harold wrote:
> > I'm not sure it is possible to parse an XML file in R directly. Well,
> > I guess it's *possible*, but it may not be the best way to do it.
> > ElementTree in Python is an easy-to-use parser that you might use to
> > first parse your XML file (or other hierarchically structured data),
> > organize it any way you want, and then bring those data into R for
> > subsequent analysis.
> >
> > In fact, I have recently done just this. I have another statistical
> > program that outputs data as an XML file. So, I wrote a Python program
> > that parses that XML file, pulls out the data of interest into a text
> > file, and then I bring those data into R for analysis.
> >
> >> -----Original Message-----
> >> From: r-help-bounces_at_r-project.org
> >> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Keith Alan
> >> Chamberlain
> >> Sent: Thursday, April 10, 2008 4:14 PM
> >> To: r-help_at_r-project.org
> >> Subject: [R] Relational Databases or XML?
> >>
> >> Dear R-Help,
> >>
> >> I am working on a paper in an R course for large file support in R
> >> using scan(), relational databases, and XML. I have never used SQL or
> >> hierarchical document formats such as XML (except where it occurs
> >> without user interaction), and knowledge of RDBs and XML is lacking
> >> in my program. I have tried finding a working example for the
> >> novice's novice on the topic, and have read many postings, the R Data
> >> Import/Export manual several times, and descriptions of the packages
> >> RODBC, DBI, and XML, among others. I understand that RDBs are (assumed
> >> at least) used widely among the R community. I have not been able to
> >> put all of the pieces together, but assuming that RDB use is actually
> >> quite widespread, it should be quite easy to fill me in and/or correct
> >> my understanding where necessary.
> >>
> >> For a cross-platform solution (PC/OSX at least, or in part) my
> >> questions/problems are about what preliminary steps are needed to get
> >> an SQL or XML query "to work" in R to begin with, what the
> >> appropriate data-file formats are, and how to convert to them if
> >> starting out with data in, say, a delimited ASCII text file. Very
> >> basic examples should suffice, say, a table with 20 random
> >> observations, a grouping variable with 2 levels, and a factor with 2
> >> levels.
> >>
> >> ## untested code
> >> set.seed(1024)
> >> ## write.table() takes the data first, then the file name
> >> write.table(data.frame(Subj=c(rep(1,10),rep(2,10)),
> >>                        block=rep(c(rep(-1,5),rep(1,5)),2),
> >>                        obs=rnorm(20,0,1)),
> >>             file="junk.txt")
> >>
> >> Specifically,
> >>
> >> 1- What are the minimum required non-R components that are needed to
> >> support SQL or XML functionality, which may or may not need to be
> >> installed?
> >>
> >> 2- What R packages need to be installed, at a minimum (also as a
> >> cross-PC/Mac solution if possible, or at least as much as possible)?
> >>
> >> 3- I keep seeing reference to connections of a given name "if
> >> previously setup". What kind of setup is needed outside of R, if any?
> >>
> >> 4- What steps are needed in R to then connect to a file and import a
> >> subset based on a query?
> >>
> >> 5- Do I then use standard R routines (e.g. write()) to export as a
> >> DB, or an RDB/XML-specific function?
> >>
> >> Sincerely,
> >> KeithC. [U.S]
> >>
> >> 1/k^c
> >>
> >> ______________________________________________
> >> R-help_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>



Received on Fri 11 Apr 2008 - 12:27:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Apr 2008 - 12:30:29 GMT.
