Re: [R] Justifying R to anti open-source management

From: Michael Grant <mwgrant2001_at_yahoo.com>
Date: Thu 18 May 2006 - 03:52:50 EST


Hello Peter,

I am working on a related problem: getting R accepted within division and project QA. Unfortunately, it keeps getting put on the back burner as I address time-sensitive needs. I did some googling and made a few phone calls. I expect that there is much more to be found, but below is a US-agency-oriented compilation of what I got in my brief search. I also ran into a number of USDOE (National Labs HPC stuff) reports, but I seem to have lost track of that info.

QA in non-academic circles can be an anti-quality driver sometimes, can't it? Oh, let's give this thread some irrelevant legs...EXCEL!!!! You all know what I am talking about ;O)

Regards,
Michael Grant

My little but serious list (HTH):

1.) US Environmental Protection Agency -- Dr. R. Woodrow Setzer of the USEPA, a contributor to this list, pointed out this comment in an EPA FIFRA Scientific Advisory Review Panel report:

“The Panel also commends the EPA on the use of R (see the main EPA report for references), as it is the best way to ensure portable, open code that is freely available to all interested users, with state-of-the art algorithms for statistical calculation.” -- FIFRA Scientific Advisory Panel, http://www.epa.gov/scipoly/sap/2001/september/finalreport.htm

A Set of Scientific Issues Being Considered by the Environmental Protection Agency Regarding:

Preliminary Cumulative Hazard and Dose-Response Assessment for Organophosphorus Pesticides: Determination of Relative Potency and Points of Departure for Cholinesterase

R was also used in the N-methyl Carbamates cumulative risk assessment; link at http://www.epa.gov/oscpmont/sap/2005/index.htm#august

2.) US National Institute of Standards and Technology (NIST), Statistical Engineering Division
http://www.itl.nist.gov/div898/pubs/ar/SED2004.pdf
Collaborative research between members of the Statistical Engineering Division (SED) and members of the Process Measurements Division (Chemical Sciences and Technology Laboratory) has required that SED staff investigate various statistical tools for data mining. These tools include some very powerful statistical classification/prediction methods for high-dimensional data. This article briefly summarizes this ongoing effort with the goal of bringing attention to a wide array of methods in a statistical toolkit that is already easily available to NIST scientists who may need them. Most of these functions have a user-friendly interface in the open source environment R and widely available commercial product S-plus.

3.) US Department of Energy (USDOE), Oak Ridge National Laboratory, http://www.csm.ornl.gov/esh/aoed/ORNLTM2005ab52.htm
STATISTICAL METHODS AND SOFTWARE FOR THE ANALYSIS OF OCCUPATIONAL EXPOSURE DATA WITH NON-DETECTABLE VALUES
Edward L. Frome, Computer Science and Mathematics Division, Oak Ridge National Laboratory
Paul F. Wambach, U. S. Department of Energy
Date Published: September 2005
All of these methods are well known but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly enhanced the availability of high-quality nonproprietary (open source) software that serves as the basis for implementing the methods in this paper. Numerical examples are provided and R(2004) functions are available at the analysis of occupational exposure data web site http://www.csm.ornl.gov/esh/aoed/ (AOED).

4.) Historical Evaluation of the Film Badge Dosimetry Program at the Y-12 Facility in Oak Ridge, Tennessee, Part 1 – Gamma Radiation
J.P. Watkins (1), G.D. Kerr (2), E.L. Frome (3), W.G. Tankersley (1), and C.M. West (+)
ORAU Technical Report # 2004-0888
(1) Center for Epidemiologic Research, Oak Ridge Associated Universities
(2) Kerr Consulting Company
(3) Computer Science and Mathematics Division, Oak Ridge National Laboratory
(+) Deceased
This work was done under Contract No. 200-2002-00593 with the National Institute for Occupational Safety and Health.

5.) US FEMA http://www.fema.gov/txt/fhm/frm_cfd43.txt Flood 4.3 Flood frequency analysis methods

At the end of this section:

"Several open-source and commercial software packages provide tools to assist in the sorts of analyses discussed in this section. In particular, the S, S-PLUS, and R programming languages (commercial and open-source versions of a high-level statistical programming language) include comprehensive statistical tools. The R language package is available for free from the web site http://www.r-project.org/; several books discussing the use of R and S are available. Other well-known software packages include Mathematica, Matlab, SPSS, and SYSTAT."

6.) National Cancer Institute Advanced Biomedical Computing Center lists R as “available to staff” at
http://www1.ncifcrf.gov/app/htdocs/appdb/appinfo.php?appname=R-Project

7.) Weston, USACE and USEPA:
MODEL VALIDATION: MODELING STUDY OF PCB CONTAMINATION IN THE HOUSATONIC RIVER

> Hi
>
> I apologise for this question as it really must be a FAQ. Unfortunately,
> I can't find the answer and I'm tired of looking at endless google results
>
> A colleague of mine works for a state government department that has a
> policy against open source software or software tainted by open
> source. Other government departments in the same state use R but this
> particular department is driven by very non-numerate people and
> superficially at least it appears somewhat backward IT-wise. The
> department may purchase SPlus (which may be better for non programmer
> types anyway) or SPSS but it would nice to have the option to use R
>
> The Q:
>
> Are there any documents/reports/papers out there justifying R that
> comment on
> - quality of R
> - huge range of libraries available
> - support (via a huge and enthusiastic user base - any ideas on how
> many people use R)
>
> I suspect that providing existing documents would carry more weight
> rather than writing a case from scratch or providing people's email
> opinions
>
> Thanks in advance!
>
> Cheers
> Peter
>
> --
> Dr Peter Baker, Statistician (Bioinformatics/Genetics),
> CSIRO Mathematical & Information Sciences, Queensland Bioscience Precinct
> 306 Carmody Road, St Lucia Qld 4067. Australia.
> Email: <Peter.Baker_at_csiro.au> WWW: http://www.cmis.csiro.au/Peter.Baker/
> Phone: +61 7 3214 2210 Fax: +61 7 3214 2900
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



Received on Thu May 18 04:00:12 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 18 May 2006 - 06:10:14 EST.
