Re: [R] A comment about R:

From: Bob Green <bgreen_at_dyson.brisnet.org.au>
Date: Wed 04 Jan 2006 - 12:36:46 EST

>Hello,

>Unlike most posts on the R mailing list, I feel qualified to comment on
>this one. For about 3 months I have been trying to learn to use R, after
>having used various versions of SPSS for about 10 years.


I think it is far too simplistic to ascribe non-use of R to laziness. This may well be the case for some. However, I have read 5-6 books on R, waded through on-line resources, read the documentation and asked multiple questions via e-mail, and I still find even some of the basics very difficult.

There are several reasons for this:

1) For some tasks R is extremely user-unfriendly. Some comparative examples:

(a) In running a chi-square analysis in SPSS, the following syntax is included:

/STATISTIC=CHISQ
   /CELLS= COUNT EXPECTED ROW COLUMN TOTAL RESID .

This produces expected and observed counts, row & column percentages, residuals, chi-square & Fisher's exact test + other output.

In R, it is a herculean task to produce similar output. It certainly can't be produced in 2 lines, as far as I can tell.
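
For comparison, most of these pieces can be pulled out of base R in a handful of lines, although not from a single command. What follows is only a minimal sketch: the counts and the names tab and res are invented for illustration and are not from any real data set.

```r
# Hypothetical 2x2 table of counts, invented for illustration only
tab <- matrix(c(12, 5, 7, 15), nrow = 2,
              dimnames = list(group = c("a", "b"), outcome = c("no", "yes")))

res <- chisq.test(tab)        # Pearson chi-square (continuity-corrected for 2x2)

res$observed                  # observed counts
res$expected                  # expected counts under independence
res$observed - res$expected   # raw residuals
res$residuals                 # Pearson residuals
prop.table(tab, 1)            # row proportions
prop.table(tab, 2)            # column proportions
addmargins(tab)               # counts with row and column totals
res                           # chi-square statistic, df and p-value
fisher.test(tab)              # Fisher's exact test
```

Each line prints one block of output, so the result is spread over several tables rather than one combined crosstab.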

(b) In SPSS, if I want to compare multiple variables by a single dependent variable, this is readily performed:

CROSSTABS
   /TABLES=baserdis baserenh basersoc baseradd socbest disbest entbest addbest worsdis worsphy by group

I used the chi-square example again, but the same applies to a t-test. I started looking into how to do something similar in R with the t-test command, but gave up. R does force the user to take a more considered approach to analysis.
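
One way to approximate the "many variables by one grouping variable" request in R is to loop over the variable names. This is a sketch under stated assumptions: the data frame dat is simulated, it borrows only a few variable names from the SPSS syntax above, and the helper names cat.vars, num.vars, chi and tt are hypothetical.

```r
# Simulated data, for illustration only (not the poster's data)
set.seed(1)
n <- 200
dat <- data.frame(
  group    = factor(sample(c("ctl", "trt"), n, replace = TRUE)),
  baserdis = factor(sample(c("no", "yes"), n, replace = TRUE)),
  baserenh = factor(sample(c("no", "yes"), n, replace = TRUE)),
  avfreq   = rnorm(n)
)

# Chi-square test of each categorical variable against group
cat.vars <- c("baserdis", "baserenh")
chi <- lapply(cat.vars, function(v) chisq.test(table(dat[[v]], dat$group)))
names(chi) <- cat.vars
sapply(chi, function(h) h$p.value)    # one p-value per variable

# Same pattern for t-tests on numeric variables
num.vars <- c("avfreq")
tt <- lapply(num.vars, function(v) {
  t.test(as.formula(paste(v, "~ group")), data = dat)
})
names(tt) <- num.vars
sapply(tt, function(h) h$p.value)
```

Printing chi[["baserdis"]] or tt[["avfreq"]] shows the full output for a single variable.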

(c) Obtaining a correlation matrix in R with both the correlations and the p-values is no simple task.

In SPSS this is obtained via:

GET
   FILE='D:\a study\data\dat\key data\master data.sav'.
NONPAR CORR
   /VARIABLES= goodnum badnum good5 bad5 avfreq avdayamt
   /PRINT=KENDALL TWOTAIL
   /MISSING=PAIRWISE .

In R something like this is required -

> by(mydat, mydat$group, function(x) {
+   nm <- names(x)
+   # one row per pair among columns 2 to 5 (6 pairs)
+   rho <- matrix(, 6, 2)
+   rho.nm <- matrix(, 6, 2)
+   k <- 1
+   for(i in 2:4) {
+     for(j in (i + 1):5) {
+       x.i <- x[, i]
+       x.j <- x[, j]
+       ct <- cor.test(x.i, x.j, method = "kendall", alternative = "two.sided")
+       rho[k, 1] <- ct$estimate
+       # the component is p.value; ct$p-value would be parsed as ct$p minus value
+       rho[k, 2] <- round(ct$p.value, 3)
+       rho.nm[k, ] <- c(nm[i], nm[j])
+       k <- k + 1
+     }
+   }
+   rho <- cbind(as.data.frame(rho.nm), as.data.frame(rho))
+   names(rho) <- c("freq.i", "freq.j", "cor", "p-value")
+   rho
+ })
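
A somewhat shorter variant of the same idea uses combn to enumerate the variable pairs instead of nested loops. This is a sketch, not a replacement for the SPSS output: the data are simulated, only three of the variable names are borrowed from the syntax above, and the names pairs, res and out are hypothetical.

```r
# Simulated continuous data, for illustration only
set.seed(42)
dat <- data.frame(goodnum = rnorm(30), badnum = rnorm(30), good5 = rnorm(30))

# All unordered pairs of column names, one pair per column of the result
pairs <- combn(names(dat), 2)

# Kendall's tau and its two-sided p-value for each pair
res <- apply(pairs, 2, function(p) {
  ct <- cor.test(dat[[p[1]]], dat[[p[2]]], method = "kendall")
  c(tau = unname(ct$estimate), p.value = ct$p.value)
})

# One row per pair
out <- data.frame(var.i = pairs[1, ], var.j = pairs[2, ], t(res))
out
```

To get one table per group, as in the by() call above, the same steps could be wrapped in a function and passed to by().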

2) It is not always clear what the output produced by R is. The Mann-Whitney U-test is a good example. In R, it seems a standardised value is obtained. I was advised that it is easy enough to check this as R is open-source, but, at least for me, I don't believe I would understand the code anyway. It is confusing when comparable programs such as R and SPSS produce dissimilar results. For the user it is important to be able to reconcile such differences fairly easily, to engender confidence in the results.
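
One way to check what wilcox.test reports without reading the source is to recompute it by hand on a tiny example. As documented in help(wilcox.test), the statistic W is the sum of the ranks of the first sample minus n1(n1 + 1)/2, so it is a count rather than a standardised value. The sketch below uses made-up numbers; the names x, y, W.manual and U.other are hypothetical. Whether another package prints this value, the complementary one, or the smaller of the two is an assumption to verify against that package's documentation.

```r
# Made-up samples with no ties, for illustration only
x <- c(1.1, 2.3, 3.7, 4.2, 5.9)
y <- c(2.0, 6.1, 7.4, 8.8)

wt <- wilcox.test(x, y)
wt$statistic                       # W as reported by R

# Recompute W by hand from the ranks of the pooled data
r <- rank(c(x, y))
n1 <- length(x)
n2 <- length(y)
W.manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# The complementary U statistic for the second sample
U.other <- n1 * n2 - W.manual

c(W = W.manual, U.other = U.other, smaller = min(W.manual, U.other))
```

If two programs disagree only in which of these two values they print, the p-values should still match.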

3) I find the help files in R quite difficult to understand. For example, see help(t.test). The examples almost assume that you already know what to do. Personally, I would find some form of simple decision tree easier, e.g.: If you want to perform a t-test with the dependent variable in one column and the grouping variable in another, use t.test(AVFREQ~GROUP). If you want to perform a t-test with the dependent variable in separate columns (each column representing a different group), use t.test(AVFREQ1, AVFREQ2).
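
The two branches of that decision tree can be tried side by side. A minimal sketch with made-up numbers (the values, the data frame long and the names by.formula and by.columns are hypothetical), showing that both layouts lead to the same test:

```r
# Made-up values, for illustration only
AVFREQ1 <- c(4.1, 5.3, 3.8, 6.0, 5.5, 4.7)   # group 1, one column per group
AVFREQ2 <- c(6.2, 7.1, 5.9, 6.8, 7.4, 6.5)   # group 2

# Same data in "long" layout: one value column, one grouping column
long <- data.frame(AVFREQ = c(AVFREQ1, AVFREQ2),
                   GROUP  = factor(rep(c("g1", "g2"), each = 6)))

by.formula <- t.test(AVFREQ ~ GROUP, data = long)   # value ~ group
by.columns <- t.test(AVFREQ1, AVFREQ2)              # one vector per group

# Both calls run the same Welch two-sample t-test
all.equal(by.formula$p.value, by.columns$p.value)
```

Adding var.equal = TRUE to either call gives the classical pooled-variance t-test instead of the Welch default.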

4) My initial approach to using R was to run commands I had commonly used in SPSS and compare the results. I have only got as far as basic ANOVA. This has been time-consuming, and at times it has been difficult to obtain advice. Some people on the R list have been extremely generous with their time and knowledge, and I have much appreciated this assistance. At other times I see responses met with something like arrogance. With the sophistication of R, there is also an elitism. This is a barrier to R being more widely accepted and used.

5) Differences in terminology. This is just part of the learning process, but I still found it took quite some time to work out simple commands and what different analyses were called.

6) System administrators may be wary of freeware.

No doubt to the sophisticated user my comments may seem trite and easily resolved. However, I believe they have some relevance as to why R is not more readily used or accepted.

Bob Green



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jan 04 12:47:13 2006
