[R] A comment about R:

From: Stefan Eichenberger <Stefan.Eichenberger_at_se-kleve.com>
Date: Sat 07 Jan 2006 - 01:38:35 EST

I spent most of the Xmas holidays getting into R and was about to ask for some helpful pointers on how to get a grip on it when I came across this thread. I've read through most of it and would like to comment from a novice user's point of view. I have a strong programming background but limited statistical experience and no knowledge of competing packages. I work as a senior engineer in electronics.

Yes, the learning curve is steep. Most of the documentation is extremely terse. Learning is mostly from examples (a wiki was proposed in another mail...), and the documentation uses no graphical elements at all. So when it comes to things like xyplot in lattice, where would I learn the concepts behind panels, superpanels and the like?

OK, this is steep and terse, but after a while I'll get over it... That's life. The general concept is great: things can be expressed very densely, and the potential is here. I quickly had 200 lines of my own code together, doing what it should, or so I believed.

Next I did:

    matrix<-matrix(1:100, 10, 10)
    image(matrix)
    locator()
Great: I can interactively work with my graphs... But then:

Oops, wrong coordinates returned. Bug. Apparently, locator() doesn't realize that filled.contour() has a color bar on the right, and so it scales x wrongly...
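The filled.contour() help page explains that the function draws two plots (the contour panel and the colour key) with separate coordinate systems that are discarded once it returns, which is why a later locator() call reports coordinates in the wrong system; it suggests passing annotation code through the plot.axes argument instead. The sketch below assumes a current R version, reuses the same 10 x 10 test matrix, draws on an off-screen pdf(NULL) device so it runs non-interactively, and uses the illustrative name usr_main. It records the main panel's coordinates from inside plot.axes. A locator() call placed there on an interactive device should likewise work in main-panel coordinates, but that is not shown here.

```r
# Sketch (assumption: current R; usr_main is an illustrative name).
# filled.contour() forgets its main-panel coordinate system on return,
# so annotate from inside plot.axes, which runs while that panel is active.
z <- matrix(1:100, 10, 10)   # same test data as in the post

pdf(NULL)                    # off-screen device: nothing is written to disk
filled.contour(
  x = 1:10, y = 1:10, z = z,
  plot.axes = {
    axis(1); axis(2)             # redraw the default axes
    points(5, 5, pch = 19)       # plotted in the main panel's x/y scale
    usr_main <- par("usr")       # main panel's user coordinates
  }
)
dev.off()

usr_main   # x and y ranges of the main panel, not of the colour key
```

Because plot.axes is a lazily evaluated argument, the assignment to usr_main runs in the calling environment, so the value is still available after filled.contour() has returned.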

Here is what really shocked me:

    > str(bar)
    `data.frame': 206858 obs. of 12 variables:
     ...
    > str(mean(bar[,6:12]))
     Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
     ...
    > str(sd(bar[,6:12]))
     Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
     ...
    > prcomp(bar[,6:12])->foo
    > str(foo$x)
     num [1:206858, 1:7] -0.4187 -0.4015 0.0218 -0.4438 -0.3650 ...
     ...
    > str(mean(foo$x))
     num -1.07e-13
    > str(sd(foo$x))
     Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
     ...

So sd returns a vector regardless of whether the argument is a matrix or a data.frame, but mean behaves differently and returns a vector only for a data.frame?
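One way to sidestep the asymmetry is to use functions that are explicitly column-wise rather than relying on how mean() and sd() dispatch on their argument. Their behaviour has also changed in later R releases compared with the 2006 version quoted above, which is another reason to prefer the explicit forms. The small matrix below is hypothetical example data, not the post's data set; the sketch assumes a current R version.

```r
# Sketch: one summary value per column, for a numeric matrix or an
# all-numeric data frame (hypothetical 3 x 2 example data).
m  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
df <- as.data.frame(m)

mean(m)            # a single number: the mean of all six elements
colMeans(m)        # one mean per column
colMeans(df)       # also works on an all-numeric data frame
apply(m, 2, sd)    # one standard deviation per column (matrix)
sapply(df, sd)     # one standard deviation per column (data frame)
```

Here mean(m) is 3.5, the column means are 2 and 5, and both column standard deviations are 1, so a single grand mean cannot be confused with the per-column results.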

The problem here is not that this is difficult to learn; the problem is the complete absence of a concept. Is a data.frame an 'extended' matrix with columns of different types, or something different? Because the single number returned by mean (I expected a vector) is silently recycled when used in a vector context, debugging becomes close to impossible. And since sd does return a vector, expressions like mean + 4*sd vary enough across the data elements that I assume the code is working... I get no warning signal that something is wrong here.
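To the data.frame question: in R a data frame is a list of equal-length columns rather than a matrix, and as.matrix() converts it explicitly when matrix semantics are wanted. A length-1 value is silently recycled against a longer vector, so a scalar where a per-column vector was expected produces no error. The sketch below uses hypothetical data and an illustrative length check as a cheap guard; it is one defensive habit, not the only approach.

```r
# Sketch (hypothetical data): a data frame is a list of columns, not a matrix.
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
is.list(df)          # TRUE
is.matrix(df)        # FALSE
m <- as.matrix(df)   # explicit conversion to a numeric matrix

# A length-1 summary is recycled without any warning:
centered_wrong <- m - mean(m)               # one grand mean (11) from every cell
centered_right <- sweep(m, 2, colMeans(m))  # each column's own mean subtracted

# Cheap guard: a per-column summary must have one value per column.
col_means <- colMeans(m)
stopifnot(length(col_means) == ncol(m))
```

After sweep() every column of centered_right has mean zero, whereas centered_wrong merely shifts the whole matrix by the grand mean.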

A case in point is the behavior of locator() on a filled.contour() plot: things have apparently been programmed and debugged from examples rather than from a concept.

Now, in another posting I read that all this is a feature meant to discourage inexperienced users from doing statistics and to force you to think before you do things. Whilst I support this concept of thinking: did I miss something in statistics? I was under the impression that mean and sd were conceptually quite close to each other... (here, they are even in different packages...)

I will continue using R for the time being. But whether I can recommend it to my work colleagues remains to be seen: how could I ever trust the results it returns?

I'm still impressed by some of the efficiency, but my trust is deeply shaken...

Stefan Eichenberger mailto:Stefan.Eichenberger@se-kleve.com


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jan 07 02:02:12 2006
