Re: [R] any book and tutorial about how to manipulate data with R/S+

From: Michael Grant <mwgrant2001_at_yahoo.com>
Date: Sun 13 Mar 2005 - 13:53:02 EST


Wensui,

Here is an answer from a different perspective. Reading between the lines, you may be involved in 'remedial' data preparation at times. Depending on exactly what kind of tasks you are talking about, you MAY be well advised to work in a database--that is why they exist. It just depends on what you have to do.

I work with environmental data. And I work with some really junky data at times. I have to spend lots of effort grooming and combining data originally collected for reasons other than the one at hand, from disparate idiosyncratic sources, having information in both similar and very dissimilar formats, data of varying completeness, etc. I have to process data qualifiers, strip numbers out of strings, put them in--on and on. And of course it is different from record to record. This is just the nature of the beast.
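
As a rough illustration of the qualifier-and-number splitting described above, here is a minimal Python sketch. The field layout (an optional leading "<" or ">" sign, a number, an optional trailing letter flag such as "J") and the function name split_result are assumptions made for the example, not taken from any particular dataset.

```python
import re

# Assumed layout: optional "<" or ">" prefix, a number, optional trailing letter flag(s).
_RESULT = re.compile(r"^\s*([<>]?)\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def split_result(raw):
    """Split a reported result such as '<0.5' or '12.3 J' into (prefix, value, flag).

    Returns None for anything that does not match, so odd records can be
    reviewed by hand rather than silently coerced.
    """
    m = _RESULT.match(raw)
    if m is None:
        return None
    prefix, number, flag = m.groups()
    return prefix, float(number), flag.upper()


if __name__ == "__main__":
    for raw in ["<0.5", "12.3 J", " 7 ", "ND"]:
        print(repr(raw), "->", split_result(raw))
```

Returning None for unmatched strings (rather than guessing) keeps the record-to-record oddities visible, which is usually where the hand work ends up anyway.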

Another element is doing these same tasks over. One sometimes does not get the data in one shot. I remediate data, construct datasets, and process them. Then I get additional and/or corrected data and have to do it all again. This kind of thing is probably easier to track in DBs or spreadsheets.

I would never try doing these tasks in R (or SPSS/SAS, for that matter). EXCEL works up to a point, but I also go into MS Access, exploiting its visual query building and VB capabilities. As much as I dislike MS (I've been bitten too many times), I have to admit that the ability to easily construct (visual) queries, browse the results, etc., has been very useful. This kind of remedial preparation is sometimes easy and sometimes brutal.

A point here is that as the complexity of your data preparation increases, it may be more efficient to do it in applications more appropriate to the task. Where the breakpoint is, is a function of your own capabilities/inclinations in R (SPSS, SAS), EXCEL, Access or whatever. The one thing I know is that the problems of data prep., in my world at least, have always been there and will likely remain. I accept it and move on.

The approach(es) you develop should be influenced by the frequency of such efforts and the size of the datasets typically involved. BTW, one truism is that project managers do not seem capable of understanding that just because something is in a computer does not mean it is ready to give them what they want :O(. Gee, this stuff takes work...as you seem well aware.

I steel myself for the task by reminding myself that writing and running the R programming is an enjoyable reward for my toil. R is fun. SPSS never was. I have not worked much with SAS because--and this is a consideration--I can't afford a seat at home.

BTW, if some DB appears appropriate, then learn some SQL--even if you use Access. There is always RODBC out there, and it may be useful down the road.
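
A minimal sketch of the kind of SQL worth learning, using Python's built-in sqlite3 module purely as a stand-in for Access or an RODBC connection (that substitution is for illustration only). The table and column names (samples, results, sample_id, station, analyte, value) and the demo rows are hypothetical.

```python
import sqlite3


def mean_by_station(conn, analyte):
    """Return [(station, mean value)] for one analyte, joining two tables."""
    query = """
        SELECT s.station, AVG(r.value)
        FROM samples AS s
        JOIN results AS r ON r.sample_id = s.sample_id
        WHERE r.analyte = ?
        GROUP BY s.station
        ORDER BY s.station
    """
    return conn.execute(query, (analyte,)).fetchall()


def demo_connection():
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, station TEXT);
        CREATE TABLE results (sample_id INTEGER, analyte TEXT, value REAL);
        INSERT INTO samples VALUES (1, 'A'), (2, 'A'), (3, 'B');
        INSERT INTO results VALUES
            (1, 'Pb', 2.0), (2, 'Pb', 4.0), (3, 'Pb', 10.0), (3, 'Zn', 1.0);
    """)
    return conn


if __name__ == "__main__":
    print(mean_by_station(demo_connection(), "Pb"))
```

The same SELECT ... JOIN ... GROUP BY statement is what a visual query builder generates behind the scenes, so knowing how to read and write it carries over from one database to the next.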

If you don't want to do all this then get an intern, graduate student, postdoc, or new career ;O).

Best regards,
Michael Grant
Graduate School of Applied Brute Force in the Sciences



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Mar 14 09:57:31 2005