Re: [R] A really simple data manipulation example

From: Christophe Pallier <christophe_at_pallier.org>
Date: Wed, 27 Jun 2007 10:33:30 +0200

For merging, selecting and aggregating, R is not too bad: I believe the following code is more or less equivalent to your Vilno example, isn't it?

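# to input real data, replace the following two lines with calls to 'read.table'
# (visit.num cycles 1..5 within each patient, so every patient has one row per visit)
labresults <- data.frame(patient.id=gl(10,5), visit.num=gl(5,1,50), sodium=rnorm(50))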
demo <- data.frame(patient.id=gl(10,1), gender=gl(2,5,labels=c('F','M')))

data <- merge(labresults, demo)
# keep only visit 2 (the Vilno code deletes the rows where VISIT_NUM != 2)
data <- subset(data, visit.num==2)
with(data, aggregate(sodium, list(gender=gender), mean))
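
The code above skips the recoding of -9 as missing. A fuller sketch of the same steps could look like the following, assuming -9 marks a missing sodium value as in the Vilno example. The data are made up in place of 'read.table', and the names 'results' and 'average.sodium' are only illustrative:

```r
set.seed(1)  # reproducible made-up data
labresults <- data.frame(patient.id = gl(10, 5),
                         visit.num  = gl(5, 1, 50),   # visits 1..5 per patient
                         sodium     = rnorm(50, mean = 140, sd = 3))
labresults$sodium[c(2, 7)] <- -9   # pretend two visit-2 values are coded as missing
demo <- data.frame(patient.id = gl(10, 1),
                   gender     = gl(2, 5, labels = c('F', 'M')))

data <- merge(labresults, demo)           # mergeby PATIENT_ID
data$sodium[data$sodium == -9] <- NA      # if (SODIUM == -9) SODIUM = NULL
data <- subset(data, visit.num == 2)      # keep only the visit 2 rows

# the formula interface of aggregate() drops rows with NA by default (na.action = na.omit)
results <- aggregate(sodium ~ gender, data = data, FUN = mean)
names(results)[2] <- "average.sodium"
results
```

'results' then has one row per gender, which is what the sendoff statement would store.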


If the data sets are very large, then doing merges/selection/aggregation outside of R can be a good idea.

Christophe

On 6/27/07, Robert Wilkins <irishhacker_at_gmail.com> wrote:
>
> In response to those who asked for a better explanation of what the
> Vilno software does, here's a simple example that gives some idea of
> what it does.
>
> LABRESULTS is a dataset with multiple rows per patient, with lab
> sodium measurements. It has columns: PATIENT_ID, VISIT_NUM, and
> SODIUM.
>
> DEMO is a dataset with one row per patient, with demographic data.
> It has columns: PATIENT_ID, GENDER.
>
> Here's a simple example. The following paragraph of code is a
> data processing function (dpf):
>
>
> inlist LABRESULTS DEMO ;
> mergeby PATIENT_ID ;
> if (SODIUM == -9) SODIUM = NULL ;
> if (VISIT_NUM != 2) deleterow ;
> select AVERAGE_SODIUM = avg(SODIUM) by GENDER ;
> sendoff(RESULTS_DATASET) GENDER AVERAGE_SODIUM ;
> turnoff; // just means end-of-paragraph; version 1.0 won't need this.
>
> Can you guess what it does? The lab result rows are merged with the
> demographic rows, just to get the gender information merged in.
> Obviously, they are merged by patient. The code -9 is used to denote
> "missing", so convert that to NULL. I'm about to take a statistic for
> visit 2, so rows with visit 0 or 1 must be deleted. I'm assuming, for
> visit 2, each patient has at most one row. Now, for each sex group,
> take the average sodium level. After the select statement, I have just
> two rows, for male and female, with the average sodium level in the
> AVERAGE_SODIUM column. Now the sendoff statement just stores the
> current data table into a datafile, called RESULTS_DATASET.
>
> So you have a sequence of data tables, each calculation reading in the
> current table and leaving a new data table for the next calculation.
>
> So you have input datasets, a bunch of intermediate calculations, and
> one or more output datasets. Pretty simple idea.
>
> *****************************************
>
> Some caveats:
>
> LABRESULTS and DEMO are binary datasets. The asciitobinary and
> binarytoascii statements are used to convert between binary datasets
> and comma-separated ascii data files. (You can use any delimiter:
> comma, vertical bar, etc.) An asciitobinary statement is typically
> just two lines of code.
>
> The dpf begins with the inlist statement and, for the moment,
> needs "turnoff;" as the last line. Version 1.0 won't require the use
> of "turnoff;", but version 0.85 does. It only means this paragraph of
> code ends here (a program can, of course, contain many paragraphs:
> data processing functions, print statements, asciitobinary statements,
> etc.).
>
> If you've worked with lab data, you know lab data does not look so
> simplistic. I need a simple example.
>
> Vilno has a lot of functionality: many-to-many joins, adding columns,
> firstrow() and lastrow() flags, and so forth. A fair number of complex
> data manipulations have already been tested with test programs (in
> the tarball). Of course, a simple example cannot show you that; it's
> just a small taste.
>
> *********************************************
>
> If you've never used SPSS or SAS before, you won't care, but this
> programming language falls in the same family as the SPSS and SAS
> programming languages. All three programming languages have a fair
> amount in common, but are quite different from the S programming
> language. The Vilno data processing function can replace the SAS
> datastep. (It can also replace PROC TRANSPOSE and much of PROC MEANS,
> except standard deviation calculations still need to be included in
> the select statement).
>
> ********************************************
>
> I hope that helps.
>
> http://code.google.com/p/vilno
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Christophe Pallier (http://www.pallier.org)


Received on Wed 27 Jun 2007 - 08:40:22 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 27 Jun 2007 - 12:32:28 GMT.
