Re: [R] Data Manipulations - Group By equivalent

From: Wensui Liu <liuwensui_at_gmail.com>
Date: Sun 02 Jul 2006 - 12:39:08 EST

Zubin,

I bet you work for InterContinental Hotels, though I think you are probably not the real Zubin there, right? ^_^ If you get a chance, could you please say hi to him for me?

Here is a piece of R code copied from my blog, shown side by side with the equivalent SAS code. You might need to tweak it a little to get what you need.

 CALCULATE GROUP SUMMARY IN R

##################################################

# HOW TO CALCULATE GROUP SUMMARY IN R #
# DATE : DEC-13, 2005 #
##################################################

# EQUIVALENT SAS CODE: #
# #
# DATA DATA; #
# DO I = 1 TO 2; #
# DO J = 1 TO 4; #
# GROUP = 'TREATMENT_'||PUT(I, 1.); #
# X = RANNOR(1); #
# OUTPUT; #
# END; #
# END; #
# KEEP GROUP X; #
# RUN; #
# #
# PROC SQL; #
# CREATE TABLE COMBINE AS #
# SELECT *, MEAN(X) AS MEAN_X, SUM(X) AS SUM_X #
# FROM DATA #
# GROUP BY GROUP; #
# QUIT; #
##################################################


# GENERATE A TREATMENT GROUP #

group <- as.factor(paste("treatment", rep(1:2, 4), sep = "_"))

# CREATE A SERIES OF RANDOM VALUES #

x <- rnorm(length(group))

# CREATE A DATA FRAME TO COMBINE THE ABOVE TWO #

data <- data.frame(group, x)

# CALCULATE SUMMARY FOR X #

x.mean <- tapply(data$x, data$group, mean, na.rm = TRUE)
x.sum <- tapply(data$x, data$group, sum, na.rm = TRUE)

# CREATE A DATA FRAME TO COMBINE SUMMARIES #

summ <- data.frame(x.mean, x.sum, group = names(x.mean))

# COMBINE DATA AND SUMMARIES TOGETHER #

combine <- merge(data, summ, by = "group")
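
As an aside, the tapply() and merge() steps above can probably be collapsed with ave(), which returns the group statistic already aligned to the rows of the data. This is only a minimal sketch on the same kind of toy data; the name dat and the seed are my own assumptions for illustration.

```r
# Toy data in the same shape as above: two treatment groups, four rows each
set.seed(1)  # assumed seed, only to make the example reproducible
group <- as.factor(paste("treatment", rep(1:2, 4), sep = "_"))
dat <- data.frame(group, x = rnorm(length(group)))

# ave() returns one value per row: the group statistic repeated for
# every member of the group, so no separate merge() step is needed
dat$x.mean <- ave(dat$x, dat$group, FUN = mean)
dat$x.sum  <- ave(dat$x, dat$group, FUN = sum)

print(dat)
```

Note that FUN has to be passed by name in ave(), because every unnamed argument after the first is treated as a grouping variable.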

On 7/1/06, zubin <binabina@bellsouth.net> wrote:

>
> Hello, a beginner R user - boy i wish there was a book on just data
> manipulations for SAS users learning R (equivalent to the SAS DATA
> STEP)..  Okay, my question:
>
> I have a panel data set, hotel data occupancy by month for 12 months,
> 1000 hotels.  I have a field labeled 'year' and want to consolidate the
> monthly records using an average into 1000 occupancy numbers - just a
> simple average of the 12 months by hotel.  In SQL this operation is
> pretty easy, a group by query (group by hotel where year = 2005, avg
> occupancy) - how is this done in R? (in R language not SQL).  Thx!
>
> -zubin
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
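
For the hotel question quoted above, something along these lines might be a starting point. I do not know the actual layout of your data, so the data frame panel and the columns hotel, year, month and occupancy below are made up. The idea is to subset to 2005 first (the WHERE clause) and then average by hotel (the GROUP BY).

```r
# Hypothetical panel: 3 hotels x 12 months x 2 years
# (the real data would have 1000 hotels; all names here are assumptions)
set.seed(2005)
panel <- expand.grid(month = 1:12,
                     hotel = paste("hotel", 1:3, sep = "_"),
                     year = c(2004, 2005))
panel$occupancy <- runif(nrow(panel), min = 0.4, max = 0.95)

# WHERE year = 2005
p05 <- subset(panel, year == 2005)

# GROUP BY hotel, AVG(occupancy): one row per hotel
occ2005 <- aggregate(p05["occupancy"], by = list(hotel = p05$hotel), FUN = mean)

print(occ2005)
```

occ2005 should end up with one row per hotel and the 12-month average in the occupancy column; with the real data that would be 1000 rows.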



-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children's Hospital Medical Center


Received on Sun Jul 02 12:46:07 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 03 Jul 2006 - 16:14:35 EST.
