Re: [R] vectorization

From: Rau, Roland <Rau_at_demogr.mpg.de>
Date: Sat 18 Jun 2005 - 04:53:08 EST


Hi,

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Dimitri Joe
> Sent: Friday, June 17, 2005 7:01 PM
> To: R-Help
> Subject: [R] vectorization
>
> Hi there,
>
> I have a data frame (mydata) with 1 numeric variable (income)
> and 1 factor (education). I want a new column in this data
> with the median income for each education level. An obviously
> inefficient way to do this is
>
I guess the code below (which also simulates your data structure) is not the most efficient way to do this, but at least (I hope so!) it does what you wanted:

####################### Beginning of Example Code

# Simulate the data structure: one numeric variable, one factor
income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"), size=length(income), replace=TRUE))
mydata <- data.frame(inc=income, edu=education)

# Median income for each education level (a named vector)
mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)

# Fill the new column level by level; rows not matching the current
# level keep the value they already have
mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
                         mydata$medians)
mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
                         mydata$medians)

head(mydata)
mymedians

####################### End of Example Code
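
The three ifelse() calls above can probably be collapsed into a single step. The following is only a minimal sketch, assuming the same column names (inc, edu) as in the example; the column names medians_lookup and medians_ave and the fixed seed are made up for illustration:

```r
# Simulate the same data structure as in the example above
set.seed(1)  # assumption: fixed seed, only so the run is reproducible
income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"),
                              size = length(income), replace = TRUE))
mydata <- data.frame(inc = income, edu = education)

# Per-level medians, as before (a named vector, one entry per level)
mymedians <- tapply(X = mydata$inc, INDEX = mydata$edu, FUN = median)

# Variant 1: index the named medians by each row's education level;
# as.vector() drops the names and array attributes
mydata$medians_lookup <- as.vector(mymedians[as.character(mydata$edu)])

# Variant 2: ave() applies FUN within each level and returns a vector
# aligned with the original rows
mydata$medians_ave <- ave(mydata$inc, mydata$edu, FUN = median)

# Both variants should produce the same column
stopifnot(isTRUE(all.equal(mydata$medians_lookup, mydata$medians_ave)))
head(mydata)
```

Both variants avoid naming the levels explicitly, so they should keep working if the factor gains or loses levels.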

Maybe one can increase the speed, but I think it is fast enough for your 30,000 cases, as you can see from the timing on my desktop computer here (WinXP Pro SP2, P4, 3GHz, 512MB RAM):

> time.check <- function(){
+   income <- runif(30000)
+   education <- as.factor(sample(c("high", "middle", "low"), size=length(income), replace=TRUE))
+   mydata <- data.frame(inc=income, edu=education)
+
+   mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)
+
+   mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
+   mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"], mydata$medians)
+   mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"], mydata$medians)
+   return(NULL)
+ }
> system.time(time.check())
[1] 0.36 0.02 0.38 NA NA
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status   beta
major    2
minor    1.0
year     2005
month    04
day      04
language R              
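
If one wanted to check whether a shorter version is also fast enough, the same kind of timing could be run against an ave()-based variant. This is only a sketch: time.check.ave and its argument n are hypothetical names, and I quote no timings because they depend on the machine and the R version.

```r
# Hypothetical counterpart to time.check(), using ave() instead of the
# three ifelse() calls; n defaults to the 30,000 cases mentioned above
time.check.ave <- function(n = 30000) {
  income <- runif(n)
  education <- as.factor(sample(c("high", "middle", "low"),
                                size = n, replace = TRUE))
  mydata <- data.frame(inc = income, edu = education)

  # Median income within each education level, aligned with the rows
  mydata$medians <- ave(mydata$inc, mydata$edu, FUN = median)

  # Return the data invisibly so the result can be inspected afterwards
  invisible(mydata)
}

timing <- system.time(result <- time.check.ave())
print(timing)
```

Unlike time.check(), this variant returns the data frame instead of NULL, so the result can be inspected after timing.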


Best,
Roland

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jun 18 05:02:33 2005