Re: [R] Need a more efficient way to implement this type of logic in R

From: Alexander Engelhardt <alex_at_chaotic-neutral.de>
Date: Wed, 06 Apr 2011 23:04:12 +0200

On 06.04.2011 22:02, Walter Anderson wrote:
> I have cobbled together the following logic. It works but is very slow.
> I'm sure that there must be a better r-specific way to implement this
> kind of thing, but have been unable to find/understand one. Any help
> would be appreciated.
>
> hh.sub <- households[c("HOUSEID","HHFAMINC")]
> for (indx in 1:length(hh.sub$HOUSEID)) {
> if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02') |
> (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
> (hh.sub$HHFAMINC[indx] == '05'))
> hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000
> if ((hh.sub$HHFAMINC[indx] == '06') | (hh.sub$HHFAMINC[indx] == '07') |
> (hh.sub$HHFAMINC[indx] == '08') | (hh.sub$HHFAMINC[indx] == '09') |
> (hh.sub$HHFAMINC[indx] == '10'))
> hh.sub$CS_FAMINC[indx] <- 2 # $25,000 to $50,000
> if ((hh.sub$HHFAMINC[indx] == '11') | (hh.sub$HHFAMINC[indx] == '12') |
> (hh.sub$HHFAMINC[indx] == '13') | (hh.sub$HHFAMINC[indx] == '14') |
> (hh.sub$HHFAMINC[indx] == '15'))
> hh.sub$CS_FAMINC[indx] <- 3 # $50,000 to $75,000
> if ((hh.sub$HHFAMINC[indx] == '16') | (hh.sub$HHFAMINC[indx] == '17'))
> hh.sub$CS_FAMINC[indx] <- 4 # $75,000 to $100,000
> if ((hh.sub$HHFAMINC[indx] == '18'))
> hh.sub$CS_FAMINC[indx] <- 5 # More than $100,000
> if ((hh.sub$HHFAMINC[indx] == '-7') | (hh.sub$HHFAMINC[indx] == '-8') |
> (hh.sub$HHFAMINC[indx] == '-9'))
> hh.sub$CS_FAMINC[indx] = 0
> }

Hi,
the for-loop is entirely unnecessary. You can, as a first step, drop the index and let R work on the whole column at once. A plain if() will not do that, because it only looks at a single TRUE/FALSE value, so use a logical index instead:

     hh.sub$CS_FAMINC <- NA # create the column first
     lt25 <- hh.sub$HHFAMINC %in% c('01', '02', '03', '04', '05')
     hh.sub$CS_FAMINC[lt25] <- 1 # Less than $25,000

This very basic concept is called "vectorization" in R. You should read about it, it rocks.
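
As a rough sketch of what the fully vectorized version could look like: the small hh.sub data frame below is made up for illustration, and I am assuming HHFAMINC is stored as character strings, as the quoted comparisons suggest.

```r
# Made-up stand-in for the real households data
hh.sub <- data.frame(HOUSEID  = 1:6,
                     HHFAMINC = c("01", "06", "13", "17", "18", "-8"),
                     stringsAsFactors = FALSE)

# Create the column first, then fill it group by group.
# Each line handles all matching rows at once, no loop needed.
hh.sub$CS_FAMINC <- NA_real_
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("01", "02", "03", "04", "05")] <- 1  # Less than $25,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("06", "07", "08", "09", "10")] <- 2  # $25,000 to $50,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("11", "12", "13", "14", "15")] <- 3  # $50,000 to $75,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("16", "17")] <- 4                    # $75,000 to $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC == "18"] <- 5                               # More than $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("-7", "-8", "-9")] <- 0              # missing codes

print(hh.sub)
```

%in% replaces the long chains of == and |, and the logical vector it returns picks out the rows to overwrite.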

In this case, though, you don't even need to do that. If you convert the variable HHFAMINC to a number like this:

     hh.sub$HHFAMINC <- as.numeric(as.character(hh.sub$HHFAMINC))

(the as.character() step only matters if the column is a factor), then you can apply the cut() function to create a factor variable:

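     hh.sub$myawesomefactor <- cut(hh.sub$HHFAMINC, breaks=c(0, 5.5, 10.5, 15.5, 17.5, Inf))

or something like that should do the trick. The breaks need an outer boundary on each side (here 0 and Inf); without them, codes 01 to 05 and 18 fall outside every interval and come out as NA. The negative codes (-7, -8, -9) lie below 0 and become NA as well. You will then have to rename the factor levels. The function for that is levels() (not names()), or you can pass the new names directly through the labels argument of cut().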
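
A small sketch of the cut() approach, with labels set in the same call. The codes vector is made up, and the label texts are taken from the comments in the quoted code; treat the exact break points as an assumption to check against the real code book.

```r
# Made-up income codes in the same format as HHFAMINC
codes <- c("03", "07", "12", "17", "18", "-7")
num   <- as.numeric(codes)

# Intervals are (0, 5.5], (5.5, 10.5], ..., (17.5, Inf];
# anything at or below 0 (the -7/-8/-9 codes) falls outside and becomes NA.
income <- cut(num,
              breaks = c(0, 5.5, 10.5, 15.5, 17.5, Inf),
              labels = c("Less than $25,000",
                         "$25,000 to $50,000",
                         "$50,000 to $75,000",
                         "$75,000 to $100,000",
                         "More than $100,000"))

print(income)
print(table(income, useNA = "ifany"))
```

If the factor already exists, levels(income) <- c(...) renames the levels afterwards in the same way.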

Also, this might be my OCD speaking, but I would use NA instead of 0 for non-available values.
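
To illustrate why, here is a tiny made-up example: a 0 is treated as a real value by every summary function, while NA is tracked as missing and can be skipped explicitly.

```r
cs.zero <- c(1, 2, 3, 0, 0)    # 0 stands in for "not available"
cs.na   <- c(1, 2, 3, NA, NA)  # NA marks the same two entries as missing

mean(cs.zero)                  # the zeros are averaged in
mean(cs.na, na.rm = TRUE)      # the missing entries are skipped
sum(is.na(cs.na))              # and they can still be counted
```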

Have fun,
  Alex



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 06 Apr 2011 - 21:05:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 06 Apr 2011 - 21:10:27 GMT.
