Re: [R] Need a more efficient way to implement this type of logic in R

From: Joshua Wiley <jwiley.psych_at_gmail.com>
Date: Wed, 06 Apr 2011 13:49:40 -0700

Hi Walter,

Take a look at the function cut() (see ?cut). It is designed to take a continuous variable and categorize it, and it will be much simpler and faster than looping over every row. The only qualification is that your data need to be numeric, not character. However, if your only values are the ones you put in quotes in your code ('02', etc.), a simple call to as.numeric(variablename) ought to do the trick. If the column was read in as a factor, use as.numeric(as.character(variablename)) instead, because as.numeric() on a factor returns the internal level codes rather than the values. Beyond being faster, you can probably get down to one line of code, which should be much easier on the eyes. To see some examples with cut(), type (at the console):

example(cut)
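Here is a minimal sketch of what that one line might look like. It assumes HHFAMINC holds only the codes tested in the loop quoted below ('01' to '18', plus '-7', '-8' and '-9'). The small households data frame is made up for illustration. With labels = FALSE, cut() returns the interval number (1 to 6), so subtracting 1 gives the 0 to 5 categories used in the loop.

```r
# Made-up stand-in for the households data: HHFAMINC holds the income
# codes as character strings, as in the loop quoted below
households <- data.frame(
  HOUSEID  = 1:8,
  HHFAMINC = c("01", "05", "06", "12", "16", "18", "-8", "10"),
  stringsAsFactors = FALSE
)
hh.sub <- households[c("HOUSEID", "HHFAMINC")]

# as.character() first, so this also works if HHFAMINC is a factor
inc <- as.numeric(as.character(hh.sub$HHFAMINC))

# Right-closed intervals (-Inf,0], (0,5], (5,10], (10,15], (15,17], (17,Inf];
# labels = FALSE returns the interval number 1..6, so subtract 1 for 0..5
hh.sub$CS_FAMINC <- cut(inc, breaks = c(-Inf, 0, 5, 10, 15, 17, Inf),
                        labels = FALSE) - 1
```

The first interval is (-Inf, 0], so every non-positive code becomes 0. That matches the loop only if -7, -8 and -9 are the only negative codes. An unexpected code such as '19' would land in category 5 rather than being left unset.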

Hope this helps,

Josh

P.S. If you are planning on doing any modelling with this data, why not leave it continuous?

On Wed, Apr 6, 2011 at 1:02 PM, Walter Anderson <wandrson01_at_gmail.com> wrote:
> I have cobbled together the following logic. It works but is very slow.
> I'm sure that there must be a better R-specific way to implement this kind
> of thing, but have been unable to find/understand one. Any help would be
> appreciated.
>
> hh.sub <- households[c("HOUSEID","HHFAMINC")]
> for (indx in 1:length(hh.sub$HOUSEID)) {
>  if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02') |
> (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
> (hh.sub$HHFAMINC[indx] == '05'))
>    hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000
>  if ((hh.sub$HHFAMINC[indx] == '06') | (hh.sub$HHFAMINC[indx] == '07') |
> (hh.sub$HHFAMINC[indx] == '08') | (hh.sub$HHFAMINC[indx] == '09') |
> (hh.sub$HHFAMINC[indx] == '10'))
>    hh.sub$CS_FAMINC[indx] <- 2 # $25,000 to $50,000
>  if ((hh.sub$HHFAMINC[indx] == '11') | (hh.sub$HHFAMINC[indx] == '12') |
> (hh.sub$HHFAMINC[indx] == '13') | (hh.sub$HHFAMINC[indx] == '14') |
> (hh.sub$HHFAMINC[indx] == '15'))
>    hh.sub$CS_FAMINC[indx] <- 3 # $50,000 to $75,000
>  if ((hh.sub$HHFAMINC[indx] == '16') | (hh.sub$HHFAMINC[indx] == '17'))
>    hh.sub$CS_FAMINC[indx] <- 4 # $75,000 to $100,000
>  if ((hh.sub$HHFAMINC[indx] == '18'))
>    hh.sub$CS_FAMINC[indx] <- 5 # More than $100,000
>  if ((hh.sub$HHFAMINC[indx] == '-7') | (hh.sub$HHFAMINC[indx] == '-8') |
> (hh.sub$HHFAMINC[indx] == '-9'))
>    hh.sub$CS_FAMINC[indx] = 0
> }
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
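The loop above can also be vectorized without cut() by indexing a named vector with the code strings. This is a swapped-in technique, not the one suggested in the reply. The lookup vector faminc.map and the sample data are hypothetical. One difference from the cut() approach is that any code not listed maps to NA instead of being folded into a neighbouring category.

```r
# Hypothetical lookup table: names are the income code strings tested in
# the loop, values are the collapsed categories it assigns
faminc.map <- c(
  setNames(rep(1, 5), sprintf("%02d", 1:5)),    # '01'-'05': less than $25,000
  setNames(rep(2, 5), sprintf("%02d", 6:10)),   # '06'-'10': $25,000 to $50,000
  setNames(rep(3, 5), sprintf("%02d", 11:15)),  # '11'-'15': $50,000 to $75,000
  setNames(rep(4, 2), c("16", "17")),           # '16'-'17': $75,000 to $100,000
  c("18" = 5),                                  # '18': more than $100,000
  setNames(rep(0, 3), c("-7", "-8", "-9"))      # negative codes
)

# Made-up sample data for illustration
hh.sub <- data.frame(HOUSEID = 1:5,
                     HHFAMINC = c("03", "11", "18", "-9", "07"),
                     stringsAsFactors = FALSE)

# Index the named vector by the code strings; unmatched codes give NA
hh.sub$CS_FAMINC <- unname(faminc.map[as.character(hh.sub$HHFAMINC)])
```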

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Received on Wed 06 Apr 2011 - 20:57:31 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.