Re: [R] recoding data with loops

From: Donald Braman <donald.braman_at_gmail.com>
Date: Mon, 19 May 2008 20:34:16 -0400

Many, many thanks Erik! For anyone who is searching around looking for a way to recode in R, here's the full code Erik provided:

library(car) ## recode() comes from the car package
var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM", "HREVDIS2") ## my original list of variables
mdf <- data.frame(replicate(length(var_list), sample(7, 100, replace = TRUE))) ## generate 100 records of random numbers sampled from 1:7
names(mdf) ## unnecessary, but helpful to see what R supplies as default names
names(mdf) <- var_list ## substitutes my variable names
mdf ## lovely!

reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2") ## these are the variables I want to reverse code
reversed_varnames <- paste("R", reverse_me_varnames, sep = "") ## this generates the names of the reversed variables by tacking on an "R"

mdf[reversed_varnames] <-
    lapply(mdf[reverse_me_varnames],
        function(x) recode(x, recodes = "5:7=NA; 1=4; 2=3; 3=2; 4=1;",
            as.factor.result = FALSE))
## this applies the recode function to all the variables I want to recode
## and stores them in the new "R___" variables
mdf ## lovely!

I really like that R doesn't even need to use loops to do this -- seems very efficient to me!
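
For comparison, here is a rough sketch of how a for loop could do the same job, assuming the columns are indexed with [[ ]] instead of $. To keep it runnable without the car package, it uses a small base-R helper that I am calling reverse_code(); that helper is a made-up stand-in for recode(), not anything from car, and the seed is arbitrary.

```r
## reverse_code() is a hypothetical base-R stand-in for car's recode() call
reverse_code <- function(x) {
  lookup <- c(4, 3, 2, 1)              # 1->4, 2->3, 3->2, 4->1
  ifelse(x %in% 1:4, lookup[x], NA)    # 5:7 (and anything else) become NA
}

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM", "HREVDIS2")
mdf <- data.frame(replicate(length(var_list), sample(7, 100, replace = TRUE)))
names(mdf) <- var_list

reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")

## loop version: [[ ]] accepts a computed column name, $ does not
looped <- mdf
for (i in seq_along(reverse_me_varnames)) {
  looped[[reversed_varnames[i]]] <- reverse_code(looped[[reverse_me_varnames[i]]])
}

## lapply version, same shape as the recode() call in the thread
applied <- mdf
applied[reversed_varnames] <- lapply(applied[reverse_me_varnames], reverse_code)

all.equal(looped, applied)  # compare the two results
```

Both versions should fill in the same three new columns; the lapply form simply avoids the index bookkeeping.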

On Mon, May 19, 2008 at 6:49 PM, Erik Iverson <iverson_at_biostat.wisc.edu> wrote:

> Got it, I did not know of the 'recode' function in car.
>
> So you would like to recode those specific columns then? Once again, we
> can do it without a loop, this time with the help of a function called
> lapply, which applies a function to each item in a list in turn.
>
> Try:
>
> reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
> reversed_varnames <-paste("R", reverse_me_varnames, sep = "")
>
> ## See ?paste
>
> mdf[reversed_varnames] <-
> lapply(mdf[reverse_me_varnames],
> function(x) recode(x, recodes = "5:7=NA; 1=4; 2=3; 3=2; 4=1;",
> as.factor.result = FALSE))
>
> Now what does this actually mean? To the left of '<-' are simply the new
> columns of our data.frame. We then want to use lapply to apply some function
> to each object in a list. The first argument to lapply is that list. In this
> case, it is simply the columns of the data.frame you want reversed. A
> data.frame is a list in R. See ?list and ?data.frame. Then, the next
> argument to lapply is a function that we want to perform on each element in
> our list. So, we create a function that accepts as input a variable I
> simply call 'x'. This 'x' is going to be an item from the list we passed
> to lapply, that is, one of the columns of mdf named in 'reverse_me_varnames'.
>
> We then use the recode function in the car package to recode x, in a
> similar way to what you tried before. This function of x we define will get
> called three times in the above example, once for each of
> reverse_me_varnames. It will then assign those three new columns to the
> left-hand side of the <- operator, which are three newly-named columns.
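
A minimal sketch of the two points above: a data.frame is a list of columns, and lapply calls the function once per selected column. The data frame toy and the counter calls are made up purely for illustration.

```r
## toy and calls are hypothetical names, not part of the thread's data
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), d = c(7, 8, 9))
is.list(toy)     # a data.frame is a list whose elements are its columns

calls <- 0
doubled <- lapply(toy[c("a", "d")], function(x) {
  calls <<- calls + 1   # count how many times the function is invoked
  x * 2
})

calls            # 2: once for each selected column
names(doubled)   # "a" "d": lapply keeps the names of its input

## assigning the list to new column names adds two columns to toy
toy[c("a2", "d2")] <- doubled
```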
>
> To see why what you tried before did not work, with the for loop, try:
>
> mdf$HEQUAL
>
> contrasted with
>
> t1 <- c("HEQUAL")
> mdf$t1
>
> From the help for ?Extract, $ does not allow 'computed' indices.
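
A small sketch of that contrast, assuming a made-up three-row data frame called toy rather than the real mdf:

```r
## toy is a hypothetical data frame used only for this illustration
toy <- data.frame(HEQUAL = c(1, 4, 7), EWEALTH = c(2, 5, 3))
t1 <- "HEQUAL"

by_dollar  <- toy$t1      # looks for a column literally named "t1"
by_bracket <- toy[[t1]]   # evaluates t1 first, then looks up "HEQUAL"

is.null(by_dollar)   # TRUE: there is no column called "t1"
by_bracket           # the HEQUAL column
```

The same applies on the left-hand side of an assignment, so a name held in a variable needs [[ ]] (or [ ]) there as well.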
>
> I hope this helps!
>
> Erik
>
>
> Donald Braman wrote:
>
>> Erik,
>>
>> Your example was just what I needed to generate the data -- many, many
>> thanks! The names() function was something I had not grasped fully. I now
>> have this and it works very nicely:
>>
>> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM",
>> "HREVDIS2")
>> mdf <- data.frame(replicate(length(var_list), sample(7,100, replace =
>> TRUE))) ## generate random data
>> names(mdf) ## default names
>> names(mdf) <- var_list ## use our names
>> mdf
>>
>> I'm still trying to figure out how to recode (using the car package) data
>> into new variables using a similar loop. Basically, I'm not sure how to call
>> the variable name and append it to the dataframe name in a loop. In Stata
>> I'd do this using single quotes, but clearly that's not how R works. I
>> tried several variations on this:
>>
>> reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
>> reversed_varnames <- c("RHEQUAL", "RHREVDIS1", "RHREVDIS2")
>> for(i in 1:length(reverse_me_varnames))
>> {mdf$reversed_varnames[i] <- recode(mdf$reverse_me_varnames[i], '5:7=NA;
>> 1=4; 2=3; 3=2; 4=1;', as.factor.result=FALSE)
>>
>> While I don't get an error message, the data don't change. Any advice on
>> reverse coding non-contiguous variables?
>>
>>
>>
>> On Mon, May 19, 2008 at 4:12 PM, Donald Braman <donald.braman_at_gmail.com> wrote:
>>
>> Many thanks --
>>
>> You are right; I had rnorm() and sample() mixed up in my code. I'll
>> work on generating a normal ordinal sample next.
>>
>> Cheers, Don
>>
>>
>> On Mon, May 19, 2008 at 4:07 PM, Erik Iverson <iverson_at_biostat.wisc.edu> wrote:
>>
>> Hello -
>>
>>
>> Donald Braman wrote:
>>
>> # I'm new to R and am trying to get the hang of how it handles
>> # dataframes & loops. If anyone can help me with some simple
>> tasks,
>> # I'd be much obliged.
>>
>> # First, i'd like to generate some random data in a dataframe
>> # to efficiently illustrate what I'm up to.
>> # let's say I have six variables as listed below (I really
>> # have hundreds, but a few will illustrate the point).
>> # I want to generate my dataframe (mdf)
>> # with the 6 variables X 100 values with rnorm(7).
>> # How do I do this? I tried many variations on the following:
>>
>> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1",
>> "EDISCRIM",
>> "HREVDIS2")
>> for(i in 1:length(var_list)) {var_list[1] <- rnorm(100)}
>> mdf <- data.frame(cbind(varlist[1:length(var_list)])
>> mdf
>>
>> There are many ways to do this. Do you mean that you want 6
>> columns, 100 observations in each column, each a sample from a
>> normal distribution with mean = 7 and sd = 1? You can do this
>> without looping in one of several ways. If you are coming from
>> a SAS environment (my guess since you talk of looping over
>> data.frames), you may be used to looping through a data object.
>> In R, you can usually avoid this since many functions are
>> vectorized, or take a 'whole object' approach.
>>
>>
>> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1",
>> "EDISCRIM", "HREVDIS2")
>>
>> mdf <- data.frame(replicate(6, rnorm(100, 7))) ## generate
>> random data
>> names(mdf) ## default names
>> names(mdf) <- var_list ## use our names
>>
>>
>>
>> # Then, I'd like to recode the variables that begin with the
>> letter "H".
>> # I've tried many variations of the following, but to no avail:
>>
>> reverse_list <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
>> reversed_list <- c("RHEQUAL", "RHREVDIS1", "RHREVDIS2")
>> for(i in 1:length(reverse_list))
>> {mdf[ ,e_reversed_list][[i]] <- recode(mdf[
>> ,e_reverse_list][[i]],
>> '5:99=NA; 1=4; 2=3; 3=2; 4=1; ', as.factor.result=FALSE)
>>
>>
>> I'm not quite sure what you are after here. What do you mean by
>> recode? What package is your 'recode' function located in?
>>
>> It appears that you may be under the impression that the
>> data.frame contains integers, but it certainly will not, since it
>> was generated with rnorm. sample() can generate samples of the
>> type you may be after, for example,
>>
>> > sample(7, 100, replace = TRUE)
>>
>> Best,
>> Erik Iverson
>>
>>
>>
>>
>> -- Donald Braman
>> http://www.law.gwu.edu/Faculty/profile.aspx?id=10123
>> http://research.yale.edu/culturalcognition
>> http://ssrn.com/author=286206
>>
>>
>

-- 
Donald Braman
http://www.law.gwu.edu/Faculty/profile.aspx?id=10123
http://research.yale.edu/culturalcognition
http://ssrn.com/author=286206


______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 20 May 2008 - 00:38:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 20 May 2008 - 06:30:39 GMT.
