Re: [R] creating a derived variable in a data frame

From: Greg Snow <greg.snow_at_ihc.com>
Date: Fri 21 Oct 2005 - 01:37:12 EST


>>> "Martin Henry H. Stevens" <HStevens@MUOhio.edu> 10/20/05 08:47AM >>>
>Hi Avram-
>How many countries do you have?
>I would do it the following way because it is simple and I don't know
>any better, even if it is absurdly painstaking.
>
>#Step 1
>mydata$continent <- factor(NA, levels=c("NoAm","Euro"))
>
>#Steps 2 a-z
>mydata$continent[mydata$country=="US" |
> mydata$country=="CA" |
> mydata$country=="MX" ] <- "NoAm"

A shorter alternative to the above is to use %in%, like this:

mydata$continent[ mydata$country %in% c("US","CA","MX") ] <- "NoAm"
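
For a self-contained illustration, here is a minimal sketch on a made-up
data frame. The rows, the x values, and the "Euro" country codes are
assumptions invented for the example, not taken from the original data:

```r
# Toy data; these values are invented for illustration only
mydata <- data.frame(country = c("US", "DE", "CA", "FR", "MX"),
                     x = c(10, 20, 30, 40, 50),
                     stringsAsFactors = FALSE)

# Step 1 from above: start with an all-NA factor that has the desired levels
mydata$continent <- factor(NA, levels = c("NoAm", "Euro"))

# One assignment per continent using %in%
mydata$continent[mydata$country %in% c("US", "CA", "MX")] <- "NoAm"
mydata$continent[mydata$country %in% c("DE", "FR")] <- "Euro"

# Count rows per continent as a quick check
table(mydata$continent)
```

Any country not listed in one of the assignments keeps NA as its continent,
which makes omissions easy to spot.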

You could also create a new data frame with 2 columns for the country and
corresponding continent, then merge this with your data (see ?merge).
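
A rough sketch of the merge approach, again on made-up data (the name
"lookup" and all of the values are hypothetical, chosen only for the
example):

```r
# Toy data; these values are invented for illustration only
mydata <- data.frame(country = c("US", "DE", "CA", "FR", "MX"),
                     x = c(10, 20, 30, 40, 50),
                     stringsAsFactors = FALSE)

# Lookup table: one row per country, with its continent
lookup <- data.frame(country = c("US", "CA", "MX", "DE", "FR"),
                     continent = c("NoAm", "NoAm", "NoAm", "Euro", "Euro"),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps rows whose country is missing from the lookup table
# (their continent becomes NA); by default merge() sorts the result by country
merged <- merge(mydata, lookup, by = "country", all.x = TRUE)
merged
```

With hundreds of countries, the lookup table could be kept in a separate
file and read in, so the mapping lives in one place instead of being
spread across many assignment statements.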

>
>#Repeat for all countries and continents.
>
>Hank
>
>
>On Oct 19, 2005, at 8:09 PM, Avram Aelony wrote:
>
>> Hello,
>>
>> I have read through the manuals and can't seem to find an answer.
>>
>> I have a categorical, character variable that has hundreds of
>> values. I want to group the existing values of this variable into
>> a new, derived (categorical) variable by applying conditions to the
>> values in the data.
>>
>> For example, suppose I have a data frame with variables: date,
>> country, x, y, and z.
>>
>> x,y,z are numeric and country is a 2-digit character string. I
>> want to create a new derived variable named "continent" that would
>> also exist in the data frame. The Continent variable would have
>> values of "Asia", "Europe", "North America", etc...
>>
>> How would this best be done for a large dataset (>10MB) ?
>> I have tried many variations on the following without success (note in
>> a real example I would have a longer list of countries and
>> continent values):
>>
>>
>>> mydata$continent <- mydata[ mydata$country==list
>>> ('US','CA','MX'), ] -> "North America"
>>>
>>
>> I have read about factors, but I am not sure how they apply here.
>>
>> Can anyone help me with the syntax? I am sure it is trivial and a
>> common thing to do.
>> The ultimate goal is to compute percentages of x by continent.
>>
>> Thanks for any help in advance.
>>
>> -Avram
>
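
Once the continent variable exists, the percentages could be computed along
these lines. This is only a sketch: it assumes "percentages of x by
continent" means each continent's share of the total of x, and the data
values are invented for the example:

```r
# Toy data with the continent column already filled in (invented values)
mydata <- data.frame(continent = c("NoAm", "Euro", "NoAm", "Euro", "NoAm"),
                     x = c(10, 20, 30, 40, 50),
                     stringsAsFactors = FALSE)

# Sum x within each continent
totals <- tapply(mydata$x, mydata$continent, sum)

# Express each continent's total as a percentage of the overall total
pct <- 100 * totals / sum(totals)
pct
```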

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow@ihc.com
(801) 408-8111



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Oct 21 03:13:25 2005
