Re: [R] creating a derived variable in a data frame

From: Greg Snow <>
Date: Fri 21 Oct 2005 - 01:37:12 EST

>>>> "Martin Henry H. Stevens" <> 10/20/05 08:47AM
>Hi Avram-
>How many countries do you have?
>I would do it the following way because it is simple and I don't know
>any better, even if it is absurdly painstaking.
>#Step 1
>mydata$continent <- factor(NA, levels=c("NoAm","Euro"))
>#Steps 2 a-z
>mydata$continent[mydata$country=="US" |
> mydata$country=="CA" |
> mydata$country=="MX" ] <- "NoAm"

A shorter alternative to the above is to use %in% like:

mydata$continent[ mydata$country %in% c("US","CA","MX") ] <- "NoAm"
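
To avoid typing one such line per continent, the same %in% assignment can be
driven from a named list. This is only a sketch: the data, the list name
"groups" and the country codes in it are made up for illustration.

```r
# Hypothetical data and groupings, for illustration only.
mydata <- data.frame(country = c("US", "FR", "MX", "DE", "JP"),
                     stringsAsFactors = FALSE)
groups <- list(NoAm = c("US", "CA", "MX"),
               Euro = c("FR", "DE"))

# Start with all NA, taking the factor levels from the list names.
mydata$continent <- factor(NA, levels = names(groups))

# One %in% assignment per continent.
for (g in names(groups)) {
  mydata$continent[mydata$country %in% groups[[g]]] <- g
}
```

Countries that appear in no group (here "JP") stay NA, which makes gaps in
the list easy to spot.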

You could also create a new data frame with 2 columns for the country and
corresponding continent, then merge this with your data (see ?merge).
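
A rough sketch of the merge approach, which also gets to the stated goal of
percentages of x by continent. The data values and the name "lookup" are
assumptions for illustration; only the column names come from the question.

```r
# Hypothetical example data; column names follow the original question.
mydata <- data.frame(
  country = c("US", "CA", "MX", "FR", "DE", "US"),
  x = c(10, 20, 30, 25, 15, 0),
  stringsAsFactors = FALSE
)

# Lookup table: one row per country code (illustrative subset only).
lookup <- data.frame(
  country   = c("US", "CA", "MX", "FR", "DE"),
  continent = c("NoAm", "NoAm", "NoAm", "Euro", "Euro"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps rows whose country is missing from the lookup
# (continent is NA for those). Note that merge() may reorder the rows.
merged <- merge(mydata, lookup, by = "country", all.x = TRUE)

# Share of total x falling in each continent, in percent.
pct <- 100 * tapply(merged$x, merged$continent, sum) / sum(merged$x)
```

If the original row order matters, indexing the lookup with match() avoids
the reordering:
mydata$continent <- lookup$continent[match(mydata$country, lookup$country)]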

>#Repeat for all countries and continents.
>On Oct 19, 2005, at 8:09 PM, Avram Aelony wrote:
>> Hello,
>> I have read through the manuals and can't seem to find an answer.
>> I have a categorical, character variable that has hundreds of
>> values. I want to group the existing values of this variable into
>> a new, derived (categorical) variable by applying conditions to the
>> values in the data.
>> For example, suppose I have a data frame with variables: date,
>> country, x, y, and z.
>> x,y,z are numeric and country is a 2-digit character string. I
>> want to create a new derived variable named "continent" that would
>> also exist in the data frame. The Continent variable would have
>> values of "Asia", "Europe", "North America", etc...
>> How would this best be done for a large dataset (>10MB) ?
>> I have tried many variations on following without success (note in
>> a real example I would have a longer list of countries and
>> continent values):
>>> mydata$continent <- mydata[ mydata$country==list
>>> ('US','CA','MX'), ] -> "North America"
>> I have read about factors, but I am not sure how they apply here.
>> Can anyone help me with the syntax? I am sure it is trivial and a
>> common thing to do.
>> The ultimate goal is to compute percentages of x by continent.
>> Thanks for any help in advance.
>> -Avram

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
(801) 408-8111

Received on Fri Oct 21 03:13:25 2005
