Re: [R] stuk at another point: simple question

From: Dennis Murphy <djmuser_at_gmail.com>
Date: Tue, 01 Mar 2011 01:04:32 -0800

Hi:

This is *really* ugly, but given the number of variables you have in mind, it seems rather necessary, at least to me.

# Given an original data frame dataf, find the max ID number:
Mvars <- names(dataf)[grep('^M', names(dataf))]
# Pick out the number of variables that start with M:
n <- max(as.numeric(sapply(strsplit(Mvars, ''), function(x) unlist(x)[2])))

# Generate all the new variables, similar to those in the transform()
statement to get df2
# In your example, you showed the new variables for one pair - this extends
to all pairs,
# repeated up to the largest parent number
newvars <- c(paste('m', rep(1:max(c(Parent1, Parent2), na.rm = TRUE), each = n),

                        rep(c('a', 'b')), rep(c('p1', 'p2'), each = 2), sep
= ''))

# Generate a two column matrix of the M*a and M*b names
v <- matrix(names(dataf)[-(1:4)], ncol = 2, byrow = TRUE)

# cbind v with itself, transpose the result and collapse it to a vector
vnames <- as.vector(t(cbind(v, v)))

# vnames and newvars should have the same length

# Generate a character string of the generate the set of calls,
# extending the call you used to get the new variables in df2
nvarx <- paste(newvars, '=', vnames, '[', rep(c('Parent1', 'Parent2'), each = 2), ']', sep = '')
nvarx.str <- paste(nvarx, collapse = ', ')
# Generate the text string that comprises the call to transform()
txtexp <- paste('transform(dataf, ', nvarx.str, ')', sep = '')

# Parse the text string in txtemp and evaluate it to get the extended data
frame df3
df3 <- eval(parse(text = txtexp))

# Next, we work on producing the averages. To do this, we first generate a
matrix of names,
# then write a function for each row of the matrix that
# * produces text strings that select the proper names for each of the
calls in hP1, hP2, t1 and t2
# * evaluates the parsed strings
# * produces the average

# Matrix of names for each parent

namesmat <- matrix(c(paste('M', 1:n, 'a', sep = ''), paste('M', 1:n, 'b', sep = ''),

                     paste('m', 1:n, 'ap1', sep = ''), paste('m', 1:n,
'bp1', sep = ''),
                     paste('m', 1:n, 'ap2', sep = ''), paste('m', 1:n,
'bp2', sep = '')),
                     nrow = n)

# Function to apply to each row of the above matrix
findavg <- function(x) {

   # x is an input string of variable names
   # we construct the calls as strings and then evaluate them
   # final output is the average

   x <- as.vector(x)
   hP1 <- eval(parse(text = 'as.numeric(df3[x[3]] != df3[x[4]])'))
   hP2 <- eval(parse(text ='as.numeric(df3[x[5]] != df3[x[6]])'))
    t1 <- eval(parse(text ='as.numeric(df3[x[1]] != df3[x[3]])'))
    t2 <- eval(parse(text ='as.numeric(df3[x[2]] != df3[x[5]])'))
     C <- (hP1*(t1-0.25)+ hP2 *(t2-0.25))
    yv <- df3$y

    mean(C * yv, na.rm = TRUE)
  }

# Apply it to the matrix of names:

apply(namesmat, 1, findavg)

# For the example data given,

[1] -1.166667 6.500000 -1.166667 2.916667

Please double check this on your example below to make sure it's doing the right thing - I didn't check whether or not the averages were right.

After all that eval(parse(text = *))ing, I need a shower...I feel dirty :) If there's a better way, I'd love to see it.

HTH,
Dennis

On Mon, Feb 28, 2011 at 5:00 AM, Umesh Rosyara <rosyaraur_at_gmail.com> wrote:

>  Dear R-community members.
>
> I am really appreciate R-help group. Dennis has been extrremely helpful to
> solve some of my questions. I am following Dennis recommendation in the
> following email, yet I am stuck at another point (hope this will took me to
> end of this project.
>
> Ind <- c(1:5)
> Parent1 <- c(NA,NA,1,1,3)
> Parent2 <- c(NA,NA,2,2,4)
> y <- c(6,5,8,10,7)
> M1a <- c(1,2,1,1,1)
> M1b <- c(1,2,2,2,1)
> M2a <- c(3,3,1,1,3)
> M2b <- c(1,1,3,3,3)
> M3a <- c(4,4,4,4,4)
> M3b <- c(4,4,1,1,4)
> M4a <- c(1,4,4,1,4)
> M4b <- c(4,4,4,4,4)
>
> dataf <- data.frame (Ind, Parent1, Parent2, y, M1a, M1b,M2a,M2b,
> M3a,M3b,M4a, M4b) # I have more than >1000 variables pair
>
> # pair1 (M1a,M1b) pair2 (M2a, M2b), pair3 (M3a, M3b)...
>
> df2 <- transform(dataf,m1ap1 = dataf$M1a[dataf$Parent1],
>                        m1bp1 = dataf$M1b[dataf$Parent1],
>                        m1ap2 = dataf$M1a[dataf$Parent2],
>                        m1bp2 = dataf$M1b[dataf$Parent2])
> # downstream calculations
>  hP1 <- ifelse(df2$m1ap1==df2$m1bp1,0,1)
>  hP2 <- ifelse(df2$m1bp2==df2$m1bp2,0,1)
>  t1 <- ifelse(df2$M1a==df2$m1ap1,1,0)
>  t2 <- ifelse(df2$M1b==df2$m1ap2,1,0)
>  C <- (hP1*(t1-0.25)+ hP2 *(t2-0.25))
>  yv <- df2$y
>  Cy <- C*yv
>  avgCy <- mean(Cy, na.rm=T)
>  avgCy # I want to store this value to new dataframe with first model i.e.
>
>
> How can I loop the process to output the second pair( here M2a, M2b), third
> pair (here M3a, M3b) to all pairs (I have more than 1000)
>
> Mode1  avgCy
> 1       1.75  # from pair M1a and M1b
> 2             # from pair M2a and M2b
> 3             # from pair M3a and M3b
> 4             # from pair M4a and M4b
>
> to the end of the file
>
> Thank you in advance
>
> Umesh R
>
>  ------------------------------
> *From:* Dennis Murphy [mailto:djmuser_at_gmail.com]
> *Sent:* Friday, February 18, 2011 12:28 AM
> *To:* Umesh Rosyara
> *Cc:* r-help_at_r-project.org
> *Subject:* Re: [R] recoding a data in different way: please help
>
> Hi:
>
> This is as far as I could get:
>
> df <- read.table(textConnection("
>  Individual      Parent1  Parent2         mark1   mark2
>  1        0       0       12      11
>  2        0       0       11      22
>  3        0       0       13      22
>  4        0       0       13      11
>  5        1       2       11      12
>  6        1       2       12      12
>  7        3       4       11      12
>  8        3       4       13      12
>  9        1       4       11      12
>  10       1       4       11      12"), header = TRUE)
> df2 <- transform(df, Parent1 = replace(Parent1, Parent1 == 0, NA),
>                  Parent2 = replace(Parent2, Parent2 == 0, NA))
> df2 <- transform(df2, imark1p1 = df2$mark1[df2$Parent1],       # Parent 1's
> mark1
>                       imark1p2 = df2$mark1[df2$Parent2],
> # Parent 2's mark1
>                       imark2p1 = df2$mark2[df2$Parent1],
> # Parent 1's mark2
>                       imark2p2 = df2$mark2[df2$Parent2])
> # Parent 2's mark2
>
> I created df2 so as not to overwrite the original in case of a mistake. At
> this point, you have several sets of vectors that you can compare; e.g.,
> mark1 with imark1p1 and imark1p2. Like Josh, I couldn't make heads or tails
> out of what these logical tests were meant to output, but perhaps this gives
> you a broader template with which to work. At this point, you can probably
> remove the rows corresponding to the parents. I believe ifelse() is your
> friend here - it can perform logical tests in a vectorized fashion. As long
> as the tests are consistent from one individual to the next, it's likely to
> be an efficient route.
>
> HTH,
> Dennis
>
> On Thu, Feb 17, 2011 at 6:21 PM, Umesh Rosyara <rosyaraur_at_gmail.com>wrote:
>
>> Dear R users
>>
>> The following question looks simple but I have spend alot of time to solve
>> it. I would highly appeciate your help.
>>
>> I have following dataset from family dataset :
>>
>> Here we have individuals and their two parents and their marker scores
>> (marker1, marker2,....and so on). 0 means that their parent information
>> not
>> available.
>>
>>
>> Individual      Parent1  Parent2         mark1   mark2
>> 1        0       0       12      11
>> 2        0       0       11      22
>> 3        0       0       13      22
>> 4        0       0       13      11
>> 5        1       2       11      12
>> 6        1       2       12      12
>> 7        3       4       11      12
>> 8        3       4       13      12
>> 9        1       4       11      12
>> 10       1       4       11      12
>>
>> I want to recode mark1 and other mark2.....and so on column by looking
>> indvidual parent (Parent1 and Parent2).
>>
>> For example
>>
>> Take case of Individual 5, who's Parent 1 is 1 (has mark1 score 12) and
>> Parent 2 is 2 (has mark1 score 11). Individual 5 has mark1 score 11.
>> Suppose
>> I have following condition to recode Individual 5's mark1 score:
>>
>> For mark1 variable, If Parent1 score "11" and Parent2 score "22" and
>> recode
>> indvidual 5's score, "12"=1, else 0
>>                                    If Parent1 score "12" and Parent2 score
>> "22" and recode individual 5's score, "22"=1, "12"= 0.5, else 0
>>                                    .........................more
>> conditions
>>
>> Similarly the pointer should move from individual 5 to n individuals at
>> the
>> end of the file.
>>
>>  Thank you in advance
>>
>> Umesh R
>>
>>
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ------------------------------
>
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1204 / Virus Database: 1435/3449 - Release Date: 02/17/11
>

	[[alternative HTML version deleted]]

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 01 Mar 2011 - 09:08:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Mar 2011 - 10:00:16 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive