Date: Thu, 17 Apr 2008 10:23:00 -0400

Thank you very much, Dr. Ripley. The "ifelse()" solution you provided is exactly what I wanted. I was so happy to receive your email this morning. Last night I was trying to write a loop to substitute the NAs, but now I see that "ifelse()" does the job much more efficiently. I really appreciate your help!

Jiang
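
For readers following along, here is a minimal sketch of how the suggested ifelse() call might be applied to the merged data. The two small data frames are hypothetical stand-ins for X and Y (the IDs and genotypes are invented for illustration), and the as.character() step is an added precaution rather than part of the original suggestion: if the merged columns are factors, ifelse() returns the underlying integer codes instead of the labels.

```r
# Hypothetical miniature versions of X and Y; values are made up.
X <- data.frame(ID = c(10003, 10004, 10038),
                S3Allele1 = c("G", "A", NA),
                stringsAsFactors = FALSE)
Y <- data.frame(ID = c(10038, 10039),
                S3Allele1 = c("A", "A"),
                stringsAsFactors = FALSE)

# Outer join on ID; the shared column becomes S3Allele1.x / S3Allele1.y.
m <- merge(X, Y, by = "ID", all = TRUE)

# Guard against factor columns (assumption: they may or may not be factors).
gx <- as.character(m$S3Allele1.x)
gy <- as.character(m$S3Allele1.y)

# Take the value from X where it is present, otherwise the value from Y.
m$S3Allele1 <- ifelse(is.na(gx), gy, gx)
print(m)
```

Rows where both sources are missing stay NA, which is usually what is wanted for a genotype column.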

**> > I was doing a genetic project with two datasets X and Y. There are
**> > some IDs in both data sets, and others in either data set. I used
**> > "merge(x,y,by="ID",all=TRUE)". The data set Y contains a variable (a
**> > genotype) which is also in data X. When I merge X with Y, these two
**> > variables were automatically re-named by appending .x and .y to the
**> > original variable names. As you can see in the following list, I would
**> > like to take whatever is available (non-missing, non-NA) in X or Y as the
**> > final value for the genotype S3Allele1. I used the paste() function.
**> > However, it converts <NA> to NA as character. Would you please tell me
**> > how I can just get the genotype without pasting the NA to it? I
**> > checked the documentation of paste() and noticed that it applies
**> > as.character() to its vector arguments. I guess that is the reason I
**> > got "NA" as a string for the new variable I created (S3Allele1).
**> Please don't 'guess': that is not what as.character does.
**>
**> Your example is not reproducible (see the footer of this message) and it is
**> not clear what the structure is. But <NA> indicates a missing value in a
**> factor or unquoted character vector. E.g.
**> > x <- c("G", "A", "A")
**> > y <- rep(NA_character_, 3)
**> > data.frame(x, y)
**>   x    y
**> 1 G <NA>
**> 2 A <NA>
**> 3 A <NA>
**> > paste(x, y)
**> [1] "G NA" "A NA" "A NA"
**> Here y does contain missing values and paste() converted them to "NA".
**> As the help says:
**>
**> Note that 'paste()' coerces 'NA_character_', the character missing
**> value, to '"NA"' which may seem undesirable, e.g., when pasting
**> two character vectors, or very desirable, e.g. in 'paste("the
**> value of p is ", p)'.
**>
**> Possibly you want
**>
**> ifelse(is.na(x), y, x)
**>
**> > Should I use any other function to avoid this problem? Any insight is
**> > appreciated!
**> >
**> > ID S3Allele1.x S3Allele1.y S3Allele1
**> > 1 10003 G <NA> G NA
**> > 2 10004 A <NA> A NA
**> > 3 10005 A <NA> A NA
**> > 4 10006 A <NA> A NA
**> > 5 10007 G <NA> G NA
**> > 6 10008 A <NA> A NA
**> > 7 10009 A <NA> A NA
**> > 8 10010 A <NA> A NA
**> > 9 10011 A <NA> A NA
**> > 10 10013 A <NA> A NA
**> > 11 10014 A <NA> A NA
**> > 12 10015 A <NA> A NA
**> > 13 10016 A <NA> A NA
**> > 14 10017 A <NA> A NA
**> > 15 10018 A <NA> A NA
**> > 16 10019 G <NA> G NA
**> > 17 10020 A <NA> A NA
**> > 18 10021 G <NA> G NA
**> > 19 10022 A <NA> A NA
**> > 20 10023 G <NA> G NA
**> > 21 10024 G <NA> G NA
**> > 22 10025 G <NA> G NA
**> > 23 10027 G <NA> G NA
**> > 24 10028 G <NA> G NA
**> > 25 10029 G <NA> G NA
**> > 26 10031 G <NA> G NA
**> > 27 10032 A <NA> A NA
**> > 28 10033 <NA> NA
**> > 29 10035 A <NA> A NA
**> > 30 10037 A <NA> A NA
**> > 31 10038 <NA> A NA A
**> > 32 10039 <NA> A NA A
**> Brian D. Ripley, ripley_at_stats.ox.ac.uk
**> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
**> University of Oxford, Tel: +44 1865 272861 (self)
**> 1 South Parks Road, +44 1865 272866 (PA)
**> Oxford OX1 3TG, UK Fax: +44 1865 272595
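
As a small illustration of the paste() behaviour described in the quoted help text, the sketch below uses made-up vectors (not the poster's data) to show that the pasted result contains the literal string "NA" and is no longer treated as missing.

```r
# Illustrative vectors only; each has a genuine character NA.
x <- c("G", "A", NA)
y <- c(NA, NA, "A")

p <- paste(x, y)
print(p)         # each element now contains the two-letter string "NA"
print(is.na(p))  # all FALSE: the missingness has been lost
```

This is why a later is.na() check on the pasted column cannot recover which genotypes were originally missing.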
