Re: [R] NA problem when use paste function

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 17 Apr 2008 06:38:04 +0100 (BST)

On Wed, 16 Apr 2008, Lu, Jiang wrote:

> Dear R helpers,
>
> I was doing a genetic project with two datasets X and Y. Some IDs
> are in both data sets, and others are in only one of them. I used
> "merge(x,y,by="ID",all=TRUE)". The data set Y contains a variable (a
> genotype) which is also in data X. When I merged X with Y, these two
> variables were automatically renamed by appending .x and .y to the
> original variable names. As you can see in the following list, I would
> like to take whichever value is available (non-missing, non-NA) in X or Y
> as the final value for the genotype S3Allele1. I used the paste()
> function. However, it converts <NA> to the character string "NA". Would
> you please tell me how I can get just the genotype without pasting the
> NA to it? I checked the documentation of paste() and noticed that it
> applies as.character() to its vector arguments. I guess that is the
> reason I got "NA" as a string in the new variable I created (S3Allele1).

Please don't 'guess': that is not what as.character does.
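A minimal check of that point, added for illustration (the vectors v and f are made up): as.character() leaves a missing value missing, for both character vectors and factors, and it is paste() itself that writes the missing value out as the string "NA".

```r
# as.character() keeps a missing value missing ...
v <- c("G", NA)
as.character(v)          # "G" NA  (second element is still missing)
is.na(as.character(v))   # FALSE TRUE

# ... and the same holds for a factor: labels come back, NA stays NA
f <- factor(c("G", NA))
is.na(as.character(f))   # FALSE TRUE

# paste(), by contrast, turns the missing value into the two-letter string "NA"
p <- paste(v, "x")
p                        # "G x" "NA x"
is.na(p)                 # FALSE FALSE
```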

Your example is not reproducible (see the footer of this message) and it is not clear what the structure is. But <NA> indicates a missing value in a factor or unquoted character vector. E.g.

> x <- c("G", "A", "A")
> y <- rep(NA_character_, 3)
> data.frame(x, y)
  x    y
1 G <NA>
2 A <NA>
3 A <NA>
> paste(x, y)
[1] "G NA" "A NA" "A NA"

Here y does contain missing values and paste() converted them to "NA". As the help says:

      Note that 'paste()' coerces 'NA_character_', the character missing
      value, to '"NA"' which may seem undesirable, e.g., when pasting
      two character vectors, or very desirable, e.g. in 'paste("the
      value of p is ", p)'.
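If what is wanted really is a pasted result, one way round the coercion is to drop the missing values before pasting. The helper below, paste_non_na(), is hypothetical (it is not part of base R) and is only a sketch assuming equal-length inputs; rows where every input is missing come back as "".

```r
# Hypothetical helper (not in base R): paste the elements of each row,
# skipping missing values instead of writing them out as "NA".
paste_non_na <- function(..., sep = " ") {
  cols <- lapply(list(...), as.character)   # factors -> labels; NA stays NA
  m <- do.call(cbind, cols)                 # character matrix, one row per element
  apply(m, 1, function(r) paste(r[!is.na(r)], collapse = sep))
}

x <- c("G", "A", NA)
y <- c(NA, NA, "A")
paste_non_na(x, y)   # "G" "A" "A"
paste_non_na(NA, NA) # ""  (nothing left to paste)
```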

Possibly you want

ifelse(is.na(x), y, x)
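Applied to the merged data frame, that might look like the sketch below. The toy X and Y are invented for illustration, loosely modelled on the IDs in the listing. The as.character() calls are a precaution: if the merged columns are factors (the data.frame() default in older versions of R), ifelse() would return the integer codes rather than the genotype labels.

```r
# Toy stand-ins for the poster's X and Y (values invented for illustration)
X <- data.frame(ID = c(10003, 10004, 10033), S3Allele1 = c("G", "A", NA),
                stringsAsFactors = FALSE)
Y <- data.frame(ID = c(10038, 10039), S3Allele1 = c("A", "A"),
                stringsAsFactors = FALSE)

# Full outer join; the shared column becomes S3Allele1.x and S3Allele1.y
m <- merge(X, Y, by = "ID", all = TRUE)

# Work on character vectors so ifelse() returns labels, not factor codes
ax <- as.character(m$S3Allele1.x)
ay <- as.character(m$S3Allele1.y)

# Take the X value where present, otherwise the Y value (NA if both missing)
m$S3Allele1 <- ifelse(is.na(ax), ay, ax)
m
```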

> Should I use any other function to avoid this problem? Any insight is
> appreciated!
>
> ID S3Allele1.x S3Allele1.y S3Allele1
> 1 10003 G <NA> G NA
> 2 10004 A <NA> A NA
> 3 10005 A <NA> A NA
> 4 10006 A <NA> A NA
> 5 10007 G <NA> G NA
> 6 10008 A <NA> A NA
> 7 10009 A <NA> A NA
> 8 10010 A <NA> A NA
> 9 10011 A <NA> A NA
> 10 10013 A <NA> A NA
> 11 10014 A <NA> A NA
> 12 10015 A <NA> A NA
> 13 10016 A <NA> A NA
> 14 10017 A <NA> A NA
> 15 10018 A <NA> A NA
> 16 10019 G <NA> G NA
> 17 10020 A <NA> A NA
> 18 10021 G <NA> G NA
> 19 10022 A <NA> A NA
> 20 10023 G <NA> G NA
> 21 10024 G <NA> G NA
> 22 10025 G <NA> G NA
> 23 10027 G <NA> G NA
> 24 10028 G <NA> G NA
> 25 10029 G <NA> G NA
> 26 10031 G <NA> G NA
> 27 10032 A <NA> A NA
> 28 10033 <NA> NA
> 29 10035 A <NA> A NA
> 30 10037 A <NA> A NA
> 31 10038 <NA> A NA A
> 32 10039 <NA> A NA A
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu 17 Apr 2008 - 06:32:14 GMT
