Re: [R] Systematic treatment of missing values

From: David Soloveichik <dsolov_at_caltech.edu>
Date: Tue 30 May 2006 - 18:34:02 EST

Thank you very much for your prompt reply and for adding the comments to the help pages for match and ==. I think the source of my confusion was that by looking at the current documentation (v 2.3.0) I did not realize that matching is different from equality testing. (Obviously in the case of using regular expressions, etc, it is different, but I thought that when using plain "match" and %in%, matching would be determined by ==.)

Also I did not mean for my first comment to sound like a criticism of R for treating NAs inconsistently. Nonetheless I am still curious why the particular choice was made that "match" (and therefore %in%) acts differently from "==" with respect to NA's and NaN's (with the default and the only implemented value of the "incomparables" parameter)?
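
For reference, the difference between the two behaviours discussed in this thread can be seen in a short R session. This is only a minimal sketch: the variable names (x, eq, m1, m2, m3) are made up for illustration, and the results reflect the matching rules described in the quoted reply below (NA matches NA only, NaN matches NaN only).

```r
x <- c(1, 2, 3, NA)

# Equality testing: any comparison involving NA is itself NA.
eq <- x == NA

# Value matching: NA is matched only by NA, so %in% never returns NA.
m1 <- x %in% c(2, 3)
m2 <- x %in% c(2, 3, NA)

# NaN and NA are kept apart by match(): each matches only itself.
m3 <- match(c(NaN, NA), c(NA, NaN))

print(eq)
print(m1)
print(m2)
print(m3)
```

Here eq is entirely NA, whereas m1 and m2 are plain logical vectors with no missing entries; m3 gives the positions at which NaN and NA are found in the table.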

Thank you,
David

On May 28, 2006, at 1:10 AM, Prof Brian Ripley wrote:

> You start with very general comments, but only use one specific
> function, match (see ?"%in%", a help page entitled `value matching').
>
> Matching and equality are treated differently. By definition, NA
> matches NA and nothing else, and NaN matches NaN and nothing else.
> In comparisons, these values are not comparable.
>
> As you will have seen from the help page, match() has the expansion
> capacity for declaring values non-comparable. That has not been
> implemented for a decade and no one has supplied code to implement
> it, so it seems no one has much need of it.
>
> I have added notes to the help pages for match and == to say
> explicitly what matches and what is comparable. If the *Draft* R
> Language Definition were ever to be finished it would have such
> details: it already has a useful commentary.
>
> On Sat, 27 May 2006, David Soloveichik wrote:
>
>> I am wondering whether there is a well-accepted approach to handling
>> missing values (NA's) in a programming language such as R. For
>> example, most functions seem to propagate NA to the output when the
>> value of the missing entry could have mattered. In other words, most
>> functions are not willing to "take a stand" on what the missing value
>> was. However, some functions don't seem to do this. For example,
>>
>> > c(1,2,3,NA) %in% c(2,3)
>> [1] FALSE TRUE TRUE FALSE
>>
>> rather than: FALSE TRUE TRUE NA
>>
>>
>> Also, what is the logic of the following:
>> > c(1,2,3,NA) %in% c(2,3,NA)
>> [1] FALSE TRUE TRUE TRUE
>>
>> Why is the last output value TRUE? Why does R claim that the NA on
>> the left hand side of %in% is the same as the NA on the right hand
>> side of %in%?
>
> It does not: it reports that it *matches*. Please do read the help
> page before posting, as the posting guide asked you to.
>
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
> --
> Brian D. Ripley, ripley@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue May 30 18:52:13 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 30 May 2006 - 20:10:20 EST.
