Re: [R] Subassignments involving NAs in data frames

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 10 Jun 2005 - 06:09:53 EST

On Thu, 9 Jun 2005, Thomas Lumley wrote:

> On Thu, 9 Jun 2005, McGehee, Robert wrote:
>
>> I'm seeing some inconsistent behavior when re-assigning values in a data
>> frame. The first assignment turns all of the 0s in my data frame to 2s,
>> the second fails to do so.

But they differ in several ways, so why is this labelled `inconsistent'? Why not ask `what is the difference'?

The answer to the pertinent question is `the number of items to be replaced'.

>>> df1 <- data.frame(a = c(NA, 0, 3, 4))
>>> df2 <- data.frame(a = c(NA, 0, 0, 4))
>>> df1[df1 == 0] <- 2 ## Works
>>> df2[df2 == 0] <- 2
>> Error: NAs are not allowed in subscripted assignments
>
> Hmm. This looks like a bug to me.
>
>> Checking an old news file I see this:
>> o Subassignments involving NAs and with a replacement value of
>> length > 1 are now disallowed. (They were handled
>> inconsistently in R < 2.0.0, see PR#7210.) For data frames
>> they are disallowed altogether, even for logical matrix indices
>> (the only case which used to work).
>>
>> which leads me to believe that the assignment for both df1 and df2
>> should fail ("data frame ... disallowed altogether"), however that seems
>> not to be the case, since the example works for df1.
>
> Yes, I think the bug is that it works

It has since been allowed in a few cases to avoid needlessly breaking existing code. (The curse of back-compatibility.)

In the first example there is only one value to be replaced, so there is no ambiguity in the meaning. In the second, two values are to be replaced, so the replacement has to be replicated to the needed length, and the rules for vectors then give the error message.

Another case which is allowed is when none of the values are to be replaced: that is, when all the logical indices are FALSE or NA.

>> Also, the
>> vectorized version works as expected (because the replacement value has
>> a length of 1).
>>
>>> vec1 <- c(NA, 0, 3, 4)
>>> vec2 <- c(NA, 0, 0, 4)
>>> vec1[vec1 == 0] <- 2 ## Works
>>> vec2[vec2 == 0] <- 2 ## Also works
>
> I'm not sure that this is supposed to work, either, but it might be.

Reading help("[") should help alleviate your uncertainty, for this is explicitly documented there.
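
As a minimal sketch of the vector rule (the names vec, vec3 and res are illustrative and not from the original post): a logical index containing NA is accepted when the replacement has length one, and rejected when the replacement is longer.

```r
# Logical index containing NA, replacement of length one: allowed.
vec <- c(NA, 0, 0, 4)
ind <- vec == 0            # NA where vec is NA
vec[ind] <- 2              # the NA position is left untouched
print(vec)

# Same kind of index, but a replacement of length > 1: an error for vectors.
vec3 <- c(NA, 0, 0, 4)
res <- tryCatch({
  vec3[vec3 == 0] <- c(2, 3)
  "no error"
}, error = function(e) conditionMessage(e))
print(res)                 # the error message, if one was raised
```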

>> Is this behavior for data frames intentional? What's the best
>> alternative to df1[df1 == 0] <- 2 that doesn't fail in situations such
>> as df2? A simple loop over columns?
>
> df2[df2 %in% 0] is the recommended method.

That index is a logical vector of length one (one element per column, not one per cell), so it does not pick out the zeros. Try

ind <- df2 == 0
df2[ind & !is.na(ind)] <- 2

but this is really just a loop over columns implemented in [<-.data.frame.
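
A sketch of the same idea on a two-column data frame, next to an explicit loop over columns for comparison. The data frame df and the names safe, looped and n_idx are made up for illustration. The loop is only a rough stand-in for what the replacement method does, not a copy of its code.

```r
df <- data.frame(a = c(NA, 0, 0, 4), b = c(0, NA, 1, 0))

# NA-free logical matrix index, as suggested above.
ind <- df == 0
safe <- df
safe[ind & !is.na(ind)] <- 2

# Roughly equivalent explicit loop over columns.
looped <- df
for (j in seq_along(looped)) {
  col <- looped[[j]]
  hit <- !is.na(col) & col == 0   # TRUE only for genuine zeros
  col[hit] <- 2
  looped[[j]] <- col
}

print(safe)
print(all.equal(safe, looped))

# For contrast: %in% treats the data frame as a list of columns,
# so the result has one element per column, not one per cell.
n_idx <- length(df %in% 0)
print(n_idx)
```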

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri Jun 10 06:15:51 2005
