[R] Help with possible bug (assigning NA value to data.frame)?

From: Dan Bolser <dmb_at_mrc-dunn.cam.ac.uk>
Date: Wed 08 Jun 2005 - 04:15:22 EST

This 'strange behaviour' manifests itself within some quite complex code. When I created a *very* simple example the behaviour disappeared.

Here is the simplest version I have found which still causes the strange behaviour (it could be quite unrelated to the boot library, however).

library(boot)  

## boot statistic function
my.mean.s <- function(data,subset){
  mean(data[subset])
}

## dummy data, deliberately no variance
my.test.dat.1 <- rep(4,5)
my.test.dat.2 <- rep(8,5)

## not much can happen here
my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )

## returns a null object as ci is meaningless for this data
my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
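
As a quick check of what the later assignments actually receive, here is a minimal sketch that uses a plain NULL as a stand-in for the null object boot.ci returns for this data (the names ci and lower are only for illustration). Indexing into NULL gives NULL again, not NA:

```r
## stand-in for the null object returned by boot.ci above (assumption)
ci <- NULL

## $ and [ on NULL both give NULL, so there is no NA to store
lower <- ci$normal[2]
is.null(lower)   # TRUE
length(lower)    # 0
```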

## now try to store this data (the problem begins)...

## dummy existing data
a <- data.frame(matrix(c(1,2,3,4),nrow=2))

## make space for new data
a$X3 <- NA
a$X4 <- NA

## try to store the upper and lower ci (not) calculated above

a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]
a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]

a

What I see is

> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA  2

What I expected to see was

> a
  X1 X2 X3 X4
1  1  3 NA NA
2  2  4 NA NA

Somehow the last assignment of the data from within the null object assigns the value of the '==x' part of the logical vector subscript.
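
One possible explanation, sketched here without boot: when the right-hand side is NULL, the `$<-` step removes the column from the extracted row instead of setting it to NA, so the row that gets written back into a has one column fewer than the row it replaces. The sketch below only shows that first step (row1 is a name made up for illustration). How the shorter row is then matched against the four columns is up to `[<-.data.frame` and is not verified here.

```r
## same dummy data as above, with the two empty columns added
a <- data.frame(matrix(c(1,2,3,4),nrow=2))
a$X3 <- NA
a$X4 <- NA

## the nested assignment first extracts the row, then applies `$<-` to it
row1 <- a[a$X1==1,]
row1$X3 <- NULL      # NULL deletes the column rather than storing NA

names(row1)          # "X1" "X2" "X4"
ncol(row1)           # 3, while the row in 'a' it replaces has 4 columns
```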

If I make the following (trivial?) adjustment

a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]
a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]


The output changes to

> a
  X1 X2 X3 X4
1  1  3  1  1
2  2  4  2  2

Which is even wronger.

Not sure if this is useful without the full context, but here is the output from the real version of this program (where most of the above code is within a loop). What is printed out for each cycle of the loop is the value of the '==x' part of the subscript.

[1] 2
[1] 3
[1] 4
[1] 5
[1] "All values of t are equal to 1 \n Cannot calculate confidence
intervals"
[1] 6
[1] 7
[1] "All values of t are equal to 1 \n Cannot calculate confidence
intervals"
[1] 8
[1] 10
[1] 11
[1] "All values of t are equal to 1 \n Cannot calculate confidence
intervals"
>

Above you see that for some values I can't calculate a ci (but I still store it as above), then...

> dat.5.ho

  CHAINS DOM_PER_CHAIN     lower     upper
1      2      1.416539 1.3626253  1.468387
2      3      1.200000 1.1146014  1.288724
3      4      1.363636 1.2675657  1.462571
4      5      1.000000        NA  5.000000
5      6      1.323529 1.0991974  1.546156
6      7      1.000000        NA  7.000000
7      8      1.100000 0.9037904  1.289210
8     10      1.142857 0.8775104  1.403918
9     11      1.000000        NA 11.000000
>

Do you spot the same problem? Namely for each value of the 'CHAINS' column that was unable to calculate a ci, the second assignment to the data table from the 'null' object assigned the lookup value of CHAINS to that column instead! The assignment (within the loop) looks like this...

  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]

(where chain is the 'loop variable').

As far as I can tell this is a bug. It doesn't happen when I try...

  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- NA
  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- NA

And doing the following (swapping the order) changes the behaviour...

  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]

Giving...

> dat.5.ho

  CHAINS DOM_PER_CHAIN      lower     upper
1      2      1.416539  1.3616070  1.472716
2      3      1.200000  1.1134237  1.287601
3      4      1.363636  1.2587204  1.466037
4      5      1.000000  5.0000000  5.000000
5      6      1.323529  1.1082482  1.547222
6      7      1.000000  7.0000000  7.000000
7      8      1.100000  0.9021282  1.287672
8     10      1.142857  0.8766731  1.403327
9     11      1.000000 11.0000000 11.000000


Which is again incorrect and unpredicted (as above).
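
A possible workaround, offered only as a sketch: test for the null object before storing, and index the column directly instead of going through `[...,]$name <-`. The helper ci.bound and the small data frame dat are hypothetical stand-ins, not part of the real program. The second x.s.ci is a hand-built list that only mimics the layout assumed for the $normal entry (level, lower, upper).

```r
## hypothetical helper: element i of ci$normal, or NA when there is no interval
ci.bound <- function(ci, i) {
  if (is.null(ci)) NA_real_ else ci$normal[i]
}

## small stand-in for dat.5.ho, using two rows of the numbers shown above
dat <- data.frame(CHAINS=c(5,6), DOM_PER_CHAIN=c(1.000000,1.323529),
                  lower=NA_real_, upper=NA_real_)

## chain 5: no interval could be calculated, so the ci object is NULL
chain <- 5
x.s.ci <- NULL
dat[dat$CHAINS==chain,"lower"] <- ci.bound(x.s.ci, 2)
dat[dat$CHAINS==chain,"upper"] <- ci.bound(x.s.ci, 3)

## chain 6: hand-built stand-in for a ci object that does hold an interval
chain <- 6
x.s.ci <- list(normal=matrix(c(0.95,1.1082482,1.547222),nrow=1))
dat[dat$CHAINS==chain,"lower"] <- ci.bound(x.s.ci, 2)
dat[dat$CHAINS==chain,"upper"] <- ci.bound(x.s.ci, 3)

dat
```

With this form the right-hand side always has length one, so the CHAINS value should have no way to end up in the lower or upper column.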

Please let me know what to do to report this problem better, or if I just missed something silly.

I am on RH9, R-2.1.0 (compiled from source), latest boot from CRAN (if that makes a difference).

Cheers,
Dan.



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jun 08 04:22:19 2005