Re: [R] norm package prelim.norm

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Thu 02 Feb 2006 - 12:42:16 EST


On 01-Feb-06 Ted Harding wrote:
> On 01-Feb-06 Elizabeth Lawson wrote:

>> Hey everyone!  I hope someone can help with this question.  I have a
>> matrix of all zeros and ones and I would like to identify all unique
>> patterns in the rows and the number of times each pattern occurs.   I
>> changed all zeros to NA and tried to use prelim.norm to identify all
>> patterns of missing data in the rows.  I got the message 
>>    
>>   Warning message:
>> NAs introduced by coercion 
>> 
>>   Any ideas of how to get this to work?  Or is there any way to
>> identify all the unique patterns in a huge matrix? ( 10000 x 71)
>>    
>>   Thanks for any suggestions!!
>>    
>>   Elizabeth Lawson

>
> I think Chuck Celand has pretty well answered it: Don't worry
> about the warning, since I'm pretty sure it is generated when
> prelim.norm is calculating something else (e.g. the covariance
> matrix) and it is not related to generating prelim.norm(X)$r
> which is the list of patterns and the numbers of times they occur.
>
> Best wishes,
> Ted.

Sorry -- I should have read the detail of your original message more carefully. In short, you have too many columns for prelim.norm to work.

The long answer: prelim.norm analyses the missing data patterns by representing the locations of NAs as integers, where the jth bit in the binary representation of the integer is 1 for an NA, 0 for a non-NA. Hence the representation of the pattern runs out of steam when there are more than a certain number of columns, corresponding to the highest power of 2 that can be represented as an integer in R.

  .Machine$integer.max
  [1] 2147483647

  2^31 -1
  [1] 2147483647

so that prelim.norm can only encode NA-patterns in an R integer for up to 31 columns. More than that, and it will not work properly or at all.
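
To illustrate the overflow, here is a minimal sketch of this kind of bit encoding. It is not the code inside prelim.norm; the helper name encode_pattern is made up for the example, and the weights 2^(j-1) are an assumption about how the jth bit is formed.

```r
# Hypothetical helper (not part of the norm package): encode a row's
# missingness pattern as a number whose jth bit is 1 when column j is NA.
encode_pattern <- function(is_missing) {
  weights <- 2^(seq_along(is_missing) - 1)  # 1, 2, 4, ... stored as doubles
  sum(weights * is_missing)
}

code31 <- encode_pattern(rep(TRUE, 31))  # all 31 columns missing: 2^31 - 1
code32 <- encode_pattern(rep(TRUE, 32))  # all 32 columns missing: 2^32 - 1

# The 31-column code still fits in an R integer ...
code31 <= .Machine$integer.max

# ... but the 32-column code does not, so coercing it to integer yields NA
# (with a coercion warning, suppressed here).
suppressWarnings(as.integer(code32))
```

Any pattern with an NA in column 32 or beyond has a code of at least 2^31, so the integer coercion fails for it in the same way.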

Check:

  X<-matrix(sample(c(0,1),87,replace=TRUE),ncol=29)
  Y<-X; Y[Y==0]<-NA
  prelim.norm(Y)$r
  [...] (no warning, 3 rows)

  X<-matrix(sample(c(0,1),90,replace=TRUE),ncol=30)
  Y<-X; Y[Y==0]<-NA
  prelim.norm(Y)$r
  [...] (no warning, 3 rows)

  X<-matrix(sample(c(0,1),93,replace=TRUE),ncol=31)
  Y<-X; Y[Y==0]<-NA
  prelim.norm(Y)$r
  [...] (no warning, 3 rows)

  X<-matrix(sample(c(0,1),96,replace=TRUE),ncol=32)
  Y<-X; Y[Y==0]<-NA
  prelim.norm(Y)$r
  [...] (3 rows, "Warning message: NAs introduced by coercion")

  X<-matrix(sample(c(0,1),99,replace=TRUE),ncol=33)
  Y<-X; Y[Y==0]<-NA
  prelim.norm(Y)$r
  [...] (2 rows, "Warning message: NAs introduced by coercion")

  X<-matrix(sample(c(0,1),102,replace=TRUE),ncol=34)
  Y<-X; Y[Y==0]<-NA
  prelim.norm(Y)$r
  [...] (1 row, "Warning message: NAs introduced by coercion")

(Try a few of these for yourself; it is very unlikely that you would get only 1 or 2 distinct rows when you have 3 rows of 30+ 0s and 1s sampled at random, so the shorter results mean that distinct patterns are being merged.)

A similar issue came up some time ago (I can't locate the thread in the archive at the moment) in connection with the 'mix' package.

However, you can have as many columns as you like if you use 'unique' to identify the distinct patterns of 0s and 1s, rather than using 'prelim.norm'.
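
As a rough sketch of that approach (the toy matrix and the paste/table counting step are my own assumptions for illustration, not something taken from prelim.norm), one way to get both the distinct patterns and how often each occurs is:

```r
# Toy data: 6 rows, 71 columns, built from three known 0/1 patterns.
p1 <- rep(c(0, 1), length.out = 71)
p2 <- rep(c(1, 0), length.out = 71)
p3 <- rep(1, 71)
X <- rbind(p1, p2, p1, p3, p1, p2)

# Distinct rows, kept in order of first appearance.
patterns <- unique(X)

# Count occurrences by collapsing each row into a character key.
key    <- apply(X, 1, paste, collapse = "")
counts <- table(key)

# Line the counts up with the rows of 'patterns'.
pattern_keys <- apply(patterns, 1, paste, collapse = "")
n <- as.vector(counts[pattern_keys])

nrow(patterns)  # number of distinct patterns
n               # occurrences of each pattern, in the same order
```

Since the keys are character strings rather than integers, the 31-column limit does not apply; the same steps should work on the 10000 x 71 matrix.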

Hoping this helps,
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 02-Feb-06                                       Time: 01:42:13
------------------------------ XFMail ------------------------------
