Re: [R] missing handling

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 05 Oct 2005 - 05:35:47 EST

On Tue, 4 Oct 2005, Weiwei Shi wrote:

> Hi, Jim:
> I tried your code and get the following error:
> trn1<-read.table('trn1.svm', header=F, na.string='.', sep='|')
> Med<-apply(trn1, 2, median, na.rm=T)
> Ind<-which(is.na(trn1), arr.ind=T)
> trn1[Ind]<-Med[Ind[,'col']]
> Error in "[<-.data.frame"(`*tmp*`, Ind, value = c(1.00802124455,
> 1.00802124455, :
> only logical matrix subscripts are allowed in replacement
>
>
> I cannot figure out why.

Read the help for "[<-.data.frame" to be told the answer.

A data frame (as returned by read.table) is not a matrix, which the example presumably used. Matrix indexing is efficient on a matrix, but on a data frame it hides loops.
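
For illustration only: if every column is numeric, one way to keep the matrix-indexing approach is to convert the data frame with as.matrix() first. This is a minimal sketch on made-up data (df and its columns a and b are hypothetical, not the poster's file):

```r
# Hypothetical all-numeric data frame with a few missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 7))

# Convert to a matrix so that a two-column (row, col) index matrix can be used
m <- as.matrix(df)

Med <- apply(m, 2, median, na.rm = TRUE)   # per-column medians
Ind <- which(is.na(m), arr.ind = TRUE)     # (row, col) positions of the NAs
m[Ind] <- Med[Ind[, "col"]]                # fill each NA with its column median

df2 <- as.data.frame(m)                    # back to a data frame if needed
```

The conversion only makes sense when all columns share one type; with mixed columns as.matrix() would coerce everything to character.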

You will not do better than looping over columns for a data frame, but you certainly do not need to loop over rows, which is very inefficient. Something like

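trn2 <- trn1
for(i in names(trn2)) {
     # median of column i, ignoring missing values
     Med <- median(trn2[[i]], na.rm = TRUE)
     # row index first, then column: fill the NA rows of column i
     trn2[is.na(trn2[[i]]), i] <- Med
}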
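
A self-contained version of the column-wise loop can be tried on a small made-up data frame. This is a sketch under assumptions: the helper name impute_median and the sample columns V1 to V3 are hypothetical, and skipping non-numeric columns is an added safeguard rather than part of the original suggestion.

```r
# Hypothetical sample data: two numeric columns and one character column
trn <- data.frame(V1 = c(6, 8, NA, 3),
                  V2 = c(10, NA, 7, 4),
                  V3 = c("a", "b", NA, "d"),
                  stringsAsFactors = FALSE)

# Replace NAs in each numeric column by that column's median.
# Loops over columns only; each column is updated in one vectorised step.
impute_median <- function(df) {
  for (i in names(df)) {
    col <- df[[i]]
    if (is.numeric(col) && anyNA(col)) {
      df[[i]][is.na(col)] <- median(col, na.rm = TRUE)
    }
  }
  df
}

trn_filled <- impute_median(trn)
```

Here the missing V1 entry becomes 6 and the missing V2 entry becomes 7, while the character column V3 is left untouched.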

>
> Thanks for help,
>
> On 9/27/05, jim holtman <jholtman@gmail.com> wrote:
>>
>> Use 'which(...arr.ind=T)'
>> > x.1
>>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>  [1,]    6   10    3    4   10    7    9    8    4    10
>>  [2,]    8    7    4    7    4    8    3   NA    3     4
>>  [3,]    7    7   10   10    3    5    3    2    2     2
>>  [4,]    3    4    5   10   10    2    6    9    4     5
>>  [5,]    3    5    9    5    6   NA    3   NA    6     7
>>  [6,]    9    6   10    5   10    4    2   10   NA     5
>>  [7,]    5    2    5   10    3    7    6    4    6     8
>>  [8,]    2    6    1    8    9    2    7    8    3     8
>>  [9,]    9    1    4    9    8   10    2   NA    1     7
>> [10,]    2    4    8    7   NA    4    3   NA    5     5
>>> x.4
>> [1] 5.5 5.5 5.0 7.5 8.0 5.0 3.0 8.0 4.0 6.0
>>> Med <- apply(x.1, 2, median, na.rm=T) # get median
>>> Ind <- which(is.na(x.1), arr.ind=T) # determine which are NA
>>> x.1[Ind] <- Med[Ind[,'col']] # replace with median
>>> x.1
>>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>  [1,]    6   10    3    4   10    7    9    8    4    10
>>  [2,]    8    7    4    7    4    8    3    8    3     4
>>  [3,]    7    7   10   10    3    5    3    2    2     2
>>  [4,]    3    4    5   10   10    2    6    9    4     5
>>  [5,]    3    5    9    5    6    5    3    8    6     7
>>  [6,]    9    6   10    5   10    4    2   10    4     5
>>  [7,]    5    2    5   10    3    7    6    4    6     8
>>  [8,]    2    6    1    8    9    2    7    8    3     8
>>  [9,]    9    1    4    9    8   10    2    8    1     7
>> [10,]    2    4    8    7    8    4    3    8    5     5
>>>
>>
>>
>> On 9/27/05, Weiwei Shi <helprhelp@gmail.com> wrote:
>>
>>> Hi,
>>> I have the following codes to replace missing using median, assuming
>>> missing
>>> only occurs on continuous variables:
>>>
>>> trn1<-read.table('trn1.fv', header=F, na.string='.', sep='|')
>>>
>>> # median
>>> m.trn1<-sapply(1:ncol(trn1), function(i) median(trn1[,i], na.rm=T))
>>>
>>> #replace
>>> trn2<-trn1
>>> for (each in 1:nrow(trn1)){
>>> index.missing=which(is.na(trn1[each,]))
>>> trn2[each,]<-replace(trn1[each,], index.missing, m.trn1[index.missing])
>>> }
>>>
>>>
>>> Anyone can suggest some ways to improve it since replacing 10 takes 1.5sec:
>>>> system.time(for (each in 1:10){index.missing=which(is.na
>>> (trn1[each,]));
>>> trn2[each,]<-replace(trn1[each,], index.missing, m.trn1[index.missing
>>> ]);})
>>> [1] 1.53 0.00 1.53 0.00 0.00
>>>
>>>
>>> Another general question is
>>> are there some packages in R doing missing handling?
>>>
>>> Thanks,
>>>
>>> --
>>> Weiwei Shi, Ph.D
>>>
>>> "Did you always know?"
>>> "No, I did not. But I believed..."
>>> ---Matrix III
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 247 0281
>>
>> What is the problem you are trying to solve?

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Wed Oct 05 05:44:55 2005