Re: [R] How to erase (replace) certain elements in the data.frame?

From: Joshua Wiley <jwiley.psych_at_gmail.com>
Date: Sun, 24 Apr 2011 07:33:01 -0700

On Sat, Apr 23, 2011 at 11:35 PM, Thomas Levine <thomas.levine_at_gmail.com> wrote:
> This should do the same thing

Did you actually test it? I get very different things.

>
> random.del <- function (x, n.keeprows, del.percent){
>   del<-function(col){
>     col[sample.int(length(col),length(col)*del.percent/100)]<-NA
>     col
>   }
>   change<-n.keeprows:nrow(x)
>   x[change,]<-lapply(x[change,],del)

But a data frame is a list of column vectors, so lapply() applies del() column by column, while Sergey's function went row by row. That said, using sample.int() is a much better idea than what I did with sample().

>   x
> }
>
> This is faster because it's vectorized.

But it is vectorized in such a way that you cannot guarantee the same number of cells is missing from each row. Try:

rowSums(is.na(mine))  # 'mine' = the data frame returned by your version
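To make the difference concrete, here is a minimal sketch on made-up data. The toy data frame, the seed, and the extra del.percent argument on del() are assumptions for illustration, not something from the thread. It blanks 40% of each column independently, in the spirit of the quoted function, and then counts the NAs both ways:

```r
# Hypothetical toy data: 10 rows, 5 numeric columns (not from the thread).
set.seed(1)
x <- as.data.frame(matrix(rnorm(50), nrow = 10))

# Column-wise deletion: every column independently loses del.percent
# of its entries, as lapply() over a data frame works on columns.
del <- function(col, del.percent) {
  col[sample.int(length(col), length(col) * del.percent / 100)] <- NA
  col
}
by.col <- x
by.col[] <- lapply(x, del, del.percent = 40)

# The count per column is fixed by construction; the count per row is
# whatever the independent draws happen to produce.
na.per.col <- colSums(is.na(by.col))
na.per.row <- rowSums(is.na(by.col))
print(na.per.col)
print(na.per.row)
```

Each column ends up with exactly 4 missing values, but nothing constrains how those fall across rows, so a row may lose anywhere from none to all of its cells.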

>
> [1] "Mine"
>   user  system elapsed
>  0.004   0.000   0.002
> [1] "Yours"
>   user  system elapsed
>  1.172   0.020   1.193
>
> Tom
>
> On Sat, Apr 23, 2011 at 8:37 PM, sneaffer <sneaffer@mail.ru> wrote:
>>
>> Hello R-world,
>> Please help me get around my little mess.
>> I have a data.frame in which I'd like some values to be NA for a
>> later imputation process.
>>
>> I've come up with the following piece of code:
>>
>> random.del <- function (x, n.keeprows, del.percent){
>>  n.items <- ncol(x)
>>  k <- n.items*(del.percent/100)
>>  x.del <- x
>>  for (i in (n.keeprows+1):nrow(x)){
>>    j <- sample(1:n.items, k)
>>    x.del[i,j] <- NA
>>  }
>>  return (x.del)
>> }
>>
>> The problem is that random.del turns out to be slow on huge samples.
>> Is there a more efficient/charming way to do the same?
>>
>> Thanks,
>> Sergey
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/How-to-erase-replace-certain-elements-in-the-data-frame-tp3470883p3470883.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
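One way to keep Sergey's row-by-row guarantee while avoiding the repeated assignment into the data frame (which is most likely the slow part of the loop) is to collect all the (row, column) positions first and assign NA once. This is only a sketch under assumptions: random.del2 is a hypothetical name, the percentage is rounded down to a whole number of cells per row, and the toy data below are made up.

```r
# Sketch only: random.del2 is a hypothetical name, not from the thread.
random.del2 <- function(x, n.keeprows, del.percent) {
  n.items <- ncol(x)
  k <- floor(n.items * del.percent / 100)  # cells to blank in each row
  rows <- seq_len(nrow(x))
  rows <- rows[rows > n.keeprows]          # rows after the protected block
  if (k < 1 || length(rows) == 0) return(x)

  # Draw k distinct column positions for every affected row.
  # The result is a k x length(rows) matrix (a plain vector when k == 1).
  cols <- vapply(rows, function(i) sample.int(n.items, k), integer(k))

  # Pair each row number with its k sampled columns, mark those cells
  # in a logical mask, and assign NA to the data frame in a single step.
  idx <- cbind(rep(rows, each = k), as.vector(cols))
  mask <- matrix(FALSE, nrow(x), ncol(x))
  mask[idx] <- TRUE
  x[mask] <- NA
  x
}

# Usage on made-up data: keep the first 2 rows intact, blank 40% per row.
set.seed(42)
d <- as.data.frame(matrix(rnorm(60), nrow = 12))
out <- random.del2(d, n.keeprows = 2, del.percent = 40)
na.counts <- rowSums(is.na(out))
print(na.counts)
```

Here every row after the first two loses exactly 2 of its 5 cells. There is still a loop over rows inside vapply(), but it only samples integers; the data frame itself is modified once.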

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Received on Sun 24 Apr 2011 - 15:33:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 24 Apr 2011 - 16:40:33 GMT.
