Re: [R] efficiently replacing values in a matrix

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Wed, 16 Apr 2008 14:33:00 -0700

On Thu, 17 Apr 2008, Rolf Turner wrote:

>
> On 17/04/2008, at 7:52 AM, Matthew Keller wrote:
>
>> Hello all,
>>
>> I should probably know this by now... Anyway:
>>
>> I have a large matrix (dim(data) is 3000 18000). Each element is
>> one of the following character strings: "0/0", "1/1", "1/2", "2/2". I
>> wanted to replace "0/0" with NA and the other three with 0,1,2
>> respectively. To accomplish just the first of these four steps I did
>> this:
>>
>> data[data=="0/0"] <- NA
>>
>> Which is still running after 13 hours. I have 18 GB RAM and am running
>> 64-bit R. What is a more efficient way to accomplish this (I've already
>> done it using sed in UNIX - but want to know how to do so in R)?
>> Thanks in advance.
>
> Well I just did
>
> gorp <- c("0/0","1/1","1/2","2/2")
> mung <- matrix(sample(gorp,54e6,TRUE),3000,18000)
> mung[mung=="0/0"] <- NA
>
> and the whole schmear ran in under half a minute of real time.

Likewise.
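
For the complete recoding Matthew describes (all four strings, not just the "0/0" step), a named lookup vector can do the job in one pass on a character matrix. What follows is only a sketch on a small simulated matrix; the names `codes` and `recoded` are made up for illustration and appear nowhere in the original posts.

```r
set.seed(1)
gorp <- c("0/0", "1/1", "1/2", "2/2")
mung <- matrix(sample(gorp, 20, TRUE), 4, 5)   # small stand-in for 3000 x 18000

# Named lookup vector (hypothetical name): genotype string -> numeric code.
codes <- c("0/0" = NA, "1/1" = 0, "1/2" = 1, "2/2" = 2)

# Indexing by name recodes every element at once. The result is a plain
# vector, so the matrix shape and any dimnames are restored afterwards.
recoded <- matrix(unname(codes[as.vector(mung)]),
                  nrow = nrow(mung), dimnames = dimnames(mung))
```

Because the lookup returns a numeric vector, `recoded` is a numeric matrix rather than a character one, which should also be considerably smaller in memory at the full size.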

I'll lay odds that Matthew's 'matrix' is actually a data.frame, and I'll not be surprised if the columns are factors. In which case

 	mung2 <- as.data.frame(lapply(mung, function(x) {
 		# setting a level to NA turns every '0/0' entry of the factor into NA
 		levels(x)[levels(x) == '0/0'] <- NA
 		x
 	}))

will be faster, but still not as fast as what you show with a matrix.
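
To check the guess and then finish the job, something along these lines might do. It is a sketch only, on a small simulated data.frame of factor columns standing in for Matthew's data; `codes` and `mung3` are hypothetical names introduced here.

```r
set.seed(1)
gorp <- c("0/0", "1/1", "1/2", "2/2")
# Stand-in for the real data: a data.frame whose columns are factors.
mung <- as.data.frame(matrix(sample(gorp, 20, TRUE), 4, 5),
                      stringsAsFactors = TRUE)

# Diagnose first: is it really a matrix, and what are the columns?
is.matrix(mung)                       # FALSE for a data.frame
all(vapply(mung, is.factor, logical(1)))

# Hypothetical lookup vector: genotype string -> numeric code.
codes <- c("0/0" = NA, "1/1" = 0, "1/2" = 1, "2/2" = 2)

# Recode column by column via the factor labels; vapply binds the
# resulting numeric columns into a matrix.
mung3 <- vapply(mung, function(x) unname(codes[as.character(x)]),
                numeric(nrow(mung)))
```

The result is a numeric matrix, so later element-wise operations avoid the data.frame methods altogether. Converting once with `as.matrix(mung)` and then working on the character matrix is another option.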

HTH, Chuck

>
> > sessionInfo()
> R version 2.6.2 (2008-02-08)
> i386-apple-darwin8.10.1
>
> locale:
> C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] misc_0.0-2
>
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17
>
> I would say that something is seriously snarled up in your system.
>
> cheers,
>
> Rolf Turner
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901

Received on Wed 16 Apr 2008 - 21:38:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 16 Apr 2008 - 22:30:29 GMT.
