Re: [Rd] allocMatrix error

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Tue, 17 Feb 2009 05:46:08 -0800

Prof Brian Ripley <ripley_at_stats.ox.ac.uk> writes:

> On Tue, 17 Feb 2009, Hamid Ashafi wrote:
>
>> On Sat, Feb 14, 2009 at 00:17, <ashrafi_at_ucdavis.edu> wrote:
>>
>> Hi,
>>
>> I was trying to read ~400 chips into an affybatch and I got the same message.
>> Could you suggest a remedy for this? My server has 128 GB of RAM; however, R
>> halted even before it used much of the memory.
>
> We don't have anything like sufficient details (please do read the
> posting guide).
>
> If the issue is the size of matrices, you possibly (depending on the
> compiler) could arrange to compile R (and any relevant system
> libraries) to use 64-bit ints. For C code in R there is a typedef to
> change, and you would need integer*8 in the Fortran. We would be
> interested to know the results if you do so, but the developers are
> unlikely to do so for you.
>
> In any case, since you mention 'affybatch' it looks like this might be
> a design issue in that BioC package and the BioC lists might be the
> appropriate place to discuss it. It is not obvious to me why ~400
> datasets need a single large R object rather than, say, a list of 400
> smaller ones, if that is indeed the problem. So, to return to my
> first point:
>
>> We don't have anything like sufficient details.
>
> Please give us the full details of your system, the memory in use (see
> ?gc) and what you were trying to do.
>
>
>> I have been able to load up to 250 CEL files, but this time I wanted to test
>> what would happen if I want to normalize 400 chips.
>
> R can handle up to 16GB objects, which even for a 64-bit OS and 128GB
> of RAM are pretty large objects and do not arise naturally from many
> small files.

Hamid -- Prof. Ripley is correct in pointing you toward the Bioconductor mailing list

  http://bioconductor.org/docs/mailList.html

The usual solution for very large sets of arrays is to use packages such as aroma.affymetrix or xps, which do not hold the objects entirely in memory, or the affyPara package, which divides large jobs into smaller ones that are processed in parallel. Also, of course, think about whether it is statistically reasonable to normalize across all arrays. There have been discussions of this topic on the Bioc mailing list, so look in the archives for additional hints.

Martin

>> Thanks for your prompt response.
>>
>>
>>
>> Hamid
>>
>>> Martin Maechler wrote:
>>>>
>>>>>>>>> "VK" == Vadim Kutsyy <vadim_at_kutsyy.com>
>>>>>>>>>     on Fri, 01 Aug 2008 07:35:01 -0700 writes:
>>>>
>>>>     VK> Martin Maechler wrote:
>>>>     VK> The problem is in array.c, where allocMatrix checks for
>>>>     VK> "if ((double)nrow * (double)ncol > INT_MAX)". But why is
>>>>     VK> int used and not long int for indexing? (max int is
>>>>     VK> 2147483647, max long int is 9223372036854775807)
>>>>
>>>>     >> Well, Brian gave you all the info:
>>>>
>>>>     VK> Exactly, and given that most modern systems used for
>>>>     VK> computation (i.e. 64-bit systems) have a long int that is
>>>>     VK> much larger than int, I am wondering why long int is not
>>>>     VK> used for indexing (I don't think that 4-byte vs 8-byte
>>>>     VK> storage is an issue).
>>>>
>>>>     >> Did you really carefully read ?Memory-limits ??
>>>>
>>>>     VK> Yes, it specifies that a 4-byte int is used for indexing
>>>>     VK> in all versions of R, but why? I think 2147483647 elements
>>>>     VK> is fine as a limit for a single vector, but not as the
>>>>     VK> total number of elements of a matrix. I am running out of
>>>>     VK> indexing at a mere 10% memory consumption.
>>>>
>>>> Hmm, do you have 160 GBytes of RAM?
>>>> But anyway, let's move this topic from R-help to R-devel.
>>>>
>>>> [...........]
>>>>
>>>>     VK> PS: I have no problem going in and modifying the C code,
>>>>     VK> but I am just wondering what the reasons are for having
>>>>     VK> such a limitation.
>>>>
>>>> This limitation and its possible remedies are an interesting
>>>> topic, but really not for R-help: it will be a lot about C
>>>> programming, the internal representation of R objects, etc.
>>>> Very fascinating ... but for R-devel.
>>>>
>>>> "See you there!"
>>>> Martin
>>>
>>> Quoted from:
>>> http://www.nabble.com/allocMatrix-limits-tp18763791p18776531.html
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 17 Feb 2009 - 13:58:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.