Re: [Rd] slow load() in R2.6.0

From: <Mark.Bravington_at_csiro.au>
Date: Thu, 11 Oct 2007 18:36:18 +1100


Problem fixed by R-patched, thanks; see comments below.

>On Thu, 11 Oct 2007, Mark.Bravington@csiro.au wrote:
>
>> I'm encountering excruciatingly slow load times for character vectors
>> in R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes
>> character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1,
>> repeated loads of the same set of files are near-instantaneous.
>>
>> The problem is proving tricky to reproduce consistently from scratch,
>> so I have attached the 3 files used in the examples below.
>
>There was no attachment: since these are (I presume) binary files, can you
>not put them on a website (as suggested by the posting guide)?

Sorry, I would have if I could, but can't at present. The attachments got through OK to me at least, though. If anyone does have an interest in the files, let me know off-list and I'll re-send as a zip or somesuch.

>
>> If I create a similar-looking object from scratch, then save it and
>> re-load it a few times, the problem doesn't always occur... at least not
>> in that session.
>>
>>
>> FWIW I have noticed that the time taken to load seems to be roughly a
>> power of 2 of the "base slow load time"-- could be a red herring.
>>
>> The problem seems specific to character vectors-- I noticed it with
>> entire workspaces and have whittled it down to char vecs only.
>>
>> The example below is from a brand-new session with only the basic
>> packages loaded; delays in my real sessions are much longer.
>
>Can you please try R-patched or R-devel. We've found and solved a couple
>of performance issues with creating STRSXPs, but with character vectors of
>the millions of elements.

Thanks; R-patched fixed it. I did look in the R-devel NEWS before posting, but it doesn't mention the CHARSXP bug fix that is in the R-patched NEWS, so I didn't pursue it further.

FWIW, in case work is still being done on the new CHARSXP code: my problems were with much shorter vectors (~1e4 elements) than the millions mentioned in the R-patched NEWS, and the strings were short too: 90% were '' and the other 10% were 'a'. Also, when the previously offending objects are loaded into 2.6.0 patched, they are 3-10X smaller (according to object.size) than in unpatched 2.6.0; I was also amazed by the compression! It looks like unpatched R was allocating at least a 32-byte memory entry per individual zero-character string, whereas R-patched needs only about 4 bytes per (zero-character) string.
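For anyone trying to reproduce this without the original files, here is a minimal sketch in R, assuming a vector shaped like the one described above (1e4 elements, about 90% '' and 10% 'a'). The object name `x`, the temporary file, and the choice of four repeated loads are illustrative assumptions, not the attached files. It saves the vector, prints its object.size, and times repeated load() calls into a fresh environment, so the per-load times can be compared between unpatched 2.6.0 and R-patched.

```r
# Hypothetical test vector: 1e4 elements, 90% "" and 10% "a"
n <- 1e4
x <- rep("", n)
x[seq(1, n, by = 10)] <- "a"   # every 10th element becomes "a"

# Save to a temporary file (stand-in for the original .rda files)
f <- tempfile()
save(x, file = f)

# Reported size of the in-memory vector; compare across R versions
print(object.size(x))

# Elapsed time for four repeated loads of the same file,
# each into a fresh environment so the workspace is not overwritten
times <- sapply(1:4, function(i) {
  e <- new.env()
  system.time(load(f, envir = e))[["elapsed"]]
})
print(times)
```

If the bug is present, the later entries of `times` should grow as in the transcript below; in a fixed build they should stay near zero.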

Mark Bravington

>
>I tried several examples of around 10000 elements and got times of at most
>0.05 secs in 2.6.0. These included parts of those examples on which we
>had seen performance issues.
>
>A few clues:
>
>- even your base time is much slower than I would expect.
>
>- you say 'a 15K file ... object size ~0.5MB'. That's pretty phenomenal
> compression, and I am seeing file sizes more like 100Kb for objects that
> size. Since object.size does take into account duplication, one way to
> get that would be to have all unique elements. At ca 50 bytes per
> element you would need an average string length of about 15 chars. Such
> an object takes about 200Kb as a .rda file.
>
>
>>
>>
>> Mark Bravington
>> CSIRO Mathematical & Information Sciences
>> Marine Laboratory
>> Castray Esplanade
>> Hobart 7001
>> TAS
>>
>> ph (+61) 3 6232 5118
>> fax (+61) 3 6232 5012
>> mob (+61) 438 315 623
>>
>>
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help. Type 'q()' to
>> quit R.
>>
>>> system.time( load( 'd:/r2.0/t1.rda'))
>> user system elapsed
>> 0.5 0.0 0.5
>>> system.time( load( 'd:/r2.0/t1.rda')) # same file; slower
>> user system elapsed
>> 3.5 0.0 3.5
>>> system.time( load( 'd:/r2.0/t1.rda'))
>> user system elapsed
>> 4.13 0.00 4.13
>>> system.time( load( 'd:/r2.0/t1.rda'))
>> user system elapsed
>> 3.51 0.00 3.52
>>
>>> system.time( load( 'd:/r2.0/t2.rda')) # different bigger file
>> user system elapsed
>> 4.42 0.00 4.42
>>> system.time( load( 'd:/r2.0/t2.rda')) # same file; slower
>> user system elapsed
>> 10.44 0.00 10.44
>>> system.time( load( 'd:/r2.0/t2.rda'))
>> user system elapsed
>> 10.79 0.00 10.80
>>> system.time( load( 'd:/r2.0/t2.rda'))
>> user system elapsed
>> 10.39 0.00 10.41
>>> system.time( load( 'd:/r2.0/t1.rda')) # the smaller file again; slower
>> user system elapsed
>> 10.67 0.00 10.69
>>> system.time( load( 'd:/r2.0/t3.rda')) # different smaller file
>> user system elapsed
>> 10.51 0.00 10.52
>>> system.time( load( 'd:/r2.0/t2.rda')) # now bigger file again: slower
>> user system elapsed
>> 14.61 0.00 14.61
>>
>>
>>
>> --please do not edit the information below--
>>
>> Version:
>> platform = i386-pc-mingw32
>> arch = i386
>> os = mingw32
>> system = i386, mingw32
>> status =
>> major = 2
>> minor = 6.0
>> year = 2007
>> month = 10
>> day = 03
>> svn rev = 43063
>> language = R
>> version.string = R version 2.6.0 (2007-10-03)
>>
>> Windows XP (build 2600) Service Pack 2.0
>>
>> Locale:
>> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>>
>> Search Path:
>> .GlobalEnv, package:stats, package:graphics, package:grDevices,
>> package:utils, package:datasets, package:methods, Autoloads,
>> package:base
>>
>
>--
>Brian D. Ripley, ripley_at_stats.ox.ac.uk
>Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>University of Oxford, Tel: +44 1865 272861 (self)
>1 South Parks Road, +44 1865 272866 (PA)
>Oxford OX1 3TG, UK Fax: +44 1865 272595
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 11 Oct 2007 - 07:44:23 GMT

