Re: [Rd] slow load() in R2.6.0

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 11 Oct 2007 05:27:22 +0100 (BST)

On Thu, 11 Oct 2007, Mark.Bravington_at_csiro.au wrote:

> I'm encountering excruciatingly slow load times for character vectors in
> R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes
> character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1,
> repeated loads of the same set of files are near-instantaneous.
>
> The problem is proving tricky to reproduce consistently from scratch, so
> I have attached the 3 files used in the examples below.

There was no attachment: since these are (I presume) binary files, can you not put them on a website (as suggested by the posting guide)?

> If I create a similar-looking object from scratch, then save it and
> re-load it a few times, the problem doesn't always occur... at least not
> in that session.
>
>
> FWIW I have noticed that the time taken to load seems to be roughly a
> power of 2 of the "base slow load time"-- could be a red herring.
>
> The problem seems specific to character vectors-- I noticed it with
> entire workspaces and have whittled it down to char vecs only.
>
> The example below is from a brand-new session with only the basic
> packages loaded; delays in my real sessions are much longer.

Can you please try R-patched or R-devel? We've found and solved a couple of performance issues with creating STRSXPs, but those involved character vectors of millions of elements.

I tried several examples of around 10000 elements and got times of at most 0.05 secs in 2.6.0. These included parts of those examples on which we had seen performance issues.
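
A check of that kind can be run along the following lines. This is a minimal sketch only: the vector contents, the choice of 500 distinct strings, the five repetitions and the temporary file are all assumptions made for illustration, not the examples referred to above. It saves a no-attributes character vector of length 1e4 and times repeated loads of the same file, which is where the reported slowdown is said to appear.

```r
# Illustrative sketch; names and sizes here are assumptions, not the original test files.
set.seed(1)
n <- 1e4

# A character vector with many repeated strings, so the saved file compresses well
x <- sample(sprintf("item%04d", 1:500), n, replace = TRUE)

f <- tempfile(fileext = ".rda")
save(x, file = f)

# Load the same file several times, each into a fresh environment,
# and record the elapsed time of every load
timings <- sapply(1:5, function(i) {
  e <- new.env()
  system.time(load(f, envir = e))[["elapsed"]]
})

print(timings)            # elapsed seconds for each successive load
print(object.size(x))     # in-memory size of the vector
print(file.info(f)$size)  # size of the saved file on disk, in bytes

unlink(f)
```

If the successive timings grow from one load to the next, as in the transcript quoted below, that would point to the problem being reproducible without the original files.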

A few clues:

>
>
> Mark Bravington
> CSIRO Mathematical & Information Sciences
> Marine Laboratory
> Castray Esplanade
> Hobart 7001
> TAS
>
> ph (+61) 3 6232 5118
> fax (+61) 3 6232 5012
> mob (+61) 438 315 623
>
>
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> system.time( load( 'd:/r2.0/t1.rda'))
> user system elapsed
> 0.5 0.0 0.5
>> system.time( load( 'd:/r2.0/t1.rda')) # same file; slower
> user system elapsed
> 3.5 0.0 3.5
>> system.time( load( 'd:/r2.0/t1.rda'))
> user system elapsed
> 4.13 0.00 4.13
>> system.time( load( 'd:/r2.0/t1.rda'))
> user system elapsed
> 3.51 0.00 3.52
>
>> system.time( load( 'd:/r2.0/t2.rda')) # different bigger file
> user system elapsed
> 4.42 0.00 4.42
>> system.time( load( 'd:/r2.0/t2.rda')) # same file; slower
> user system elapsed
> 10.44 0.00 10.44
>> system.time( load( 'd:/r2.0/t2.rda'))
> user system elapsed
> 10.79 0.00 10.80
>> system.time( load( 'd:/r2.0/t2.rda'))
> user system elapsed
> 10.39 0.00 10.41
>> system.time( load( 'd:/r2.0/t1.rda')) # the smaller file again; slower
> user system elapsed
> 10.67 0.00 10.69
>> system.time( load( 'd:/r2.0/t3.rda')) # different smaller file
> user system elapsed
> 10.51 0.00 10.52
>> system.time( load( 'd:/r2.0/t2.rda')) # now bigger file again: slower
> user system elapsed
> 14.61 0.00 14.61
>
>
>
> --please do not edit the information below--
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status =
> major = 2
> minor = 6.0
> year = 2007
> month = 10
> day = 03
> svn rev = 43063
> language = R
> version.string = R version 2.6.0 (2007-10-03)
>
> Windows XP (build 2600) Service Pack 2.0
>
> Locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices,
> package:utils, package:datasets, package:methods, Autoloads,
> package:base
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 11 Oct 2007 - 04:29:29 GMT
