Re: [R] Format integer

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 13 May 2008 07:12:39 +0100 (BST)

This is one of those problems where the fine details matter.

  1. The version of R. I optimized sprintf() for long inputs and a single format in R 2.7.0 -- the differences are mainly for multiple inputs and where coercion is needed. See also below.
  2. The system. My home system with an Intel Core 2 Duo is usually about the same speed as my office desktop with dual Opterons. But not here:

Home:

> system.time(a<-formatC(x,digits=10,flag='0'))

    user system elapsed
   9.705 0.088 9.810
> system.time(b<-sprintf("%011d",x))

    user system elapsed
   0.283 0.000 0.283

Office:

> system.time(a<-formatC(x,digits=10,flag='0'))

    user system elapsed
  15.851 0.125 16.007
> system.time(b<-sprintf("%011d",x))

    user system elapsed
   0.816 0.001 0.818

and my Windows laptop is similar to the second here. So a speed-up of 95x seems atypical.
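
For anyone who wants to repeat the comparison on their own machine, here is a minimal sketch. The seed and the use of width=11 in formatC() are assumptions for illustration (the calls timed above used digits=10), and the timings will of course depend on the R version and the system.

```r
# Same shape of input as in the thread; the seed is an assumption,
# added only so the run is reproducible.
set.seed(1)
x <- sample(1:100, 650000, replace = TRUE)

# Zero-pad to 11 characters in two ways and time each call.
t_formatC <- system.time(a <- formatC(x, width = 11, flag = "0"))
t_sprintf <- system.time(b <- sprintf("%011d", x))

# Show user/system/elapsed side by side.
print(rbind(formatC = t_formatC, sprintf = t_sprintf)[, 1:3])

# Both results should be 11-character zero-padded strings.
all(a == b)
```

The ratio of the two elapsed times is the speed-up under discussion; it should not be expected to match any of the figures quoted here.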

On Mon, 12 May 2008, Phil Spector wrote:

> I guess "little" means different things to different people:
>
>> x = sample(1:100,650000,replace=TRUE)
>> system.time(a<-formatC(x,digits=10,flag='0'))
> user system elapsed
> 32.854 0.444 34.813
>> system.time(b<-sprintf("%011d",x))
> user system elapsed
> 0.352 0.012 0.363
>
> If you look at the definitions of the functions, you'll see
> that formatC is written in R, and sprintf uses a single call
> to an .Internal function. I

Not really: the meat of formatC() is a .C call. In this case it is calling format.default(), also a .Internal. But profiling shows that most of the time here is spent in paste(), another function which was optimized in 2.7.0. (I see 2.7.0 as 1.7x faster than 2.6.2 on formatC here.)
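
One way to check where the time goes is to profile the call with Rprof(). This is only a sketch: the temporary file name and the sampling interval are assumptions, it needs an R build with profiling enabled, and the breakdown will differ between R versions.

```r
x <- sample(1:100, 650000, replace = TRUE)

# Write profiling samples to a temporary file (name chosen for illustration).
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
a <- formatC(x, width = 11, flag = "0")
Rprof(NULL)   # stop profiling

# Functions ranked by time spent in their own code.
head(summaryRprof(prof_file)$by.self)
```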

But although sprintf is more flexible, on most problems it will be substantially faster.

> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector_at_stat.berkeley.edu
>
>
>
> On Mon, 12 May 2008, Anh Tran wrote:
>
>> Yeah, thanks all. I checked back and found I had mistyped a few things.
>> The array has 650,000 elements and it took 25 seconds :p. That's acceptable;
>> I just had too many variables at the time I ran it.
>>
>> Also, it seems that sprintf is a little faster.
>>
>> Thanks all.
>>
>> Anh Tran
>>
>>
>> On Mon, May 12, 2008 at 2:55 PM, Uwe Ligges
>> <ligges_at_statistik.tu-dortmund.de>
>> wrote:
>>
>>>
>>>
>>> Anh Tran wrote:
>>>
>>>> Thanks. formatC(flag) works.
>>>>
>>>> But it's awfully slow. I tried to do that for 65000 numbers (generating an
>>>> ID for each item) and it seemed to take forever.
>>>>
>>>
>>> On my not that recent laptop:
>>>
>>>> system.time(formatC(1:65000, width=10, flag="0"))
>>> user system elapsed
>>> 1.92 0.00 1.94
>>>
>>>
>>> I think 2 seconds is less than "forever".
>>>
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>
>>>
>>>> Is there any faster way?
>>>>
>>>> Thanks all.
>>>>
>>>> Anh Tran
>>>>
>>>> On Mon, May 12, 2008 at 2:36 PM, Uwe Ligges <
>>>> ligges_at_statistik.uni-dortmund.de> wrote:
>>>>
>>>>
>>>>> Anh Tran wrote:
>>>>>
>>>>>> Hi,
>>>>>> What's one way to convert an integer to a string with leading 0's,
>>>>>> such that '13' becomes '00000000013', to be put into a string?
>>>>>>
>>>>>> I've tried formatC, but it removes all the zeros and replaces them
>>>>>> with blanks.
>>>>>>
>>>>> Not so for me:
>>>>>
>>>>> formatC(13, digits=10, flag="0")
>>>>>
>>>>> Uwe Ligges
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
>> --
>> Regards,
>> Anh Tran
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Tue 13 May 2008 - 06:35:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 13 May 2008 - 07:30:40 GMT.