Re: [Rd] Date vs date (long)

From: Peter Dalgaard <>
Date: Tue, 18 Sep 2007 00:29:58 +0200

Terry Therneau wrote:
> b. "I'd advise against numeric operation on difftime objects in general,
> because of the unspecified units."
> If I carry this idea forward, then R should insist that I specify units for
> any variable that corresponds to a physical quantity, e.g. "height" or
> "weight", so that it can slap my hands with an error message when I type

> bodyMassIndex = weight/ height^2
> or cause plot(height^2, weight) to fail. This would go a long way towards
> making R the most frustrating program available. (And Microsoft gives some
> stiff competition in that area!)
That's not the point. The point is that 2 weeks is 14 days, so do you want sqrt(2) or sqrt(14)? It is not my design to have this variable-units encoding of difftimes, but as it is there, it is better to play along than to pretend that it is something else. (Once you go to faster time scales than in epidemiology, this becomes quite crucial because the units chosen can depend on the actual differences computed!)
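
A minimal sketch of the sqrt(2)-versus-sqrt(14) issue, using only base R. The variable names are illustrative. It shows that the bare number stored in a difftime depends on its units attribute, and that for date-times the units are picked from the size of the difference.

```r
# The same time span, encoded in two different units.
d_days <- as.difftime(14, units = "days")
d_weeks <- d_days
units(d_weeks) <- "weeks"        # rescales the stored number from 14 to 2

# The bare numeric value depends on the units attribute.
n_days  <- as.numeric(d_days)
n_weeks <- as.numeric(d_weeks)

# Asking for explicit units removes the ambiguity.
same <- as.numeric(d_days, units = "days") ==
        as.numeric(d_weeks, units = "days")

# For date-times, difftime() chooses the units from the size of the gap.
t0 <- as.POSIXct("2007-09-17 12:00:00", tz = "UTC")
u_short <- units(difftime(t0 + 30, t0))        # 30 seconds apart
u_long  <- units(difftime(t0 + 2 * 3600, t0))  # 2 hours apart
```

Here u_short and u_long differ, which is why code that takes as.numeric() of a difference without stating units can silently change meaning on faster time scales.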

> c.
> "It is assumed that the divisor is unit-less.
> Convert to numeric first to avoid this. (The idea has been raised to
> introduce new units: epiyears and epimonths, in which case you might do
> x <- as.Date('2007-9-14') - as.Date('1953-3-10')
> units(x) <- "epiyears"
> which would give you the age in years for those purposes where you don't
> care about missing the exact birthday by a day or so.)"
> As I said, division is a hard case with no clear answer. The creation of
> other unit schemes is silly --- why in the world would I voluntarily put on
> a straightjacket?
We'll put it on for you...

It makes sense to calculate half a difftime or a 12th or a 100th of a difftime. You were asking the system to magically conclude that a 365.25th of a difftime has a different meaning, a units conversion. This is the sort of thing that humans can discern, but not machines. The design is that you change units by using units(x)<-. Unfortunately the largest regular unit is "weeks", hence the suggestion of "epiyears".
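
As a rough illustration of the behaviour described above, assuming current base R: dividing a difftime by a plain number scales the value but keeps the units, and units(x)<- is the explicit way to convert. The "epiyears" unit is only a proposal in this thread, so the sketch goes through as.numeric() with stated units instead.

```r
x <- as.Date("2007-09-14") - as.Date("1953-03-10")  # difftime in days

# Dividing by a plain number scales the value; the units are untouched.
half   <- x / 2
scaled <- x / 365.25
u_scaled <- units(scaled)   # still "days", no conversion to years

# Changing units is an explicit step.
w <- x
units(w) <- "weeks"

# For an approximate age in years, drop to numeric with stated units first.
age_years <- as.numeric(x, units = "days") / 365.25
```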

> d.
>>> as.Date('09Sep2007')
>> Error in fromchar(x) : character string is not in a standard unambiguous
> format
> My off-the-cuff suggestion is to make the message honest
> Error in fromchar(x): program is not able to divine the correct format
Heh. Pretty close. Now what is a suitable euphemism for "divine"?
> The problem is not that the format is necessarily wrong or ambiguous, but that
> the program can't guess. (Which is no real fault of the program - such
> a recognition is a hard problem. It's ok to ask me for a format string).
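
A small sketch of supplying the format string explicitly. One assumption is made loudly here: %b matches abbreviated month names in the current locale, so the example switches LC_TIME to "C" for the duration so that "Sep" is recognised.

```r
# Assumption: use the "C" locale so that %b recognises "Sep".
old <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

# With an explicit format there is nothing left to guess.
d   <- as.Date("09Sep2007", format = "%d%b%Y")
iso <- format(d, "%Y-%m-%d")

# Without a format, as.Date() only tries a couple of ISO-like layouts
# and signals an error when none of them fits.
failed <- inherits(try(as.Date("09Sep2007"), silent = TRUE), "try-error")

# Restore the previous locale setting.
Sys.setlocale("LC_TIME", old)
```
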
> --
> Hadley Wickham
> "Why not just always use seconds for difftime objects? An attribute
> could control how it was formatted, but would be independent of the
> underlying representation."
> This misses the point.
No. It _is_ the point. The design is that the numeric value of a difftime is nonsensical without knowing the units. This might be different, although as Brian indicated, the choice is deliberate, and some deep thinking was involved.
> -------
> Gabor Grothendieck
> as.Date(10)
> You can define as.Date.numeric in your package and then it will work. zoo
> has done that.
> library(zoo)
> as.Date(10)
> This is also a nice idea. Although adding to a package is possible, it is
> now very hard to take away, given namespaces. That is, I can't define my
> own Math.Date to do away with the creation of timespan objects. Am I
> correct? Is it also true that adding methods is hard if one uses version 4
> classes?
> The rest of Gabor's comments are workarounds for the problem I raised.
> But I don't want to have to wrap "as.numeric" around all of my date
> calculations.
Just get used to it, I'd say.
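
One way to make the as.numeric() wrapping less tedious is to put it in a single helper. The sketch below is only that: days_between() is a hypothetical name, not a base R function, and the explicit origin in as.Date() is used so the call does not depend on the R version.

```r
# days_between() is a hypothetical helper, not part of base R.
# It returns a plain number of days, so ordinary arithmetic applies.
days_between <- function(from, to) {
  as.numeric(as.Date(to) - as.Date(from), units = "days")
}

n    <- days_between("1970-01-01", "1970-01-11")
root <- sqrt(n)   # plain numeric, no difftime involved

# Going back from a number to a Date, with the origin stated explicitly.
d <- as.Date(10, origin = "1970-01-01")
```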

> -----------
> Brian Ripley
> "It fails by design. Using sqrt() on a measurement that has an arbitrary
> origin would not have been good design."
> Ah, the classic Unix response of "that's not a bug, it's a feature".
> What is interesting is that this is almost precisely the response I
> got when I first argued for a global na.action default. John C (I think)
> replied that, essentially, S SHOULD slap you alongside the head when
> there were missing values. They require careful thought wrt good analysis,
> and allowing a global option was bad design because it would encourage bad
> statistics. The Insightful side of the debate said they didn't dare because
> it might break something. After getting nowhere with talking I finally
> gave up and wrote my own version into the survival code. This leverage
> eventually forced adoption of the idea.
> Not many (any?) people currently set because it is
> a "better design".
> ------------------
> Historically, languages designed for other people to use have been bad: Cobol,
> PL/I, Pascal, Ada, C++. The good languages have been those that were designed
> for their own creators: C, Perl, Smalltalk, Lisp. (Paul Graham)
Each of those that I know had its share of trouble with users who relied on implementation details and protested loudly when their programs stopped working. (In the C case, relying on function arguments being stored consecutively on a stack was a common source of grief when porting programs to SPARC machines, for instance. Or assuming that doubles could be stored starting at arbitrary bytes. Or that pointers could be stored in long integers.)

> Terry Therneau

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (                  FAX: (+45) 35327907

Received on Mon 17 Sep 2007 - 22:36:03 GMT
