Re: [Rd] Date vs date (long)

From: Terry Therneau <therneau_at_mayo.edu>
Date: Mon, 17 Sep 2007 16:11:54 -0500 (CDT)


Peter et al

  Thanks for the comments on dates. Some of the respondents missed the point by showing ways that I could work around the problems, when my main argument is that one shouldn't have to work around problems. So I hereby present round 2 of the debate.

 1 Postulates

  1. In my 35 years of computing experience, I think that nothing frustrates me more than a computer program that tries to keep me from doing something "for my own protection", when I know quite well what I am doing. So postulate 1 is a Bayesian sort of thing: the loss function is so large (hopping mad user) that one should be very cautious about creating a taboo.
  2. The S language's primary success is as a tool. Tools get used in ways that the originator never thought of. Alternate use is not wrong --- in fact you want to foster it. (My farm background plays a role here. You wouldn't believe the number of things I've fixed with a hammer and/or wrench, when the goal was not to get it done "right", but just to get whatever done and get the crop in.)

 2 Key question

  Both a date and a time-span object consist of a numeric value along with ancillary information about how to interpret that value. For simplicity call the latter "attributes" (ignoring whether they are implemented using the attr function or slots or whatever).
  For some operations it is fairly clear what to do with both the attr and the numeric part, e.g., date + 1 is the next day. No problem here.

  For other operations, e.g., timespan^2, it is only clear that the result is no longer a timespan, but not what class it should be. I firmly believe that the right result is to toss the attribute and return the number. This makes the tool optimally useful. Returning an error message is an unnecessary and controlling response: what good did the "not legal" message do me? There are of course many cases where an error message is the only choice, because I can't see what to do with either the number or the attribute, e.g. date + string.
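
A minimal sketch of the two cases above, using base R's Date and difftime classes as stand-ins for "date" and "timespan". The variable names are hypothetical, and what span^2 does depends on the R version in use, so the code records the outcome rather than assuming it:

```r
d0       <- as.Date("2007-09-17")
next_day <- d0 + 1                      # attribute kept: still a Date
span     <- d0 - as.Date("2007-09-10")  # a difftime measured in days

# Record what this R version does with span^2 instead of assuming it
outcome <- tryCatch(span^2, error = function(e) conditionMessage(e))

# The "toss the attribute, return the number" result argued for above
squared <- as.numeric(span, units = "days")^2
```

Here `squared` is a bare number with no units attached, which is the behavior the paragraph above asks for as the default.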

  The key question is then "what is the right philosophy", flexible tool or rigorous control? Rigorous control languages have not fared well historically.

3 Hard cases

  The hardest are cases where the right return value is unclear. An example is (date + 1.73) : should one return a true date, which is integer, allow an invalid internal value that is "fixed" at print time, return a numeric, or an error message?
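
For reference, a small sketch of how base R's Date class treats this case, as far as I know: the fractional value is kept internally and ignored when the date is formatted for printing. The `floor()` step that forces a whole-day date is only one possible choice, shown as an assumption:

```r
d <- as.Date("1970-01-01") + 1.73

class(d)     # still "Date"
unclass(d)   # the fractional value 1.73 is kept internally
format(d)    # the fractional part is ignored when formatting

# One way to force a "true" whole-day date (an assumption, not the only choice)
whole <- as.Date(floor(unclass(d)), origin = "1970-01-01")
```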

   I put (timespan/constant) in this category. The author has no hint as to whether the constant is unitless or not. In the medical research environment conversions back and forth from days to months and years are very common, greatly outnumbering division of an interval into pieces, so if I had to guess I would assume that I had to drop the units; another environment might be just the opposite.
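
The two readings of (timespan/constant) can be sketched as follows. This assumes base R's difftime class, where dividing by a plain number keeps the units; the 365.25 days-per-year constant and the variable names are illustrative choices, not anything prescribed by R:

```r
span <- as.Date("2007-09-14") - as.Date("1953-03-10")  # difftime in days

# Reading 1: the constant is unitless, so the result is still a time span
pieces <- span / 4

# Reading 2: the constant is a unit conversion (days per year),
# so the units are dropped first and the result is a plain number
years <- as.numeric(span, units = "days") / 365.25
```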

4 Response to particular points:

Peter D, 9/14

  1. as.Date(x) Peter suggests (as.Date('1960-1-1') + x). This is a really good idea, as it makes the code both origin independent and clearer.
  2. "I'd advise against numeric operation on difftime objects in general, because of the unspecified units." If I carry this idea forward, then R should insist that I specify units for any variable that corresponds to a physical quantity, e.g. "height" or "weight", so that it can slap my hands with an error message when I type

        bodyMassIndex = weight/ height^2

or cause plot(height^2, weight) to fail. This would go a long way towards making R the most frustrating program available. (And Microsoft gives some stiff competition in that area!)

  c. "It is assumed that the divisor is unit-less. Convert to numeric first to avoid this. (The idea has been raised to introduce new units: epiyears and epimonths, in which case you might do

        x <- as.Date('2007-9-14') - as.Date('1953-3-10')
        units(x) <- "epiyears"

which would give you the age in years for those purposes where you don't care missing the exact birthday by a day or so.)"

   As I said, division is a hard case with no clear answer. The creation of other unit schemes is silly --- why in the world would I voluntarily put on a straightjacket?

d.

>> as.Date('09Sep2007')
> Error in fromchar(x) : character string is not in a standard unambiguous format

  My off-the-cuff suggestion is to make the message honest

        Error in fromchar(x): program is not able to divine the correct format

The problem is not that the format is necessarily wrong or ambiguous, but that the program can't guess. (Which is no real fault of the program - such a recognition is a hard problem. It's ok to ask me for a format string).
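
Supplying the format string looks roughly like this. It is a sketch under the assumption that month abbreviations are matched in the "C" locale (the `%b` conversion is locale dependent), and the exact error text from the unformatted call may differ between R versions, so it is captured rather than assumed:

```r
# %b matches abbreviated month names in the current locale; the "C" locale
# is set here only so that "Sep" is recognised regardless of system language
Sys.setlocale("LC_TIME", "C")

# Without a format the parser has to guess; record what it reports
guess <- tryCatch(as.Date("09Sep2007"), error = function(e) conditionMessage(e))

# With an explicit format there is nothing left to guess
d <- as.Date("09Sep2007", format = "%d%b%Y")
```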

--
Hadley Wickham

"Why not just always use seconds for difftime objects? An attribute
could control how it was formatted, but would be independent of the underlying representation."

  This misses the point.

-------
Gabor Grothendieck

        as.Date(10)

  You can define as.Date.numeric in your package and then it will work. zoo has done that.

        library(zoo)
        as.Date(10)

  This is also a nice idea. Although adding to a package is possible, it is now very hard to take away, given namespaces. That is, I can't define my own Math.Date to do away with the creation of timespan objects. Am I correct? Is it also true that adding methods is hard if one uses version 4 classes?

  The rest of Gabor's comments are workarounds for the problem I raised. But I don't want to have to wrap "as.numeric" around all of my date calculations.

-----------
Brian Ripley

"It fails by design. Using sqrt() on a measurement that has an arbitrary origin would not have been good design."

  Ah, the classic Unix response of "that's not a bug, it's a feature". What is interesting is that this is almost precisely the response I got when I first argued for a global na.action default. John C (I think) replied that, essentially, S SHOULD slap you alongside the head when there were missing values. They require careful thought wrt good analysis, and allowing a global option was bad design because it would encourage bad statistics. The Insightful side of the debate said they didn't dare because it might break something. After getting nowhere with talking I finally gave up and wrote my own version into the survival code. This leverage eventually forced adoption of the idea. Not many (any?) people currently set na.action=na.fail because it is a "better design".

------------------

  Historically, languages designed for other people to use have been bad: Cobol, PL/I, Pascal, Ada, C++. The good languages have been those that were designed for their own creators: C, Perl, Smalltalk, Lisp. (Paul Graham)

        Terry Therneau

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 21 Sep 2007 - 12:33:09 GMT
