Re: [Rd] Surprising length() of POSIXlt vector (PR#14073)

From: <maechler_at_stat.math.ethz.ch>
Date: Mon, 30 Nov 2009 14:10:45 +0100 (CET)


>>>>> Tony Plate <tplate_at_acm.org>
>>>>> on Sun, 22 Nov 2009 10:21:33 -0600 writes:

    > maechler_at_stat.math.ethz.ch wrote:
    >>>>>>> "PD" == Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
    >>>>>>> on Fri, 20 Nov 2009 09:54:34 +0100 writes:

    PD> mark_at_celos.net wrote:
>> >> Arrays of POSIXlt dates always return a length of 9. This
>> >> is correct (they're really lists of vectors of seconds,
>> >> hours, and so forth), but other methods disguise them as
>> >> flat vectors, giving superficially surprising behaviour:
>> >>
>> >> strings <- paste('2009-1-', 1:31, sep='')
>> >> dates <- strptime(strings, format="%Y-%m-%d")
>> >>
>> >> print(dates)
>> >> # [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05"
>> >> # [6] "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10"
>> >> # [11] "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" "2009-01-15"
>> >> # [16] "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20"
>> >> # [21] "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25"
>> >> # [26] "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30"
>> >> # [31] "2009-01-31"
>> >>
>> >> print(length(dates))
>> >> # [1] 9
>> >>
>> >> str(dates)
>> >> # POSIXlt[1:9], format: "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" ...
>> >>
>> >> print(dates[20])
>> >> # [1] "2009-01-20"
>> >>
>> >> print(length(dates[20]))
>> >> # [1] 9
>> >>
>> >> I've since realised that POSIXct makes date vectors easier,
>> >> but could we also have something like:
>> >>
>> >> length.POSIXlt <- function(x) { length(x$sec) }
>> >>
>> >> in datetime.R, to avoid breaking functions (like the
>> >> str.POSIXt method) which use length() in this way?
>>
>>
    PD> [You need "wishlist" in the title for this sort of stuff.]
>>
    PD> I'd be wary of this. Just the other day we found that identical() broke 
    PD> on some objects because a package had length() redefined as a class 
    PD> method. I.e. the danger is that something wants to use length() with its 
    PD> original low-level interpretation.

>>
>> Yes, of course,
>> and Romain mentioned str(). Note that we have needed to define
>> a "POSIXt" method for str(), partly just *because* of the
>> current anomaly:
>> As Tony Plate, e.g., has argued, entirely correctly in my view,
>> the anomaly is that length() and "[" are not compatible;
>> and while I think no R language definition says that they should
>> be, I still believe that you need very good reasons for them to
>> be incompatible, as they are for POSIXlt.
>>
>> In the current case, for me the only good reason is backwards
>> compatibility.
>> My personal taste would be to change it and see what happens.
>> I would be willing to clean up after that change within R 'base'
>> and all packages I am coauthoring (quite a few), but of course
>> there are still a thousand more R packages..
>> My strong bet would be that less than 1% would be affected,
>> and my point guess for the percentage affected would be
>> rather in the order of 1/1000.
>>
>> The question is whether we (you too!), the R community, are willing to
>> bear the load of cleanup after such a change, which would really
>> *improve* consistency of that small corner of R.
>> For me, as I indicated above, I am willing to bear my share
>> (and actually have got it ready for R-devel).
    > Would be great to see this change!  Surely the right way to do things is 
    > that functions that wish to examine the low level structure of S3 
    > objects should use unclass() before looking at length and elements, so 
    > there's no reason for a class such as POSIXlt to not provide a 
    > logical-level length method.
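
A minimal sketch of the distinction drawn above between low-level and logical-level length, reusing the `dates` example from the original report. The helper name `logical_length` and the `tz = "UTC"` argument are assumptions added for illustration and are not part of the thread; the helper is an ordinary function, so nothing is registered as a method here.

```r
# Rebuild the example vector from the original report (tz fixed for reproducibility).
strings <- paste("2009-1-", 1:31, sep = "")
dates <- strptime(strings, format = "%Y-%m-%d", tz = "UTC")

# Low-level view: drop the class first, then inspect the underlying list.
raw <- unclass(dates)
is.list(raw)      # the representation is a plain list of components
names(raw)        # component names such as "sec", "min", "hour", ...
length(raw)       # number of components; this count depends on the R version

# Logical-level view: the number of date-times actually stored.
# 'logical_length' is a hypothetical helper, equivalent in spirit to the
# length(x$sec) proposal in the original report.
logical_length <- function(x) length(unclass(x)$sec)
logical_length(dates)
```

Code that needs the internal structure can go through unclass() in this way, whatever a class-level length() method reports.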

I have now committed such a change to R-devel (only!), revision 50616. Thanks to you, Gabor, and others for supporting this.

As said earlier in this thread, we must be prepared for this change to break other code that implicitly assumed the "old", i.e., pre-R-devel (2.11.x), behavior.

As I also said earlier, I'm prepared to help package authors to fix their code accordingly,
but I'd be grateful to be notified *if* problems surface from this.

Martin Maechler, ETH Zurich
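
For package authors who want to see which behavior their R installation has, the following is a small sketch, assuming a version of R that includes a logical-level length method for POSIXlt. The variable names and `tz = "UTC"` are illustrative only. On an installation with the old behavior, the comparisons would be expected to come out FALSE, since length() reported 9 there.

```r
strings <- paste("2009-1-", 1:31, sep = "")
dates <- strptime(strings, format = "%Y-%m-%d", tz = "UTC")
idx <- c(1, 20, 31)

# With a logical-level length method, length() and "[" agree with each other.
same_as_input <- length(dates) == length(strings)
single_elem   <- length(dates[20]) == 1L
subset_len    <- length(dates[idx]) == length(idx)

print(c(same_as_input = same_as_input,
        single_elem   = single_elem,
        subset_len    = subset_len))
```

Code that relied on the old value should inspect the components via unclass() instead.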

    > At a broader level, when I've designed vector/array classes, I've 
    > wondered what methods I should define, but have been unable to find any 
    > specification of a set of methods.  When one thinks about it, there is 
    > actually quite a set of strongly-connected methods with quite a lot of 
    > behaviors to implement, e.g., length, '[' (with logical, numeric & 
    > character indices, including 0 and NA possibilities), '[[', 'c', and 
    > then optionally 'names', and then for multi-dim objects, 'dim', 
    > 'dimnames', etc.  Consequently, last time this discussion on length and 
    > '[' methods for POSIXlt came up, I wrote a function that automatically 
    > tests the behavior of all these methods on a specified class and summarizes 
    > the results.  If anyone is interested in such a thing, I'd be happy to 
    > dig it up and distribute it (I'd attach it to this message, but I'm on 
    > vacation and don't have access to the computer that I think it's on).

    > -- Tony Plate
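
The function Tony describes is not included in this thread. As a rough idea of what such a consistency check might look like, here is a sketch under stated assumptions: the name `check_vector_methods` and the particular set of checks are hypothetical, and it covers only length, '[' and 'c', not '[[', names, or dim.

```r
# Hypothetical checker: does length() agree with "[" and c() for object x?
check_vector_methods <- function(x) {
  n <- length(x)
  stopifnot(n >= 2L)
  keep <- rep(c(TRUE, FALSE), length.out = n)  # alternating logical index
  c(
    positive_index = length(x[c(1L, n)]) == 2L,
    negative_index = length(x[-1L]) == n - 1L,
    logical_index  = length(x[keep]) == sum(keep),
    zero_index     = length(x[0L]) == 0L,
    concatenation  = length(c(x, x)) == 2L * n
  )
}

# Example objects: a POSIXct vector and its POSIXlt counterpart.
ct <- as.POSIXct(sprintf("2009-01-%02d", 1:5), tz = "UTC")
lt <- as.POSIXlt(ct)

res_ct <- check_vector_methods(ct)
res_lt <- check_vector_methods(lt)
print(rbind(POSIXct = res_ct, POSIXlt = res_lt))
```

Each entry is TRUE when length() and the corresponding operation agree; on an R version without a logical-level length method for POSIXlt, the POSIXlt row would be expected to show failures.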

>> Martin Maechler, ETH Zurich (and R Core Team)
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>


Received on Mon 30 Nov 2009 - 13:13:30 GMT

This archive was generated by hypermail 2.2.0 : Mon 30 Nov 2009 - 14:10:53 GMT