Re: [Rd] Expected behaviour of is.unsorted?

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Thu, 24 May 2012 14:42:52 -0400

On 24/05/2012 1:33 PM, Matthew Dowle wrote:

> > On 24/05/2012 11:10 AM, Matthew Dowle wrote:
> >> > On 24/05/2012 9:15 AM, Matthew Dowle wrote:
> >> >> Duncan Murdoch<murdoch.duncan<at> gmail.com> writes:
> >> >> >
> >> >> > On 12-05-24 7:39 AM, Matthew Dowle wrote:
> >> >> > > Duncan Murdoch<murdoch.duncan<at> gmail.com> writes:
> >> >> > >>
> >> >> > >> On 12-05-23 4:37 AM, Matthew Dowle wrote:
> >> >> > > Since it seems to have a bug anyway (and if so, can't be correct
> >> >> > > in anyone's use of it), could either is.unsorted on a data.frame
> >> >> > > return the error that's in the C code already: "only atomic
> >> >> > > vectors can be tested to be sorted", for safety and to lessen
> >> >> > > confusion, or be changed to return the natural expectation
> >> >> > > proposed above? The easiest quick fix would be to negate the
> >> >> > > result of the .gtn call of course, but then you could never go
> >> >> > > back.
> >> >> >
> >> >> > I don't follow the last sentence. If the .gtn call needs to be
> >> >> > negated, why would you want to go back?
> >> >>
> >> >> Because then is.unsorted(DF) would work, but go by row, which you
> >> >> guessed above wasn't intended and isn't sensible. But once it worked
> >> >> in that way, users might start to depend on it; e.g., by writing
> >> >> is.unsorted(t(DF)). If I came along in future and suggested that was
> >> >> inefficient and wouldn't it be more natural and efficient if
> >> >> is.unsorted(DF) went by column, returning the same as
> >> >> with(DF,is.unsorted(order(a,b))) but implemented efficiently, you
> >> >> would fear that user code now depended on it going by row and say it
> >> >> was too late. I'd persist and highlight that it didn't seem in
> >> >> keeping with the spirit of is.unsorted()'s speed since it short
> >> >> circuits on the first unsorted item, which is why we love it. You'd
> >> >> reply that's not documented. Which it isn't. And that would be the
> >> >> end of that.
> >> >
> >> > Okay, I'm going to fix the handling of .gtn results, and document the
> >> > unsuitability of this function for dataframes and arrays.
> >>
> >> But that leaves the door open to confusion later, whilst closing the
> >> door to a better solution: making is.unsorted() work by column for
> >> data.frame; i.e., making is.unsorted _suitable_ for data.frame. If you
> >> just do the quick fix for .gtn result you can never go back. If making
> >> is.unsorted(DF) work by column is too hard for now, then leaving the
> >> door open would be better by returning the error message already in
> >> the C code: "only atomic vectors can be tested to be sorted". That
> >> would be a better quick fix since it leaves options for the future.
> >
> > I don't see why saying this function is unsuitable for dataframes
> > implies that it will never be made suitable for dataframes.
>
> If user code or packages start to depend on is.unsorted(t(DF)) it would be
> harder to change, no?

I don't see why. t(DF) is not a dataframe, so it will give surprising answers in a different way. If people rely on using code in ways that are documented to give unexpected results, they deserve what they get.
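
A minimal sketch of that point, assuming a small hypothetical data frame DF with numeric columns a and b (the values are made up for illustration and are not from this thread). It only shows that t(DF) is a matrix rather than a dataframe, and evaluates the column-wise expression quoted earlier, with(DF, is.unsorted(order(a, b))). It deliberately does not call is.unsorted(DF) itself, since that behaviour is what is under discussion.

```r
# Hypothetical example data (an assumption, not from the thread)
DF <- data.frame(a = c(1, 1, 2), b = c(3, 4, 1))

# Transposing a data frame yields a matrix, not a data frame
m <- t(DF)
is.data.frame(DF)   # TRUE
is.data.frame(m)    # FALSE
is.matrix(m)        # TRUE

# Column-wise check quoted earlier in the thread: are the rows already
# ordered by a, then b? order(a, b) is 1 2 3 here, so nothing is unsorted.
sorted_by_col <- with(DF, is.unsorted(order(a, b)))
sorted_by_col       # FALSE
```

Note that order() does a full sort, so this form has none of the short-circuit speed mentioned above; it is only a reference for what a by-column result would mean.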

> Why provide something that is unsuitable and allow
> that possibility to happen? It's more user friendly to return "not
> implemented", "unsuitable", or the nicer message already in the C code,
> than leave the door open for confusion and errors. Or in other words, it's
> even more user friendly to return a warning or error to the user at the
> prompt, than the user friendliness of writing in the help file that it's
> unsuitable for data.frame.

I disagree. I think it is most friendly to implement the function in the way it has been documented (even if it hasn't always been behaving as documented).

Duncan Murdoch



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 24 May 2012 - 18:58:11 GMT
