Re: [Rd] Expected behaviour of is.unsorted?

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Thu, 24 May 2012 11:25:33 -0400

On 24/05/2012 11:10 AM, Matthew Dowle wrote:

> >  On 24/05/2012 9:15 AM, Matthew Dowle wrote:

> >> Duncan Murdoch<murdoch.duncan<at> gmail.com> writes:
> >> >
> >> > On 12-05-24 7:39 AM, Matthew Dowle wrote:
> >> > > Duncan Murdoch<murdoch.duncan<at> gmail.com> writes:
> >> > >>
> >> > >> On 12-05-23 4:37 AM, Matthew Dowle wrote:
> >> > > Since it seems to have a bug anyway (and if so, can't be correct in
> >> > > anyone's use of it), could either is.unsorted on a data.frame return
> >> > > the error that's in the C code already: "only atomic vectors can be
> >> > > tested to be sorted", for safety and to lessen confusion, or be changed
> >> > > to return the natural expectation proposed above? The easiest quick
> >> > > fix would be to negate the result of the .gtn call of course, but then
> >> > > you could never go back.
> >> >
> >> > I don't follow the last sentence. If the .gtn call needs to be negated,
> >> > why would you want to go back?
> >>

> >> Because then is.unsorted(DF) would work, but go by row, which you guessed
> >> above wasn't intended and isn't sensible. But once it worked in that way,
> >> users might start to depend on it; e.g., by writing is.unsorted(t(DF)). If
> >> I came along in future and suggested that was inefficient and wouldn't it
> >> be more natural and efficient if is.unsorted(DF) went by column, returning
> >> the same as with(DF,is.unsorted(order(a,b))) but implemented efficiently,
> >> you would fear that user code now depended on it going by row and say it
> >> was too late. I'd persist and highlight that it didn't seem in keeping with
> >> the spirit of is.unsorted()'s speed since it short circuits on the first
> >> unsorted item, which is why we love it. You'd reply that's not documented.
> >> Which it isn't. And that would be the end of that.
> >
> >  Okay, I'm going to fix the handling of .gtn results, and document the
> >  unsuitability of this function for dataframes and arrays.
>
> But that leaves the door open to confusion later, whilst closing the door
> to a better solution: making is.unsorted() work by column for data.frame;
> i.e., making is.unsorted _suitable_ for data.frame. If you just do the
> quick fix for .gtn result you can never go back. If making is.unsorted(DF)
> work by column is too hard for now, then leaving the door open would be
> better by returning the error message already in the C code: "only atomic
> vectors can be tested to be sorted". That would be a better quick fix
> since it leaves options for the future.
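
For reference, the by-column behaviour proposed in the quoted text can be sketched with the expression already given there, with(DF, is.unsorted(order(a, b))). The data frames and the column names a and b below are made up for illustration, and this is only the slow reference form, not the efficient short-circuiting implementation being asked for.

```r
# Illustrative data only: two small data frames with hypothetical columns a, b.
DF_sorted   <- data.frame(a = c(1, 1, 2), b = c(1, 2, 1))  # rows in (a, b) order
DF_unsorted <- data.frame(a = c(1, 1, 2), b = c(2, 1, 1))  # first two rows swapped

# order(a, b) gives the row permutation that sorts by a, then b.
# If the rows are already in that order the permutation is 1, 2, ..., n,
# so is.unsorted() on it is FALSE.
res_sorted   <- with(DF_sorted,   is.unsorted(order(a, b)))
res_unsorted <- with(DF_unsorted, is.unsorted(order(a, b)))

print(res_sorted)
print(res_unsorted)
```

Note that order() has to look at every row, so this form gives up the early exit on the first out-of-order item.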

I don't see why saying this function is unsuitable for dataframes implies that it will never be made suitable for dataframes.

The fix handles the case is.unsorted was designed for: it checks whether x[1] < x[2] < x[3] etc., which it doesn't currently do properly for non-atomic objects.
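
A minimal R-level sketch of that element-by-element check, for atomic vectors only. The helper name is_unsorted_sketch is made up for illustration and this is not the C implementation; it assumes the default of is.unsorted(), where ties count as sorted unless strictly = TRUE, and it returns NA when the vector contains NAs.

```r
# Hypothetical helper, not part of base R: walks adjacent pairs and stops
# at the first pair that is out of order.
is_unsorted_sketch <- function(x, strictly = FALSE) {
  stopifnot(is.atomic(x))            # mirrors the "only atomic vectors" restriction
  if (any(is.na(x))) return(NA)      # simplification: no na.rm handling here
  n <- length(x)
  if (n < 2L) return(FALSE)          # 0 or 1 items are always sorted
  for (i in seq_len(n - 1L)) {
    out_of_order <- if (strictly) x[i] >= x[i + 1L] else x[i] > x[i + 1L]
    if (out_of_order) return(TRUE)   # short circuit on the first unsorted item
  }
  FALSE
}

print(is_unsorted_sketch(c(1, 2, 3)))
print(is_unsorted_sketch(c(1, 3, 2)))
print(is_unsorted_sketch(c(1, 2, 2), strictly = TRUE))
```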

Duncan Murdoch

>
> >  Duncan Murdoch
> >

> >>
> >> > Duncan Murdoch
> >> >
> >> > >
> >> > > Matthew
> >> > >
> >> > >> Duncan Murdoch
> >> > >>
> >> > >>>
> >> > >>> I understand why the first two are FALSE (1 item of anything must be
> >> > >>> sorted). I don't understand the 3rd and 4th cases where length is 2:
> >> > >>> do_isunsorted seems to call lang3(install(".gtn"), x, CADR(args)).
> >> > >>> Does that fall back to TRUE for some reason?
> >> > >>>
> >> > >>> Matthew
> >> > >>>
> >> > >>>> sessionInfo()
> >> > >>> R version 2.15.0 (2012-03-30)
> >> > >>> Platform: x86_64-pc-mingw32/x64 (64-bit)
> >> > >>>
> >> > >>> locale:
> >> > >>> [1] LC_COLLATE=English_United Kingdom.1252
> >> > >>> [2] LC_CTYPE=English_United Kingdom.1252
> >> > >>> [3] LC_MONETARY=English_United Kingdom.1252
> >> > >>> [4] LC_NUMERIC=C
> >> > >>> [5] LC_TIME=English_United Kingdom.1252
> >> > >>>
> >> > >>> attached base packages:
> >> > >>> [1] stats graphics grDevices utils datasets methods base
> >> > >>>
> >> > >>> other attached packages:
> >> > >>> [1] data.table_1.8.0
> >> > >>>
> >> > >>> loaded via a namespace (and not attached):
> >> > >>> [1] tools_2.15.0
> >> > >>>
> >> > >>> ______________________________________________
> >> > >>> R-devel<at> r-project.org mailing list
> >> > >>> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 24 May 2012 - 15:27:17 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 24 May 2012 - 18:21:49 GMT.
