Re: [Rd] Expected behaviour of is.unsorted?

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Thu, 24 May 2012 08:20:23 -0400

On 12-05-24 7:39 AM, Matthew Dowle wrote:

> Duncan Murdoch <murdoch.duncan<at>gmail.com> writes:

>>
>> On 12-05-23 4:37 AM, Matthew Dowle wrote:
>>>
>>> Hi,
>>>
>>> I've read ?is.unsorted and searched. Have found a few items but nothing
>>> close, yet. Is the following expected?
>>>
>>>> is.unsorted(data.frame(1:2))
>>> [1] FALSE
>>>> is.unsorted(data.frame(2:1))
>>> [1] FALSE
>>>> is.unsorted(data.frame(1:2,3:4))
>>> [1] TRUE
>>>> is.unsorted(data.frame(2:1,4:3))
>>> [1] TRUE
>>>
>>> IIUC, is.unsorted is intended for atomic vectors only (description of x in
>>> ?is.unsorted). Indeed the C source (src/main/sort.c) contains an error
>>> message "only atomic vectors can be tested to be sorted". So that is the
>>> error message I expected to see in all cases above, since I know that
>>> data.frame is not an atomic vector. But there is also this in
>>> ?is.unsorted: "except for atomic vectors and objects with a class (where
>>> the >= or > method is used)" which I don't understand. Where is >= or >
>>> used, and by what?
>>
>> If you look at the source, you will see that the basic test for classed
>> objects is
>>
>> all(x[-1L] >= x[-length(x)])
>>
>> (in the function base:::.gtn).
>>
>> This comparison doesn't really make sense for dataframes, but it does
>> seem to be backwards: that tests that x[2] >= x[1], x[3] >= x[2], etc.,
>> returning TRUE if all comparisons are TRUE: but that sounds like it
>> should be is.sorted(), not is.unsorted(). Or is it my brain that is
>> backwards?
>
> Thanks. Yes, you're right. So is.unsorted() on a data.frame is trying to tell us
> if there exists any unsorted row, it seems.

I would guess that it was never intended to be used this way. It is intended to test x[1] < x[2] < x[3] ... for objects where this is a sensible calculation; it isn't really sensible for dataframes.
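A short sketch of why the comparison is not sensible for dataframes, assuming only the expression quoted earlier in this thread: `length()` of a data frame counts its columns, and single-bracket indexing such as `x[-1L]` drops a column, not a row. The test therefore compares each column with the one to its left, row by row, which would also account for the two-column results in the original question. The helper name `adjacent_column_check` is hypothetical and not part of base R.

```r
# Hypothetical helper that evaluates the expression quoted from base:::.gtn
# on a data frame. length(df) is the number of columns, and df[-1L] drops
# the first *column*, so this compares adjacent columns within each row.
adjacent_column_check <- function(df) {
  all(df[-1L] >= df[-length(df)])
}

DF <- data.frame(a = c(1, 3, 5), b = c(1, 3, 5))
length(DF)                                   # 2: columns, not rows
adjacent_column_check(DF)                    # TRUE: b >= a in every row

DF[2, 2] <- 2
adjacent_column_check(DF)                    # FALSE: row 2 has b = 2 < a = 3

adjacent_column_check(data.frame(2:1, 4:3))  # TRUE: 4:3 >= 2:1 element-wise
```

The results line up with the is.unsorted(DF) output reported in the thread, which is consistent with (though not proof of) the column-wise reading.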

>
> > DF = data.frame(a=c(1,3,5),b=c(1,3,5))
> > DF
>    a b
> 1 1 1               # this row is sorted
> 2 3 3               # this row is sorted
> 3 5 5               # this row is sorted
>
> > is.unsorted(DF) # going by row but should be !.gtn
> [1] TRUE
>
> > with(DF,is.unsorted(order(a,b))) # most people's natural expectation I guess
> [1] FALSE
>
> > DF[2,2]=2
> > DF
>    a b
> 1 1 1               # this row is sorted
> 2 3 2               # this row isn't sorted
> 3 5 5               # this row is sorted
>
> > is.unsorted(DF) # going by row but should be !.gtn
> [1] FALSE
>
> > with(DF,is.unsorted(order(a,b))) # most people's natural expectation I guess
> [1] FALSE
>
> Since it seems to have a bug anyway (and if so, can't be correct in anyone's
> use of it), could is.unsorted on a data.frame either return the error that's
> already in the C code, "only atomic vectors can be tested to be sorted", for
> safety and to lessen confusion, or be changed to return the natural expectation
> proposed above? The easiest quick fix would be to negate the result of the .gtn
> call of course, but then you could never go back.

I don't follow the last sentence. If the .gtn call needs to be negated, why would you want to go back?
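For comparison, a minimal sketch of the negation being discussed, on plain numeric vectors. It assumes only the non-strict expression quoted earlier in the thread (any strict `>` variant is left out), and the names `gtn_as_quoted` and `gtn_negated` are hypothetical, not base R functions.

```r
# Hypothetical reimplementation of the comparison quoted from base:::.gtn.
# It is TRUE when every element is >= its predecessor, so it answers
# "is x sorted?" rather than "is x unsorted?".
gtn_as_quoted <- function(x) {
  all(x[-1L] >= x[-length(x)])
}

# Negating it gives TRUE when at least one adjacent pair is out of order,
# which is what is.unsorted() reports for atomic vectors.
gtn_negated <- function(x) {
  !gtn_as_quoted(x)
}

gtn_as_quoted(c(1, 3, 5))   # TRUE, even though the vector is sorted
gtn_negated(c(1, 3, 5))     # FALSE, same as is.unsorted(c(1, 3, 5))
gtn_negated(c(1, 5, 3))     # TRUE, same as is.unsorted(c(1, 5, 3))
```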

Duncan Murdoch

>
> Matthew
>

>> Duncan Murdoch
>>
>>>
>>> I understand why the first two are FALSE (1 item of anything must be
>>> sorted). I don't understand the 3rd and 4th cases where length is 2:
>>> do_isunsorted seems to call lang3(install(".gtn"), x, CADR(args)). Does
>>> that fall back to TRUE for some reason?
>>>
>>> Matthew
>>>
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
>>> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_United Kingdom.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] data.table_1.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.15.0
>>>
>>> ______________________________________________
>>> R-devel<at> r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
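The two alternatives proposed in the thread (refuse non-atomic input, or answer the row-order question via `order()`) can be sketched as user-level wrappers. This is only an illustration: `strict_is_unsorted` and `df_rows_unsorted` are hypothetical names, neither is a base R function, and the row-order version simply generalises the `with(DF, is.unsorted(order(a, b)))` idiom quoted above to any number of columns.

```r
# Option 1 (hypothetical): refuse anything that is not an atomic vector,
# reusing the error message already present in src/main/sort.c.
strict_is_unsorted <- function(x, ...) {
  if (!is.atomic(x)) stop("only atomic vectors can be tested to be sorted")
  is.unsorted(x, ...)
}

# Option 2 (hypothetical): TRUE if the rows of df are not in lexicographic
# order by its columns. unname() keeps columns called e.g. "decreasing"
# from being matched as named arguments of order().
df_rows_unsorted <- function(df) {
  stopifnot(is.data.frame(df))
  is.unsorted(do.call(order, unname(as.list(df))))
}

DF <- data.frame(a = c(1, 3, 5), b = c(1, 3, 5))
df_rows_unsorted(DF)                       # FALSE
DF[2, 2] <- 2
df_rows_unsorted(DF)                       # FALSE: column a alone orders the rows
df_rows_unsorted(data.frame(a = c(3, 1)))  # TRUE
try(strict_is_unsorted(DF))                # signals the "only atomic vectors" error
```

Option 1 keeps is.unsorted's documented scope; option 2 answers a different question (row order), so it is shown as a separate helper rather than a change to is.unsorted itself.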

Received on Thu 24 May 2012 - 12:32:33 GMT
