Re: [Rd] Suggestion: add a warning in the help-file of unique()

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Thu, 17 Apr 2008 14:54:22 +0100 (BST)


On 17-Apr-08 10:44:32, Matthieu Stigler wrote:
> Hello
>
> I'm sorry if this suggestion/correction was already made
> but after a search in devel list I did not find any mention
> of it. I would just suggest to add a warning or an exemple
> for the help-file of the function unique() like
>
> "Note that unique() compares only identical values. Values
> which, are printed equally but in facts are not identical
> will be treated as different."
>
>
> > a<-c(0.2, 0.3, 0.2, 0.4-0.1)
> > a
> [1] 0.2 0.3 0.2 0.3
> > unique(a)
> [1] 0.2 0.3 0.3
>
> Well this is just the idea and the sentence could be made better
> (my poor english...). Maybe a reference to RFAQ 7.31 could be made.
> Maybe is this behaviour clear and logical for experienced users,
> but I don't think it is for beginners. I personnaly spent two
> hours to see that the problem in my code came from this.

The above is potentially a useful suggestion, and I would be inclined to support it. However, for your other suggestion:

> I was thinking about modify the function unique() to introduce
> a "tol" argument which allows to compare with a tolerance level
> (with default value zero to keep unique consistent) like all.equal(),
> but it seemed too complicated with my little understanding.
>
> Bests regards and many thanks for what you do for R!
> Matthieu Stigler

What is really complicated about it is that the results may depend on the order of elements. When unique() eliminates only values which are strictly identical to values which have been scanned earlier, there is no problem.

But suppose you set "tol=0.11" in

unique(c(20.0, 30.0, 30.1, 30.2, 40.0))
# 20.0, 30.0, 40.0
[30.1 rejected because within 0.11 of previous 30.0; 30.2 rejected because within 0.11 of previous 30.1]

and compare with

unique(c(20.0, 30.0, 30.2, 30.1, 40.0))
# 20.0, 30.0, 30.2, 40.0
[30.2 accepted because not within 0.11 of any previous; 30.1 rejected because within 0.11 of previous 30.2 or 30.0]
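
To make the two cases concrete, here is a minimal sketch of the rule I have in mind. The function unique_tol() is hypothetical (it is not part of R, and unique() has no "tol" argument); it assumes that a value is dropped whenever it lies within "tol" of any value scanned earlier, whether or not that earlier value was itself kept. It ignores NA handling.

```r
# Hypothetical helper, not part of base R: drop x[i] when it lies within
# `tol` of any element scanned earlier (kept or rejected).
unique_tol <- function(x, tol = 0) {
  keep <- logical(length(x))
  for (i in seq_along(x)) {
    earlier <- x[seq_len(i - 1)]          # empty when i == 1
    keep[i] <- !any(abs(x[i] - earlier) <= tol)
  }
  x[keep]
}

r1 <- unique_tol(c(20.0, 30.0, 30.1, 30.2, 40.0), tol = 0.11)
r2 <- unique_tol(c(20.0, 30.0, 30.2, 30.1, 40.0), tol = 0.11)
print(r1)  # [1] 20 30 40
print(r2)  # [1] 20.0 30.0 30.2 40.0
```

The same five values, in a different order, give a different answer. With the default tol = 0 the sketch keeps only the first occurrence of each exactly equal value, so the order of the input no longer changes which set of values survives.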

This kind of problem is always present in situations where there are potential "chained tolerances".

You cannot see the difference between the position of the hour-hand of a clock now, and one minute later.

But you may not chain this logic, for, if you could:

  If A is indistinguishable from B, and B is indistinguishable from C, then A is indistinguishable from C.

  10:00 is indistinguishable from 10:01 (on the hour-hand)
  10:[n] is indistinguishable from 10:[n+1]

Hence, by induction, 10:00 is indistinguishable from 11:00

Which you do not want!
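
The clock argument can be checked in a few lines. This is only a sketch under an assumption of my own choosing: readings are expressed as minutes past 10:00, and two readings count as "indistinguishable" when they differ by at most one minute. The function name indistinguishable() is made up for the illustration.

```r
# Assumption for illustration: two readings (minutes past 10:00) are
# "indistinguishable" when they differ by at most `tol` minutes.
indistinguishable <- function(a, b, tol = 1) abs(a - b) <= tol

minutes <- 0:60                           # 10:00, 10:01, ..., 11:00

# Every consecutive pair 10:[n], 10:[n+1] passes the test ...
steps_ok <- all(indistinguishable(minutes[-1], minutes[-length(minutes)]))
# ... but the two ends of the chain, 10:00 and 11:00, do not.
ends_ok <- indistinguishable(minutes[1], minutes[length(minutes)])

print(steps_ok)  # [1] TRUE
print(ends_ok)   # [1] FALSE
```

So "within tolerance" is not a transitive relation, and any tolerance-based unique() has to decide, more or less arbitrarily, where each chain gets cut.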

Best wishes,
Ted.



E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-08                                       Time: 14:54:19
------------------------------ XFMail ------------------------------

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 17 Apr 2008 - 13:59:13 GMT
