Re: [R] Variable passed to function not used in function in select=... in subset

From: hadley wickham <h.wickham_at_gmail.com>
Date: Tue, 11 Nov 2008 09:41:12 -0600

> I think your analysis is correct, that the goals of casual use and
> programming are inconsistent. But in general I think there's always going
> to be support for providing alternative ways that are programmer-safe.
>
> For instance, library( foo, character.only=TRUE) says that foo is a
> character vector, not the name of a package. I don't know of anything that
> subset() provides that is not available in other ways (I think of it as
> purely a convenience function, and my first piece of advice to Karl was not
> to use it).

Good points - every function optimised for interactive use should have a companion that is optimised for programmatic use.
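
The library() case quoted above gives a small illustration of such a pair. This is only a sketch: "stats" is chosen because it ships with R, and the variable name pkg is made up for the example.

```r
# Interactive form: the package name is typed unquoted.
library(stats)

# Programmatic form: the package name is held in a character variable.
pkg <- "stats"  # assumption: any installed package name would do here
library(pkg, character.only = TRUE)

# Without character.only = TRUE, library(pkg) would look for a package
# that is literally called "pkg".
```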

> However, if there really is something there, then it would be
> worthwhile pointing that out, and either modifying subset() to make it safe,
> or providing an alternative function.

When I teach subsetting I try to make this clear: using [ will always work, because there's no magic and everything is explicit. subset() has more magic, which saves you typing, but occasionally the magic doesn't work and you'll be left scratching your head as to why. In my experience students prefer subset() until they encounter strange behaviour that they don't understand.
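
A minimal sketch of the kind of surprise meant here. This is not Karl's original code; the data frame and both helper functions are hypothetical. The function argument x has the same name as a column, so inside subset() the column wins and the argument is silently ignored.

```r
df <- data.frame(x = 1:5, y = c(10, 20, 30, 40, 50))

# Hypothetical helper: keep rows where column x exceeds the threshold `x`.
keep_above_subset <- function(data, x) {
  # subset() evaluates the condition inside `data` first, so both sides
  # of the comparison resolve to the column x.
  subset(data, x > x)
}

# The same helper written with `[`: every lookup is explicit.
keep_above_bracket <- function(data, x) {
  data[data$x > x, , drop = FALSE]
}

nrow(keep_above_subset(df, 2))   # 0 rows: the column is compared with itself
nrow(keep_above_bracket(df, 2))  # 3 rows: x = 3, 4, 5
```

Renaming the argument would hide the problem here, but the function would still break for any data frame that happens to have a column with the new name.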

> I think this tension is a fundamental part of the character of S and R. But
> it is also fundamental to R that there are QC tests that apply to code in
> packages: so writing new tests that detect dangerous usage (e.g. to
> disallow partial name matching) would be another way to improve reliability.
> Writing a test for misuse of drop=TRUE seems quite hard, but there are
> probably ways a debugger could do it: e.g. to tag the invocation as to
> whether any indices were dropped on the first call, and then warn if the
> result isn't the same on every subsequent call).

A similar thing would be to force package authors to explicitly specify na.rm to ensure that they have thought about how to deal with missing values (this always trips me up). Perhaps you could treat drop similarly: in non-interactive code drop should not have a default value. Presumably this wouldn't be too hard to implement. R CMD check would just switch out [ for a version that didn't have a default value, in a similar way to what happens with T and F (another example of implicit interactive use vs. explicit programmatic use).
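
To make the drop and na.rm points concrete, here is a rough sketch. The first part shows standard behaviour of [ on a matrix. strict_mean() is a made-up wrapper, not an existing function and not something R CMD check does; it only illustrates what "no default value" could look like.

```r
# With the default drop = TRUE, the shape of the result depends on the index.
m <- matrix(1:6, nrow = 2)
one_row  <- m[1, ]                # dimensions dropped: a plain vector
kept_row <- m[1, , drop = FALSE]  # still a 1 x 3 matrix

is.null(dim(one_row))  # TRUE
dim(kept_row)          # 1 3

# Hypothetical wrapper: na.rm has no default, so the caller has to decide.
strict_mean <- function(x, na.rm) {
  if (missing(na.rm)) {
    stop("na.rm must be specified explicitly")
  }
  mean(x, na.rm = na.rm)
}

strict_mean(c(1, NA, 3), na.rm = TRUE)  # 2
# strict_mean(c(1, NA, 3))              # stops: na.rm must be specified
```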

> Conceivably Karl's problem could be detected in the same way: tag each name
> in the expression as to whether it was found in the data frame or some other
> environment, and then warn if that tag ever changes. Or maybe the test
> should just warn that subset() is a convenience function, not meant for
> programming.

It would be nice if the documentation was clearer on these issues. I can imagine every function having a numeric value associated with it which gave its position on the interactive vs programming continuum. Then you could sum up all the values in a function and warn the author if the total was too high. Not very practical to implement though!
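
Purely as a toy version of that scoring idea: the weights below are invented for illustration and carry no authority, and interactive_score() is a hypothetical name. It matches on symbol names only, so a variable that happens to be called subset would be counted too.

```r
# Made-up weights: a higher value means "more tuned for interactive use".
interactive_weight <- c(subset = 1, with = 1, transform = 1, attach = 2)

interactive_score <- function(f) {
  # all.names() lists every symbol in the body, including function names.
  used <- all.names(body(f))
  sum(interactive_weight[used[used %in% names(interactive_weight)]])
}

risky <- function(d) subset(transform(d, z = x + y), z > 0)
safe  <- function(d) d[d$x + d$y > 0, , drop = FALSE]

interactive_score(risky)  # 2
interactive_score(safe)   # 0
```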

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 11 Nov 2008 - 15:43:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 11 Nov 2008 - 16:30:26 GMT.
