Re: [Rd] Increase transparency: suggestion on how to avoid namespaces and/or unnecessary overwrites of existing functions

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Sat, 01 Oct 2011 19:45:18 -0400

On Oct 1, 2011, at 6:14 PM, Duncan Murdoch wrote:

> On 11-10-01 5:14 PM, Dominick Samperi wrote:
>> On Sat, Oct 1, 2011 at 1:08 PM, Duncan Murdoch<murdoch.duncan_at_gmail.com>  wrote:
>>> On 11-08-23 2:23 PM, Janko Thyson wrote:

>>>>
>>>> aDear list,
>>>>
>>>> I'm aware of the fact that I posted on something related a while ago,
>>>> but I just can't sweat this off and would like to ask your for an opinion:
>>>>
>>>> The problem:
>>>> Namespaces are great, but they don't resolve certain conflicts regarding
>>>> name clashes. There are more and more people out there trying to come up
>>>> with their own R packages, which is great also! Yet, it becomes more and
>>>> more likely that programmers will choose identical names for their
>>>> exported functions and/or that they add functionality to existing
>>>> function (i.e. overwriting existing functions).
>>>> The whole process of which packages overwrite which functions is
>>>> somewhat obscure and in addition depends on their order in the search
>>>> path. On the other hand, it is not possible to use "namespace"
>>>> functionality (i.e. 'namespace::fun()'; also less efficient than direct
>>>> call; see illustration below) during early stages of the development
>>>> process (i.e. the package is not finished yet) as there is no namespace
>>>> available yet.
>>>>
>>> 
>>> I agree there can be a problem, but I don't think it is necessarily as
>>> serious as you suggest.  Even though there are more and more packages
>>> available, most people will still use roughly the same number of them. Just
>>> because CRAN has thousands of packages doesn't mean I use all of them at the
>>> same time.
>>> 
>>> 

>>>> I know of at least two cases where such overwrites (I think it's called
>>>> masking, right?) led to some confusion at our chair:
>>>> 1) loading package forecast overwrites certain functions in stats which
>>>> made some code refactoring necessary
>>> 
>>> If your code had been in a package with a NAMESPACE, it would not have been
>>> affected by a user loading forecast.  (If you start importing it, then of
>>> course it could cause masking problems.)
>>> 
>>> You suggest above that users only put code into a package very late in the
>>> development process.  The solution is, don't do that.  Create a package
>>> early on, and use it through the majority of development time.
>>> 
>>> You can leave the choice of exports until late by exporting everything;
>>> you'll still get the benefit of the more controlled name search from the
>>> beginning.
>>> 
>>> You say you can't use "namespace::call()" until the namespace package has
>>> been written.  But why would you want to?  If the call is coming from the
>>> new package, objects in it will be used with first priority in resolving the
>>> call.  You only need the :: notation when there are ambiguities in calls to
>>> external packages.
>>> 
>>> 

>>>> 2) loading package 'R.utils' followed by package 'roxygen' overwrites
>>>> 'parse.default()' which results in errors for something like
>>>> 'eval(parse(text="a<- 1"))' ; see illustration below)
>>>> And I'm sure the community could come up with lots more of such scenarios.
>>>>
>>>> Suggestions:
>>>> 1) In order to avoid name clashes/unintended overwrites, how about
>>>> switching to a coding paradigm that explicitly (and automatically)
>>>> includes a package's name in all its functions' names once code is
>>>> turned into a real package? E.g., getting used to "preemptively" type
>>>> 'package_fun()' or 'package.fun()' instead of just 'fun()'. Better to be
>>>> save than sorry, right? This could be realized pretty easily (see
>>>> example below) and, IMHO, would significantly increase transparency.
>>> 
>>> I think long names with consistent prefixes are harder to read than short
>>> descriptive names.  I think this would make code harder to read. For
>>> example, the first few lines of mean.default would change from
>>> 
>>>    if (!is.numeric(x)&&  !is.complex(x)&&  !is.logical(x)) {
>>>        warning("argument is not numeric or logical: returning NA")
>>>        return(NA_real_)
>>>    }
>>> 
>>> to
>>> 
>>>    if (!base_is.numeric(x)&&  !base_is.complex(x)&&
>>>        !base_is.logical(x)) {
>>>        base_warning("argument is not numeric or logical: returning NA")
>>>        return(base_NA_real_)
>>>    }
>>> 
>>> 

>>>> 2) In order to avoid intended (but for the user often pretty obscure)
>>>> overwrites of existing functions, we could use the same mechanism
>>>> together with the "rule": just don't provide any functions that
>>>> overwrite existing ones, rather prepend your version of that function
>>>> with your package name and leave it up to the user which version he
>>>> wants to call.
>>> 
>>> That seems like good advice.
>>> 
>>> Duncan Murdoch
>> 
>> Except that namespace::foo should be assigned to another local
>> variable instead of using package::foo in a tight loop, because
>> repeated calls to "::" can introduce a significant performance
>> penalty. (This has been discussed in another thread.)
> 
> That's good advice too.
> 

Well, if this idiom is used more often, we may re-consider the implementation of `::` - it could definitely be sped up if needed, but I think so far the view was that it's not worth the hassle (aka increased complexity and relative ugliness of making it a special form) ...

Cheers,
Simon

> Duncan Murdoch
> 
>> 

>>>>
>>>> At the moment, all of this is probably not that big of a deal yet, but
>>>> my suggestion has more of a mid-term/long-term character.
>>>>
>>>> Below you find a little illustration. I'm probably asking too much, but
>>>> it'd be great if we could get a little discussion going on how to
>>>> improve the way of loading packages!
>>>>
>>>> Best regards and thanks for R and all it's packages!
>>>> Janko
>>>>
>>>>
>>>> ################################################################################
>>>> # PROOF OF CONCEPT
>>>>
>>>> ################################################################################
>>>>
>>>> # 1) PROBLEM
>>>> # IMHO, with the number of packages submitted to CRAN constantly
>>>> increasing,
>>>> # over time we will be likely to see problems with respect to name
>>>> clashes.
>>>> # The main reasons I see for this are the following:
>>>> # a) package developers picking identical names for their exported
>>>> functions
>>>> # b) package developers overwriting base functions in order to add
>>>> functionality
>>>> # to existing functions
>>>> # c) ...
>>>> #
>>>> # This can create scenarios in which the user might not exactly know that
>>>> # he/she is using a 'modified' version of a specific function. More so,
>>>> the user
>>>> # needs to carefully read the description of each new package he plans
>>>> # to use in order to find out which functions are exported and which
>>>> existing
>>>> # functions might be overwritten. This in turn might imply that the user's
>>>> # existing code needs to be refactored (i.e. instead of using 'fun()' it
>>>> # might now be necessary to type 'namespace::fun()' to be sure that the
>>>> desired
>>>> # function is called).
>>>>
>>>> # 2) SUGGESTED SOLUTION
>>>> # That being said, why don't we switch to a 'preemptive' coding paradigm
>>>> # where the default way of calling functions includes the specification of
>>>> # its namespace? In principle, the functionality offered by
>>>> 'namespace::fun()'
>>>> # gets the job done.
>>>> # BUT:
>>>> # a) it is slower compared to the direct way of calling a function.
>>>> # (see illustration below).
>>>> # b) this option is not available througout the development process of a
>>>> package
>>>> # as there is no namespace yet and there's no way to emulate one.
>>>> This in
>>>> # turn means that even though a package developer would buy into
>>>> strictly
>>>> # using 'mypkg::fun()' throughout his package code, he can only do so
>>>> at the
>>>> # very final stage of the process RIGHT before turning his code into a
>>>> # working package (when he's absolutely sure everything is working as
>>>> planned).
>>>> # For debugging he would need to go back to using 'fun()'. Pretty
>>>> cumbersome.
>>>>
>>>> # So how about simply automatically prepending a given function's name
>>>> with
>>>> # the package's name for each package that is build (e.g. 'pkg.fun()' or
>>>> # 'pkg_fun()')? In the end, this would just be a small change for new
>>>> packages
>>>> # without a significant decrease of performance and it could also be
>>>> realized
>>>> # at early stages of the development process (see illustration below).
>>>>
>>>> # 3) ILLUSTRATION
>>>>
>>>> # Example case where base function 'parse.default' is overwritten:
>>>> parse(text="a<- 5") # Works
>>>> require(R.utils)
>>>> require(roxygen)
>>>> parse(text="a<- 5") # Does not work anymore
>>>>
>>>> ################# START A NEW R SESSION BEFORE YOU CONTINUE
>>>> ####################
>>>>
>>>> # Inefficiency of 'namespace::fun()':
>>>> require(microbenchmark)
>>>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>>>> res.b<- microbenchmark(eval(base::parse(text="a<- 5")))
>>>> median(res.a$time)/median(res.b$time)
>>>>
>>>> # Can be made up by explicit assignment:
>>>> foo<- base::parse
>>>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>>>> res.b<- microbenchmark(eval(foo(text="a<- 5")))
>>>> median(res.a$time)/median(res.b$time)
>>>>
>>>> # Automatically prepend function names:
>>>> processNamespaces<- function(
>>>> do.global=FALSE,
>>>> do.verbose=FALSE,
>>>> .delim.name="_",
>>>> ...
>>>> ){
>>>> srch.list.0<- search()
>>>> srch.list<- gsub("package:", "", srch.list.0)
>>>> if(!do.global){
>>>> assign(".NS", new.env(), envir=.GlobalEnv)
>>>> }
>>>> out<- lapply(1:length(srch.list), function(x.pkg){
>>>> pkg<- srch.list[x.pkg]
>>>>
>>>> # SKIP LIST
>>>> if(pkg %in% c(".GlobalEnv", "Autoloads")){
>>>> return(NULL)
>>>> }
>>>> # /
>>>>
>>>> # TARGET ENVIR
>>>> if(!do.global){
>>>> # ADD PACKAGE TO .NS ENVIRONMENT
>>>> envir<- eval(substitute(
>>>> assign(PKG, new.env(), envir=.NS),
>>>> list(PKG=pkg)
>>>> ))
>>>> # /
>>>> # envir<- get(pkg, envir=.NS, inherits=FALSE)
>>>> envir.msg<- paste(".NS$", pkg, sep="")
>>>> } else {
>>>> envir<- .GlobalEnv
>>>> envir.msg<- ".GlobalEnv"
>>>> }
>>>> # /
>>>>
>>>> # PROCESS FUNCTIONS
>>>> cnt<- ls(pos=x.pkg)
>>>> out<- unlist(sapply(cnt, function(x.cnt){
>>>> value<- get(x.cnt, pos=x.pkg, inherits=FALSE)
>>>> obj.mod<- paste(pkg, x.cnt, sep=.delim.name)
>>>> if(!is.function(value)){
>>>> return(NULL)
>>>> }
>>>> if(do.verbose){
>>>> cat(paste("Assigning '", obj.mod, "' to '", envir.msg,
>>>> "'", sep=""), sep="\n")
>>>> }
>>>> eval(substitute(
>>>> assign(OBJ.MOD, value, envir=ENVIR),
>>>> list(
>>>> OBJ.MOD=obj.mod,
>>>> ENVIR=envir
>>>> )
>>>> ))
>>>> return(obj.mod)
>>>> }))
>>>> names(out)<- NULL
>>>> # /
>>>> return(out)
>>>> })
>>>> names(out)<- srch.list
>>>> return(out)
>>>> }
>>>>
>>>> # +++++
>>>>
>>>> funs<- processNamespaces(do.verbose=TRUE)
>>>> ls(.NS)
>>>> ls(.NS$base)
>>>> .NS$base$base_parse
>>>>
>>>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>>>> res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5")))
>>>> median(res.a$time)/median(res.b$time)
>>>>
>>>> #+++++
>>>>
>>>> funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE)
>>>> base_parse
>>>>
>>>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>>>> res.b<- microbenchmark(eval(base_parse(text="a<- 5")))
>>>> median(res.a$time)/median(res.b$time)
>>>>
>>>> ______________________________________________
>>>> R-devel_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>>> ______________________________________________
>>> R-devel_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
> 
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun 02 Oct 2011 - 00:24:56 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 03 Oct 2011 - 13:00:38 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive