Re: [R] Function hints

From: Joerg van den Hoff
Date: Tue 20 Jun 2006 - 02:14:10 EST

hadley wickham wrote:
> One of the recurring themes in the recent useR! conference was that
> many people find it difficult to find the functions they need for a
> particular task. Sandy Weisberg suggested a small idea he would like
> to see: a hints function that, given an object, lists likely
> operations. I've done my best to implement this function using the
> tools currently available in R, and my code is included at the bottom
> of this email (I hope that I haven't just duplicated something already
> present in R). I think Sandy's idea is genuinely useful, even in the
> limited form provided by my implementation, and I have already
> discovered a few useful functions that I was unaware of.
> While developing and testing this function, I ran into a few problems
> which, I think, represent underlying problems with the current
> documentation system. These are typified by the results of running
> hints on an object produced by glm (having class c("glm", "lm")). I
> have outlined (very tersely) some possible solutions. Please note
> that while these solutions are largely technological, the problem is
> at heart sociological: writing documentation is no easier (and perhaps
> much harder) than writing a scientific publication, but the rewards
> are fewer.
>
> Problems:
> * Many functions share the same description (eg. head, tail).
> Solution: each rdoc file should only describe one method. Problem:
> Writing rdoc files is tedious, there is a lot of information
> duplicated between the code and the documentation (eg. the usage
> statement) and some functions share a lot of similar information.
> Solution: make it easier to write documentation (eg. documentation
> inline with code), and easier to include certain common descriptions
> in multiple methods (eg. new include command)
> * It is difficult to tell which functions are commonly
> used/important. Solution: break down by keywords. Problem: keywords
> are not useful at the moment. Solution: make better list of keywords
> available and encourage people to use it. Problem: people won't
> unless there is a strong incentive, plus good keywording requires
> considerable expertise (especially in building up the list). This is
> probably insoluble unless one person systematically keywords all of
> the base packages.
> * Some functions aren't documented (eg. simulate.lm, formula.glm) -
> typically, these are methods where the documentation is in the
> generic. Solution: these methods should all be aliased to the generic
> (by default?), and R CMD check should be amended to check for this
> situation. You could also argue that this is a deficiency with my
> function, and easily fixed by automatically referring to the generic
> if the specific isn't documented.
> * It can't supply suggestions when there isn't an explicit method
> (ie. .default is used), this makes it pretty useless for basic
> vectors. This may not really be a problem, as all possible operations
> are probably too numerous to list.
> * Provides full name for function, when best practice is to use
> generic part only when calling function. However, getting precise
> documentation may require the full name. I do the best I can
> (returning the generic if specific is alias to a documentation file
> with the same method name), but this reflects a deeper problem that
> the name you should use when calling a function may be different to
> the name you use to get documentation.
> * Can only display methods from currently loaded packages. This is a
> shortcoming of the methods function, but I suspect it is difficult to
> find S3 methods without loading a package.
>
> Relatively trivial problems:
> * Needs wide display to be effective. Could be dealt with by
> breaking the description in a sensible manner (there may already be R
> code to do this; please let me know if you know of any).
> * Doesn't currently include S4 methods. Solution: add some more code
> to wrap showMethods
> * Personally, I think sentence case is more aesthetically pleasing
> (and more flexible) than title case.
>
> Hadley
>
> hints <- function(x) {
>   # Fetch the cached help.search() database; build it if it is missing
>   db <- eval(utils:::.hsearch_db())
>   if (is.null(db)) {
>     help.search("abcd!", rebuild=TRUE, agrep=FALSE)
>     db <- eval(utils:::.hsearch_db())
>   }
>   base <- db$Base
>   alias <- db$Aliases
>   key <- db$Keywords
>
>   # Methods defined for the classes of x, matched against help aliases
>   m <- all.methods(classes=class(x))
>   m_id <- alias[match(m, alias[,1]), 2]
>   keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
>
>   # Use the help file's name when it has the same generic part
>   f.names <- cbind(m, base[match(m_id, base[,3]), 4])
>   f.names <- unlist(lapply(1:nrow(f.names), function(i) {
>     if (is.na(f.names[i, 2])) return(f.names[i, 1])
>     a <- methodsplit(f.names[i, 1])
>     b <- methodsplit(f.names[i, 2])
>     if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
>   }))
>
>   # Assemble the two-column table of functions and descriptions
>   hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
>   hints <- hints[order(tolower(hints[,1])),]
>   hints <- rbind(c("--------", "---------------"), hints)
>   rownames(hints) <- rep("", nrow(hints))
>   colnames(hints) <- c("Function", "Task")
>   hints[is.na(hints)] <- "(Unknown)"
>   class(hints) <- "hints"
>   hints
> }
>
> print.hints <- function(x, ...) print(unclass(x), quote=FALSE)
>
> all.methods <- function(classes) {
>   methods <- do.call(rbind, lapply(classes, function(x) {
>     m <- methods(class=x)
>     t(sapply(as.vector(m), methodsplit)) #m[attr(m, "info")$visible]
>   }))
>   rownames(methods[!duplicated(methods[,1]),])
> }
>
> # Split a method name at its last dot into generic name and class
> methodsplit <- function(m) {
>   parts <- strsplit(m, "\\.")[[1]]
>   if (length(parts) == 1) {
>     c(name=m, class="")
>   } else {
>     c(name=paste(parts[-length(parts)], collapse="."), class=parts[length(parts)])
>   }
> }
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!
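
on the question of breaking the description for narrower displays: base R's strwrap() may already cover most of it. below is a minimal sketch, not a patch to hints(). the helper wrap_task() and the sample description are made up for illustration, and the column widths are arbitrary assumptions.

```r
# Wrap the "Task" text of one hints-style row so it fits a narrow display.
# wrap_task() is a hypothetical helper; the sample text is invented.
wrap_task <- function(fun, task, width = 60, name_width = 16) {
  # Wrap the description to the space left after the name column
  lines <- strwrap(task, width = width - name_width)
  # Show the function name on the first line only
  left <- c(fun, rep("", max(0, length(lines) - 1)))
  # format() pads every entry of the name column to name_width
  paste0(format(left, width = name_width), lines)
}

out <- wrap_task(
  "summary.glm",
  paste("Summarizing generalized linear model fits, including",
        "coefficients, standard errors and dispersion"),
  width = 50
)
cat(out, sep = "\n")
```

continuation lines are padded with blanks so that the wrapped text stays aligned under the description column; a function name longer than name_width would still widen its row.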

just some feedback: that's a useful function, thank you.

but the problem is probably more general: frequently I do not really want to know what I can do in general with a data frame, for instance, but rather I would like to use `help.search' as I would use, say, Google (and with the same rate of success...).
but the actual `keywords' in the manpages seem insufficient, and `help.search' does not allow full-text search in the manpages. I can imagine why (1000 hits...), but without such a thing Google, for instance, would probably not be half as useful as it is, right? there is also no "sorting by relevance" in the `help.search' output, I think.
how this sorting could be achieved is a different question, of course.

Received on Tue Jun 20 02:24:19 2006
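
one crude possibility for the "sorting by relevance", purely as an illustration: count how often the search term occurs in each help text and list the texts with the most hits first. the function rank_by_hits() and the toy texts below are hypothetical, and a useful ranking would surely need more than raw counts.

```r
# Rank help texts by how often a search term occurs in them.
# `docs` is a named character vector (topic name -> full text);
# the entries below are invented sample data, not real manpages.
rank_by_hits <- function(docs, term) {
  # Count case-insensitive, literal occurrences of `term` in each text
  hits <- vapply(docs, function(txt) {
    m <- gregexpr(tolower(term), tolower(txt), fixed = TRUE)[[1]]
    sum(m > 0)  # gregexpr() returns -1 when there is no match
  }, integer(1))
  # Drop texts without a hit; most hits first, ties broken by name
  hits <- hits[hits > 0]
  hits[order(-hits, names(hits))]
}

docs <- c(
  glm   = "Fitting generalized linear models. glm is used to fit generalized linear models.",
  lm    = "Fitting linear models.",
  merge = "Merge two data frames by common columns."
)
res <- rank_by_hits(docs, "linear models")
print(res)
```

in practice the texts might be taken from the installed Rd files, for example via tools::Rd_db(); that is an assumption here, not something tried against the real help database.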

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 20 Jun 2006 - 04:11:43 EST.
