[R] programming: telling a function where to look for the entered variables

From: E Hofstadler <e.hofstadler_at_gmail.com>
Date: Fri, 01 Apr 2011 14:08:51 +0300


Hi there,

Could someone help me with the following programming problem?

I have written a function that works for my intended purpose, but it is quite closely tied to a particular dataframe and the names of the variables in this dataframe. However, I'd like to use the same function for different dataframes and variables. My problem is that I'm not quite sure how to tell my function in which dataframe the entered variables are located.

Here's some reproducible data and the function:

# create reproducible data

set.seed(124)

xvar <- sample(0:3, 1000, replace = TRUE)
yvar <- sample(0:1, 1000, replace = TRUE)
zvar <- rnorm(100)  # only 100 values: recycled to length 1000 by cbind() below
lvar <- sample(0:1, 1000, replace = TRUE)

Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
Fulldf$lvar <- factor(lvar, labels=c("yes","no"))

And here's the function in its current working form: from a subset of the dataframe Fulldf, a contingency table is created. (In my actual data, several other operations are then performed on that contingency table, but these are not relevant to the problem in question, so I've left them out.)

# function as it currently works: tailored to a particular dataframe (Fulldf)

myfunct <- function(subgroup){
    # enter a particular subgroup for which the contingency table should be
    # calculated (i.e. a particular value of the factor lvar)

    # restrict dataframe to given subgroup and two columns of the original dataframe
    Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
    Data.tmp <- na.omit(Data.tmp) # exclude missing values
    indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
    return(indextable)
}

Since I need to use the function with different dataframes and variable names, I'd like to be able to tell my function the name of the dataframe and variables it should use for calculating the index. This is how I tried to modify the first part of the function, but it didn't work:

# function as I would like it to work: independent of any particular
# dataframe or variable names (doesn't work)

myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
    # enter the subgroup, the variable names to be used and the dataframe
    # in which they are found

    # trying to subset the given dataframe for the given subgroup of the
    # given variable. The variable "xvar" happens to have the same name in
    # all dataframes, but the variable yvarname has different names in the
    # different dataframes
    Data.tmp <- subset(dataframe, lvarname==subgroup, select=c("xvar", deparse(substitute(yvarname))))
    Data.tmp <- na.omit(Data.tmp)

    # create the contingency table on the basis of the entered variables
    indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
    return(indextable)
}

calling

myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)

results in the following error:

Error in `[.data.frame`(x, r, vars, drop = drop) :   undefined columns selected
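
One plausible reading of this error, offered as a sketch rather than a confirmed diagnosis: subset() evaluates its select argument in its own evaluation context, so substitute(yvarname) no longer sees the function's argument there and the column name comes out as the literal string "yvarname". The toy data frame demo.df and the helper names name.inside and name.first below are made up for illustration. Capturing the name before calling subset() appears to avoid the problem.

```r
# Toy data frame (hypothetical, only for this illustration)
demo.df <- data.frame(xvar = c("blue", "red", "green"),
                      yvar = c("area1", "area2", "area1"))

# Same pattern as in myfunct.better: substitute() is called inside select
name.inside <- function(yvarname, dataframe) {
    subset(dataframe, select = c("xvar", deparse(substitute(yvarname))))
}

# Variant: capture the column name first, in the function's own frame
name.first <- function(yvarname, dataframe) {
    yname <- deparse(substitute(yvarname))  # name of the expression passed as yvarname
    subset(dataframe, select = c("xvar", yname))
}

# The first version fails; keep the message instead of stopping
err <- tryCatch(name.inside(yvar, demo.df),
                error = function(e) conditionMessage(e))
print(err)

# The second version returns the two requested columns
ok <- name.first(yvar, demo.df)
print(names(ok))
```

Note that the row condition lvarname==subgroup is a separate matter: inside subset() the name lvarname is not a column of the dataframe, so it is looked up outside the dataframe instead.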

My feeling is that R doesn't know where to look for the entered variables (lvar, yvar), but I'm not sure how to solve this problem. I tried using with() and even attach() within the function, but that didn't work.
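
A minimal sketch of one possible way around this, assuming it is acceptable to pass the column names as character strings and to index with [[ ]] instead of relying on subset(). The function name myfunct.str and the toy data frame demo.df (with columns area and flag) are hypothetical and only serve the illustration.

```r
# Toy data (hypothetical): xvar keeps its name, the other columns differ
set.seed(124)
demo.df <- data.frame(
    xvar = factor(sample(c("blue", "green", "red", "yellow"), 200, replace = TRUE),
                  levels = c("blue", "green", "red", "yellow")),
    area = factor(sample(c("area1", "area2"), 200, replace = TRUE),
                  levels = c("area1", "area2")),
    flag = factor(sample(c("yes", "no"), 200, replace = TRUE),
                  levels = c("yes", "no"))
)

# Column names arrive as strings, so no non-standard evaluation is involved
myfunct.str <- function(subgroup, lvarname, yvarname, dataframe) {
    # rows belonging to the requested subgroup (missing values count as no match)
    keep <- !is.na(dataframe[[lvarname]]) & dataframe[[lvarname]] == subgroup
    # restrict to the two columns needed for the table
    Data.tmp <- dataframe[keep, c("xvar", yvarname)]
    Data.tmp <- na.omit(Data.tmp)  # exclude missing values
    # contingency table of xvar against the chosen column
    table(Data.tmp[["xvar"]], Data.tmp[[yvarname]])
}

result <- myfunct.str("yes", lvarname = "flag", yvarname = "area",
                      dataframe = demo.df)
print(result)
```

With the data from this post, the corresponding call would be myfunct.str("yes", lvarname = "lvar", yvarname = "yvar", dataframe = Fulldf). The trade-off is that the names must be quoted at the call site.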

Any help is greatly appreciated.

Best,
Esther

P.S.:
Are there books that cover programming in R for beginners? I mean things like how best to use vectorization instead of loops, and general "best practice" tips for programming. Most of the books I've been looking at focus on applying R to particular statistical analyses and deal only comparatively briefly with more general programming aspects. I was wondering if there are any books or tutorials out there that cover the latter in a more detailed and systematic way.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 01 Apr 2011 - 11:12:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 01 Apr 2011 - 11:50:26 GMT.
