Re: [R] programming: telling a function where to look for the entered variables

From: E Hofstadler <>
Date: Fri, 01 Apr 2011 15:28:23 +0300

Thanks Nick and Juan for your replies.

Nick, thanks for pointing out the warning in ?subset. I'm not sure I understand the example you provided, though: despite using subset() rather than bracket notation, the original function (myfunct) does what is expected of it. My problem is with the second function (myfunct.better), where the variable names and the dataframe are not fixed within the function but are passed to it when it is called. Even with bracket notation I can't quite manage to tell R where to look for the columns that correspond to the entered column names (but perhaps I misunderstood you).
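A rough illustration of the 'contextual column' point, using a made-up toy data frame (the names toy and colname are purely illustrative): when a variable merely holds a column name, subset() does not treat it as that column, whereas bracket indexing does.

```r
toy <- data.frame(grp = c("yes", "no", "yes"), val = 1:3)
colname <- "grp"   # a variable that *holds* a column name

# subset() evaluates its condition inside toy first; colname is not a column,
# so it is found outside toy and the string "grp" is compared with "yes"
# (FALSE), which selects no rows at all
n_subset <- nrow(subset(toy, colname == "yes"))
n_subset   # 0

# bracket indexing uses the string as a column name, so the column grp
# itself is compared with "yes"
n_bracket <- nrow(toy[toy[[colname]] == "yes", ])
n_bracket  # 2
```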

This is what I tried (using bracket notation):

myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
  Data.tmp <- dataframe[dataframe[,deparse(substitute(lvarname))]==subgroup,
                        c("xvar",deparse(substitute(yvarname)))]
}

but this only produces an empty contingency table. Perhaps my use of deparse() is flawed? (I suspect that what gets converted into a string is "lvarname" and "yvarname", rather than the column names that these two function arguments stand for in the dataframe.)
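A minimal sketch of the bracket-notation approach, generalised so that the column names are passed in as character strings (e.g. "lvar", "yvar") instead of bare names, which sidesteps substitute()/deparse() entirely. The function name myfunct.strings is hypothetical, and the data are a compact stand-in for Fulldf built with data.frame(), not the exact data from the original post.

```r
# Compact stand-in for Fulldf (an assumption, not the original data)
set.seed(124)
Fulldf <- data.frame(
  xvar = factor(sample(0:3, 1000, replace = TRUE),
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, 1000, replace = TRUE),
                labels = c("area1", "area2")),
  lvar = factor(sample(0:1, 1000, replace = TRUE),
                labels = c("yes", "no"))
)

# Hypothetical variant: the column names are passed as character strings
myfunct.strings <- function(dataframe, subgroup, lvarname, yvarname) {
  # keep the rows of the subgroup and only the two columns we need;
  # dataframe[[lvarname]] extracts the column whose name is stored in lvarname
  Data.tmp <- dataframe[dataframe[[lvarname]] == subgroup, c("xvar", yvarname)]
  Data.tmp <- na.omit(Data.tmp)  # drop rows with missing values
  table(Data.tmp$xvar, Data.tmp[[yvarname]])
}

tab <- myfunct.strings(Fulldf, "yes", lvarname = "lvar", yvarname = "yvar")
tab
```

The price of this design is that the names must be quoted in the call; in return the function behaves the same no matter where it is called from.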

2011/4/1 Nick Sabbe <>:
> See the warning in ?subset.
> Passing the column name of lvar is not the same as passing the 'contextual
> column' (as I call it in these circumstances).
> You can solve this by using [] instead.
> For my own comfort, here is the relevant line from your original function:
> Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
> Which should become something like (untested but should be close):
> Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]
> This should be a lot easier to translate based on column names, as the
> column names are now used as such.
> -----Original Message-----
> From: [] On Behalf Of E Hofstadler
> Sent: Friday, 1 April 2011 13:09
> To:
> Subject: [R] programming: telling a function where to look for the entered variables
> Hi there,
> Could someone help me with the following programming problem..?
> I have written a function that works for my intended purpose, but it
> is quite closely tied to a particular dataframe and the names of the
> variables in this dataframe. However, I'd like to use the same
> function for different dataframes and variables. My problem is that
> I'm not quite sure how to tell my function in which dataframe the
> entered variables are located.
> Here's some reproducible data and the function:
> # create reproducible data
> set.seed(124)
> xvar <- sample(0:3, 1000, replace=TRUE)
> yvar <- sample(0:1, 1000, replace=TRUE)
> zvar <- rnorm(100)
> lvar <- sample(0:1, 1000, replace=TRUE)
> Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
> Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
> Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
> Fulldf$lvar <- factor(lvar, labels=c("yes","no"))
> And here's the function in the form in which it currently works: a
> contingency table is created from a subset of the dataframe Fulldf. (In my
> actual data, several other operations are then performed on that
> contingency table, but these are not relevant to the problem in
> question, so I've deleted them.)
> # function as it currently works: tailored to a particular dataframe (Fulldf)
> myfunct <- function(subgroup){
>   # subgroup: the particular value of the factor lvar for which the
>   # contingency table should be calculated
>   # restrict the dataframe to the given subgroup and two of its columns
>   Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
>   Data.tmp <- na.omit(Data.tmp) # exclude missing values
>   indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
>   return(indextable)
> }
> Since I need to use the function with different dataframes and
> variable names, I'd like to be able to tell my function the name of
> the dataframe and the variables it should use for calculating the index.
> This is how I tried to modify the first part of the function, but it
> didn't work:
> # function as I would like it to work: independent of any particular
> # dataframe or variable names (doesn't work)
> myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
>   # enter the subgroup, the variable names to be used and the dataframe
>   # in which they are found
>   # try to subset the given dataframe for the given subgroup of the given
>   # variable. (The variable "xvar" happens to have the same name in all
>   # dataframes, but the variable yvarname has different names in the
>   # different dataframes.)
>   Data.tmp <- subset(dataframe, lvarname==subgroup,
>                      select=c("xvar", deparse(substitute(yvarname))))
>   Data.tmp <- na.omit(Data.tmp)
>   # create the contingency table on the basis of the entered variables
>   indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
>   return(indextable)
> }
> Calling
> myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)
> results in the following error:
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>  undefined columns selected
> My feeling is that R doesn't know where to look for the entered
> variables (lvar, yvar), but I'm not sure how to solve this problem. I
> tried using with() and even attach() within the function, but that
> didn't work.
> Any help is greatly appreciated.
> Best,
> Esther
> P.S.:
> Are there books that cover programming in R for beginners? I mean
> things like how best to use vectorization instead of loops, and
> general "best practice" tips for programming. Most of the books I've
> been looking at focus on applying R to particular statistical
> analyses, and deal only comparatively briefly with more general
> programming aspects. I was wondering if there are any books or tutorials
> out there that cover the latter in a more elaborate and
> systematic way...?
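For the original call, which passes bare names (lvarname=lvar, yvarname=yvar), one possible fix is to convert the arguments to strings at the very top of the function and only then index with [[ ]]. As far as I can tell from the source of subset.data.frame, the "undefined columns selected" error arises because the select argument is evaluated in an environment built from the column names; there substitute(yvarname) no longer refers to the function argument and deparses to the literal string "yvarname". The function name myfunct.names and the stand-in data below are assumptions for illustration, not code from the thread.

```r
# Compact stand-in for Fulldf (an assumption, not the original data)
set.seed(124)
Fulldf <- data.frame(
  xvar = factor(sample(0:3, 1000, replace = TRUE),
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, 1000, replace = TRUE),
                labels = c("area1", "area2")),
  lvar = factor(sample(0:1, 1000, replace = TRUE),
                labels = c("yes", "no"))
)

# Hypothetical variant that accepts bare column names, as in the original call
myfunct.names <- function(subgroup, lvarname, yvarname, dataframe) {
  # Capture the names the caller typed (e.g. lvar, yvar) as strings, here in
  # the function's own frame, before handing anything to other functions
  lvarname <- deparse(substitute(lvarname))
  yvarname <- deparse(substitute(yvarname))
  # Fail early with a clear message if a column does not exist
  missing_cols <- setdiff(c("xvar", lvarname, yvarname), names(dataframe))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # From here on the names are ordinary strings, so [[ ]] and [ ] work
  Data.tmp <- dataframe[dataframe[[lvarname]] == subgroup, c("xvar", yvarname)]
  Data.tmp <- na.omit(Data.tmp)
  table(Data.tmp$xvar, Data.tmp[[yvarname]])
}

tab2 <- myfunct.names("yes", lvarname = lvar, yvarname = yvar, dataframe = Fulldf)
tab2
```

One caveat with deparse(substitute()): it captures whatever expression the caller wrote, so if myfunct.names is itself called from another function that forwards its own argument, the captured name will be that argument's name rather than the column's. Passing the names as strings avoids this.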
