From: E Hofstadler <e.hofstadler_at_gmail.com>

Date: Fri, 01 Apr 2011 15:28:23 +0300

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 01 Apr 2011 - 12:38:32 GMT

Thanks Nick and Juan for your replies.

Nick, thanks for pointing out the warning in ?subset. I'm not sure I understand the example you provided, though: despite using subset() rather than bracket notation, the original function (myfunct) does what is expected of it. The problem I have is with the second function (myfunct.better), where the variable names and the dataframe are not fixed within the function but passed to it in the call. Even with bracket notation I don't quite manage to tell R where to look for the columns that correspond to the entered column names (but then perhaps I misunderstood you).

This is what I tried (using bracket notation):

myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
  Data.tmp <- dataframe[dataframe[,deparse(substitute(lvarname))]==subgroup,
                        c("xvar",deparse(substitute(yvarname)))]
}

but this creates an empty contingency table only -- perhaps because my use of deparse() is flawed (I think what is converted into a string is "lvarname" and "yvarname", rather than the column names that these two function-variables represent in the dataframe)?
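
For reference, here is a minimal sketch of one way around the problem, assuming the column names are passed as quoted strings so that no deparse(substitute()) is needed. The names show.name, myfunct.strings and xvarname are made up for illustration, and the data only resemble the example from the original post (zvar is left out, and the random draws differ). The first few lines check what deparse(substitute()) returns when it is evaluated directly in a function body: the text of the argument as typed in the call.

```r
# What deparse(substitute()) yields for a bare name versus a quoted string.
# The argument is never evaluated, so no object called lvar has to exist.
show.name <- function(lvarname) deparse(substitute(lvarname))
bare   <- show.name(lvar)    # text of the argument as typed: "lvar"
quoted <- show.name("lvar")  # a string that includes the quote characters

# Example data in the spirit of the original post (zvar omitted).
set.seed(124)
Fulldf <- data.frame(
  xvar = factor(sample(0:3, 1000, replace = TRUE),
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, 1000, replace = TRUE),
                labels = c("area1", "area2")),
  lvar = factor(sample(0:1, 1000, replace = TRUE),
                labels = c("yes", "no"))
)

# Hypothetical variant: column names arrive as character strings.
myfunct.strings <- function(dataframe, subgroup, lvarname, yvarname,
                            xvarname = "xvar") {
  # Row positions where the grouping column equals the subgroup;
  # which() drops rows where the comparison is NA.
  rows <- which(dataframe[[lvarname]] == subgroup)
  Data.tmp <- na.omit(dataframe[rows, c(xvarname, yvarname)])
  # [[ ]] looks a column up by the string stored in the variable, whereas
  # Data.tmp$yvarname would look for a column literally named "yvarname".
  table(Data.tmp[[xvarname]], Data.tmp[[yvarname]])
}

tab <- myfunct.strings(Fulldf, "yes", lvarname = "lvar", yvarname = "yvar")
print(tab)
```

Passing strings keeps the lookup explicit: the function only ever indexes the dataframe it was given, so it does not matter whether objects called lvar or yvar also exist in the workspace.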

2011/4/1 Nick Sabbe <nick.sabbe_at_ugent.be>:

> See the warning in ?subset.

> Passing the column name of lvar is not the same as passing the 'contextual
> column' (as I coin it in these circumstances).
> You can solve it by indeed using [] instead.
>
> For my own comfort, here is the relevant line from your original function:
> Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
> Which should become something like (untested but should be close):
> Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]
>
> This should be a lot easier to translate based on column names, as the
> column names are now used as such.
>
> HTH,
>
> Nick Sabbe
> --
> ping: nick.sabbe_at_ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
>
> -- Do Not Disapprove
>
> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On
> Behalf Of E Hofstadler
> Sent: Friday 1 April 2011 13:09
> To: r-help_at_r-project.org
> Subject: [R] programming: telling a function where to look for the entered
> variables
>
> Hi there,
>
> Could someone help me with the following programming problem?
>
> I have written a function that works for my intended purpose, but it
> is quite closely tied to a particular dataframe and the names of the
> variables in this dataframe. However, I'd like to use the same
> function for different dataframes and variables. My problem is that
> I'm not quite sure how to tell my function in which dataframe the
> entered variables are located.
>
> Here's some reproducible data and the function:
>
> # create reproducible data
> set.seed(124)
> xvar <- sample(0:3, 1000, replace = T)
> yvar <- sample(0:1, 1000, replace=T)
> zvar <- rnorm(100)
> lvar <- sample(0:1, 1000, replace=T)
> Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
> Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
> Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
> Fulldf$lvar <- factor(lvar, labels=c("yes","no"))
>
> and here's the function in the form in which it currently works: from a
> subset of the dataframe Fulldf, a contingency table is created (in my
> actual data, several other operations are then performed on that
> contingency table, but these are not relevant to the problem in
> question, so I've deleted them).
>
> # function as it currently works: tailored to a particular dataframe (Fulldf)
>
> myfunct <- function(subgroup){
>   # subgroup: the value of the factor lvar for which the contingency
>   # table should be calculated
>   # restrict dataframe to given subgroup and two columns of the original
>   # dataframe
>   Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
>   Data.tmp <- na.omit(Data.tmp) # exclude missing values
>   indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
>   return(indextable)
> }
>
> # Since I need to use the function with different dataframes and
> # variable names, I'd like to be able to tell my function the name of
> # the dataframe and variables it should use for calculating the index.
> # This is how I tried to modify the first part of the function, but it
> # didn't work:
>
> # function as I would like it to work: independent of any particular
> # dataframe or variable names (doesn't work)
>
> myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
>   # enter the subgroup, the variable names to be used and the dataframe
>   # in which they are found
>   # trying to subset the given dataframe for the given subgroup of the
>   # given variable. The variable "xvar" happens to have the same name in
>   # all dataframes, but the variable yvarname has different names in the
>   # different dataframes
>   Data.tmp <- subset(dataframe, lvarname==subgroup,
>                      select=c("xvar", deparse(substitute(yvarname))))
>   Data.tmp <- na.omit(Data.tmp)
>   # create the contingency table on the basis of the entered variables
>   indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
>   return(indextable)
> }
>
> calling
>
> myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)
>
> results in the following error:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>   undefined columns selected
>
> My feeling is that R doesn't know where to look for the entered
> variables (lvar, yvar), but I'm not sure how to solve this problem. I
> tried using with() and even attach() within the function, but that
> didn't work.
>
> Any help is greatly appreciated.
>
> Best,
> Esther
>
> P.S.:
> Are there books that elaborate on programming in R for beginners -- and I
> mean things like how best to use vectorization instead of loops and
> general "best practice" tips for programming? Most of the books I've
> been looking at focus on applying R to particular statistical
> analyses, and deal only comparatively briefly with more general
> programming aspects. I was wondering if there are any books or tutorials
> out there that cover the latter aspects in a more elaborate and
> systematic way...?

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
