Re: [R] custom subset method / handling columns selection as logic in '...' parameter

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Thu, 20 Dec 2007 06:46:58 -0800

Eric --

Please don't cross-post.

Please simplify your example so that others do not have to work hard to understand what you are asking.

See additional response on the Bioconductor mailing list.
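
On the narrower question of turning '...' into a list of separate, unevaluated conditions, one possible approach (a minimal sketch, not necessarily what the Bioconductor reply recommends) is to take the call produced by substitute(list(...)), convert it with as.list(), and drop the leading 'list' symbol. Each remaining element can then be evaluated later with the phenotypic data.frame as the evaluation environment. The helper name select_samples and the small data.frame standing in for pData(x) are made up for illustration:

```r
# Hypothetical helper: returns the row names of 'pd' (a stand-in for
# pData(x)) that satisfy every condition passed through '...'.
select_samples <- function(pd, ...) {
  # substitute(list(...)) is the call list(cond1, cond2, ...);
  # as.list() splits it and [-1] drops the leading 'list' symbol,
  # leaving one unevaluated call per condition.
  conds <- as.list(substitute(list(...)))[-1]
  caller <- parent.frame()
  keep <- rep(TRUE, nrow(pd))
  for (cond in conds) {
    # Columns of 'pd' are looked up first, then the caller's variables.
    res <- eval(cond, pd, caller)
    keep <- keep & !is.na(res) & res
  }
  rownames(pd)[keep]
}

# Toy phenotypic data (assumed values, four samples only)
pd <- data.frame(sex   = c("Female", "Male", "Male", "Male"),
                 type  = c("Control", "Case", "Control", "Control"),
                 score = c(0.75, 0.40, 0.73, 0.98),
                 row.names = c("A", "B", "C", "X"))

select_samples(pd, sex == "Male", score > 0.75)        # "X"
select_samples(pd, type %in% "Control", sex == "Male") # "C" "X"
```

The returned names could then be used as the column index, as in x[subset, selectedColumns]. match.call(expand.dots = FALSE)$... is another common way to get at the same unevaluated arguments.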

Martin

"Eric Lecoutre" <ericlecoutre_at_gmail.com> writes:

> Dear R-helpers & bioconductor
>
>
> Sorry for cross-posting; this concerns R programming applied in a
> Bioconductor context.
> Also sorry for this long message; I am trying to be complete in my request.
>
> I am trying to write a subset method for a specific class (ExpressionSet
> from Bioconductor) that allows more flexible selection than the "[" method.
>
> The scheme I have in mind is the following:
>
> subset.ExpressionSet <- function(x,subset,...){
>
> }
>
> I will use the subset argument for rows (genes), as in the default method.
>
> Now I would like to select columns (samples) based on phenotypic data,
> which provides detailed information about the columns.
>
> Basically, the first function I have written allows the following:
>
>> sub1 <- subset(ExpressionSetObject, subset=NULL, V1=value1, V2=value2)
> # subset=NULL takes all rows
>
> Note the two conditions on two variables belonging to the associated
> data.frame encapsulated in the ExpressionSetObject (to be complete, the
> conditions may be applied to more than 2 columns, as they are matched against
> the phenotypic data.frame that holds all variables).
> To simplify a little, this would nearly return:
> ExpressionSetObject[,V1==value1 & V2==value2]
>
> This is nice as I can already handle any number of conditions on variable
> values thanks to '...'. The first step is
> conditions <- list(...), and the conditions are then handled later in the code.
>
> Nevertheless, those conditions are basic (one value).
>
> I would like to handle arbitrary conditions, such as: V1 %in% c(value1,
> value2)
> A simpler expression would be passed as V2==value2 instead of V2=value2
>
> My real problem is that I don't know how to turn '...' into an object
> containing those conditions that could be used later.
>
> The attempt that seems to come nearest is:
>
>> foo <- function(...){
>> as.expression(substitute(list(...)))
>> }
>> foo(x==1,y%in%1:2)
> expression(list(x == 1, y %in% 1:2))
>
> whereas I would like to have something like
> list(expression(x==1), expression(y %in% 1:2))
> with those expressions being evaluated later on in the context of my specific
> object.
>
>
> Is there any existing function where '...' is already handled the way I
> want, so that I can mimic it?
>
> Thanks for any insight.
>
>
> Eric
>
> ---
>
> For those who have Biobase available, here is my current subset function and
> a demo-case that explains a little bit.
>
>
> library(Biobase)
> example(ExpressionSet) # create sample object
> print(expressionSet)
>
> # now my subset function as it is
>
> subset.ExpressionSet <- function(x,subset=NULL,verbose=TRUE,...){
>   # subset is used to subset on rows
>   # ... is used to make multiple conditions on columns based on pData
>   # list of conditions is handled in ...
>   stopifnot(is(x,"ExpressionSet"))
>   phenoData <- pData(x)
>   listCriteria <- list(...)
>   if (is.null(subset)) subset <- rep(TRUE,nrow(exprs(x)))
>   subset <- subset & !is.na(subset)
>   retainedCriteria <- list()
>   tmp <- sapply(names(listCriteria), function(critname) {
>     if(!critname %in% colnames(phenoData)){
>       if (verbose) cat("\n*** subsetCompounds: Dropped criteria:",critname,
>                        "not in phenoData of object\n")
>     }else{
>       # a NULL criterion means "accept every value of that column"
>       if(is.null(listCriteria[[critname]])) listCriteria[[critname]] <-
>         unique(phenoData[,critname])
>       # [[ ]] extracts the values themselves; [ ] would return a list
>       retainedCriteria[[critname]] <<- phenoData[,critname] %in%
>         listCriteria[[critname]]
>     }
>   })
>   criteriaValues <- do.call("cbind",retainedCriteria)
>   # keep the columns that satisfy every retained criterion
>   selectedColumns <- rownames(phenoData)[apply(criteriaValues,1,all)]
>   out <- x[subset,selectedColumns]
>   if (verbose) cat('\n',length(selectedColumns),' columns selected (',
>                    paste(selectedColumns,collapse=' '),')\n',sep='')
>   invisible(out)
> }
>
> # looking at phenotypic data associated with the sample expressionSet
>> pData(expressionSet)
> sex type score
> A Female Control 0.75
> B Male Case 0.40
> C Male Control 0.73
> D Male Case 0.42
> E Female Case 0.93
> F Male Control 0.22
> G Male Case 0.96
> H Male Case 0.79
> I Female Case 0.37
> J Male Control 0.63
> K Male Case 0.26
> L Female Control 0.36
> M Male Case 0.41
> N Male Case 0.80
> O Female Case 0.10
> P Female Control 0.41
> Q Female Case 0.16
> R Male Control 0.72
> S Male Case 0.17
> T Female Case 0.74
> U Male Control 0.35
> V Female Control 0.77
> W Male Control 0.27
> X Male Control 0.98
> Y Female Case 0.94
> Z Female Case 0.32
>
>
> # now the sample use
>> (subset1 =subset(expressionSet,sex="Male",type="Control"))
> 7 columns selected (C F J R U W X)
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 500 features, 7 samples
> element names: exprs, se.exprs
> phenoData
> sampleNames: C, F, ..., X (7 total)
> varLabels and varMetadata description:
> sex: Female/Male
> type: Case/Control
> score: Testing Score
> featureData
> featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at (500 total)
> fvarLabels and fvarMetadata description: none
> experimentData: use 'experimentData(object)'
> Annotation: hgu95av2
>
>
> # what I would like to allow in use:
> (subset2 = subset(expressionSet, sex=="Male", score > 0.75)) # note the ==
> # instead of =
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

Received on Thu 20 Dec 2007 - 14:52:22 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 20 Dec 2007 - 15:30:21 GMT.
