[R] custom subset method / handling columns selection as logic in '...' parameter

From: Eric Lecoutre <ericlecoutre_at_gmail.com>
Date: Thu, 20 Dec 2007 15:15:39 +0100


Dear R-helpers & bioconductor

Sorry for cross-posting; this is an R programming question that arises in a Bioconductor context.
Sorry also for the long message, I have tried to make my request complete.

I am trying to write a subset method for a specific class (ExpressionSet from Bioconductor) that allows more flexible selection than the "[" method.

The schema I have in mind is the following:

subset.ExpressionSet <- function(x,subset,...){

}

I will use the subset argument for rows (genes), as in the default method.

Now I would also like to select columns (samples) based on the phenotypic data.
The phenotypic data provide detailed information about the columns.

Basically, first function I have written allows the following:

> sub1 <- subset(ExpressionSetObject, subset=NULL, V1=value1, V2=value2)
# subset=NULL takes all rows

Note that there are two conditions on two variables belonging to the data.frame encapsulated in the ExpressionSetObject (to be complete, conditions may be applied to more than 2 columns, since they refer to the phenotypic data.frame that holds all variables). To simplify a little, this would return nearly the same as: ExpressionSetObject[, V1==value1 & V2==value2]

This is nice, as I can already handle any number of conditions on variable values thanks to '...'. The first step is
conditions <- list(...), and the conditions are then handled later in the code.
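
To make the named-argument scheme concrete, here is a minimal sketch on a plain data.frame standing in for the phenotypic data. The helper name matchCriteria and the small pheno table are assumptions made for illustration; they are not part of Biobase.

```r
# Hypothetical helper: keep rows of 'data' whose columns match the named values in '...'
matchCriteria <- function(data, ...) {
  criteria <- list(...)                 # evaluated immediately, e.g. list(sex = "Male")
  keep <- rep(TRUE, nrow(data))
  for (nm in names(criteria)) {
    # %in% binds tighter than &, so each criterion is tested before combining
    keep <- keep & data[[nm]] %in% criteria[[nm]]
  }
  keep
}

# Stand-in for pData(x): four samples taken from the demo data
pheno <- data.frame(sex  = c("Female", "Male", "Male", "Male"),
                    type = c("Control", "Case", "Control", "Case"),
                    row.names = c("A", "B", "C", "D"))

selected <- rownames(pheno)[matchCriteria(pheno, sex = "Male", type = "Control")]
print(selected)
```

Because the comparison is done with %in%, a vector of admissible values such as type = c("Case", "Control") is accepted as well, but comparisons such as score > 0.75 cannot be expressed this way.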

Nevertheless, those conditions are basic (one value per variable).

I would like to handle arbitrary conditions, such as: V1 %in% c(value1, value2)
Simple conditions would then be passed as V2==value2 instead of V2=value2

My actual problem is that I don't know how to turn '...' into an object containing those conditions so that they can be used later.

My attempt which seems the nearest is:

> foo <- function(...){
> as.expression(substitute(list(...)))
> }
> foo(x==1, y %in% 1:2)

expression(list(x == 1, y %in% 1:2))

whereas I would like to have something like list(expression(x==1), expression(y %in% 1:2)), those expressions being evaluated later on in the context of my specific object.

Is there an existing function in which '...' is already handled the way I want, so that I can mimic it?
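
One possible direction, given only as a sketch: dropping the first element of as.list(substitute(list(...))) leaves one unevaluated call per argument, and each call can later be evaluated with eval() using a data.frame as the environment. The function names captureConditions and evalConditions are made up for this example. The single-condition analogue is subset.data.frame, which uses substitute() on its subset argument and then eval() in the data.frame.

```r
# Hypothetical helper: return each argument in '...' as a separate unevaluated call
captureConditions <- function(...) {
  as.list(substitute(list(...)))[-1L]   # [-1L] drops the leading 'list' symbol
}

# Hypothetical helper: evaluate every captured condition inside 'data' and AND them
evalConditions <- function(conds, data, env = parent.frame()) {
  force(env)                            # fix the fallback environment before looping
  results <- lapply(conds, function(cond) eval(cond, data, env))
  Reduce(`&`, results, rep(TRUE, nrow(data)))
}

conds <- captureConditions(x == 1, y %in% 1:2)
print(conds)

df <- data.frame(x = c(1, 1, 2), y = c(1, 3, 2))
keep <- evalConditions(conds, df)
print(keep)
```

Names that are not columns of the data.frame are looked up in env, so conditions may also refer to variables defined by the caller.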

Thanks for any insight.

Eric

---

For those who have Biobase available, here is my current subset function and
a demo case that illustrates it.


library(Biobase)
example(ExpressionSet) # create sample object
print(expressionSet)

# now my subset function as it is

subset.ExpressionSet <- function(x,subset=NULL,verbose=TRUE,...){
  # subset is used to subset on rows
  # ... is used to make multiple conditions on columns based on pData
  # list of conditions is handled in ...
    stopifnot(is(x,"ExpressionSet"))
    phenoData <- pData(x)
    listCriteria <- list(...)
    if (is.null(subset)) subset <- rep(TRUE,nrow(exprs(x)))
    subset <- subset & !is.na(subset)
    retainedCriteria <- list()
    tmp <- sapply(names(listCriteria), function(critname) {
      if(!critname %in% colnames(phenoData)){
        if (verbose) cat("\n*** subsetCompounds: Dropped criteria:",critname,
                         "not in phenoData of object\n")
      }else{
        # [[ ]] extracts the criterion value; [ ] would give a one-element list
        if(is.null(listCriteria[[critname]]))
          listCriteria[[critname]] <- unique(phenoData[,critname])
        retainedCriteria[[critname]] <<- phenoData[,critname] %in%
          listCriteria[[critname]]
      }
      })
    if (length(retainedCriteria) == 0) {
      # no usable criterion: keep every column
      selectedColumns <- rownames(phenoData)
    } else {
      criteriaValues <- do.call("cbind",retainedCriteria)
      # a column is kept only when all retained criteria hold for it
      selectedColumns <- rownames(phenoData)[apply(criteriaValues,1,all)]
    }
    out <- x[subset,selectedColumns]
    if (verbose) cat('\n',length(selectedColumns),' columns selected (',
                     paste(selectedColumns,collapse=' '),')\n',sep='')
    invisible(out)
  }

# looking at phenotypic data associated with the sample expressionSet

> pData(expressionSet)
     sex    type score
A Female Control  0.75
B   Male    Case  0.40
C   Male Control  0.73
D   Male    Case  0.42
E Female    Case  0.93
F   Male Control  0.22
G   Male    Case  0.96
H   Male    Case  0.79
I Female    Case  0.37
J   Male Control  0.63
K   Male    Case  0.26
L Female Control  0.36
M   Male    Case  0.41
N   Male    Case  0.80
O Female    Case  0.10
P Female Control  0.41
Q Female    Case  0.16
R   Male Control  0.72
S   Male    Case  0.17
T Female    Case  0.74
U   Male Control  0.35
V Female Control  0.77
W   Male Control  0.27
X   Male Control  0.98
Y Female    Case  0.94
Z Female    Case  0.32

# now the sample use

> (subset1 = subset(expressionSet,sex="Male",type="Control"))

7 columns selected (C F J R U W X)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 500 features, 7 samples
  element names: exprs, se.exprs
phenoData
  sampleNames: C, F, ..., X (7 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
featureData
  featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at (500 total)
  fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2

# what I would like to allow in use:

> (subset2 = subset(expressionSet, sex=="Male", score > 0.75))
# note the == instead of =
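
The desired call form, with conditions such as sex=="Male" and score > 0.75, could perhaps be supported along the following lines. This is only a sketch under stated assumptions: selectSamples is a hypothetical helper, and the pheno data.frame (the first eight rows of the table above) stands in for pData(expressionSet), so that the example runs without Biobase. Inside a real method, the returned names would be used as the column index, as in x[subset, selectedColumns].

```r
# Hypothetical helper: names of the samples satisfying every condition in '...'
selectSamples <- function(pheno, ...) {
  conds <- as.list(substitute(list(...)))[-1L]  # unevaluated conditions
  env <- parent.frame()                         # where non-column names are looked up
  keep <- rep(TRUE, nrow(pheno))
  for (cond in conds) {
    res <- eval(cond, pheno, env)               # columns of 'pheno' are visible here
    keep <- keep & !is.na(res) & res            # an NA result counts as "not selected"
  }
  rownames(pheno)[keep]
}

# Stand-in for pData(expressionSet): samples A to H of the demo data
pheno <- data.frame(
  sex   = c("Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male"),
  type  = c("Control", "Case", "Control", "Case", "Case", "Control", "Case", "Case"),
  score = c(0.75, 0.40, 0.73, 0.42, 0.93, 0.22, 0.96, 0.79),
  row.names = LETTERS[1:8]
)

sel1 <- selectSamples(pheno, sex == "Male", score > 0.75)
print(sel1)

cutoff <- 0.75                                  # a variable from the calling environment
sel2 <- selectSamples(pheno, type %in% c("Case", "Control"), score > cutoff)
print(sel2)
```

Each condition is evaluated separately and the results are combined with &, which mirrors the behaviour of the named-argument version.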
Received on Thu 20 Dec 2007 - 14:19:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 20 Dec 2007 - 15:30:21 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.