[R] simprof test using jaccard distance

From: <Maia.Berman_at_csiro.au>
Date: Tue, 17 May 2011 18:03:29 +1000


Dear All,
I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function:
> simprof

function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average",

    method.distance = "euclidean", method.transform = "identity",     alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE,     increment = 100)
{

    if (!is.matrix(data))

        data <- as.matrix(data)
    rawdata <- data
    if (sample.orientation == "column")

        rawdata <- t(rawdata)
    if (is.function(method.distance))

        rawdata.dist <- method.distance(rawdata)     else if (method.distance == "braycurtis") {

        rawdata.dist <- braycurtis(rawdata, const)
        if (!is.null(rownames(rawdata)))
            attr(rawdata.dist, "Labels") <- rownames(rawdata)
    }
    else rawdata.dist <- dist(rawdata, method = method.distance)     if (!method.transform == "identity")

        rawdata <- trans(rawdata, method.transform)

    hclust.results <- hclust(rawdata.dist, method = method.cluster)
    pMatrix <- cbind(hclust.results$merge, matrix(data = NA,
        nrow = nrow(hclust.results$merge), ncol = 1))
    simprof.results <- simprof.body(rawdata = rawdata, num.expected = num.expected,
        num.simulated = num.simulated, method.cluster = method.cluster,
        method.distance = method.distance, originaldata = rawdata,
        alpha = alpha, clust.order = hclust.results$merge, startrow = nrow(hclust.results$merge),
        pMatrix = pMatrix, const = const, silent = silent, increment = increment)
    results <- list()
    results[["numgroups"]] <- length(simprof.results$samples)
    results[["pval"]] <- simprof.results$pval
    results[["hclust"]] <- hclust.results
    if (!is.null(rownames(rawdata))) {
        for (i in 1:length(simprof.results$samples)) {
            for (j in 1:length(simprof.results$samples[[i]])) {
                simprof.results$samples[[i]][j] <- rownames(rawdata)[as.numeric(simprof.results$samples[[i]][j])]
            }
        }

    }
    results[["significantclusters"]] <- simprof.results$samples     return(results)
}
<environment: namespace:clustsig>

I tried to trick the function by bypassing the input of a raw community matrix (sites x species) by inputing instead a distance matrix already calculated with vegdist (vegan package) (Jaccard distance is available in vegdist, but not in the function dist used in simprof...) I therefore modified the function as follow:

simprof2=function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average",

    alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE,     increment = 100)
{ hclust.results <- hclust(data, method = method.cluster)

    pMatrix <- cbind(hclust.results$merge, matrix(data = NA,

        nrow = nrow(hclust.results$merge), ncol = 1))     simprof.results <- simprof.body(data = data, num.expected = num.expected,

        num.simulated = num.simulated, method.cluster = method.cluster,originaldata = data,
        alpha = alpha, clust.order = hclust.results$merge, startrow = nrow(hclust.results$merge),
        pMatrix = pMatrix, const = const, silent = silent, increment = increment)
    results <- list()
    results[["numgroups"]] <- length(simprof.results$samples)
    results[["pval"]] <- simprof.results$pval
    results[["hclust"]] <- hclust.results
    if (!is.null(rownames(data))) {
        for (i in 1:length(simprof.results$samples)) {
            for (j in 1:length(simprof.results$samples[[i]])) {
                simprof.results$samples[[i]][j] <- rownames(data)[as.numeric(simprof.results$samples[[i]][j])]
            }
        }

    }
    results[["significantclusters"]] <- simprof.results$samples     return(results)
}
But upon running it on my vegdist object, I get the following error:
> simprof2(G3_jaccard)

Error in simprof2(G3_jaccard) : could not find function "simprof.body"

I guess this function is part of the initial environment in which simprof runs, but how do I access it in order to call it when I run my own function outside of the initial environment? Another possible but maybe more complex way would be to define the jaccard distance (as J= 2B/(1+B) B beeing BrayCurtis distance already used in the initial function)

Many thanks in advance for your help!

--
Maļa Berman, MSc
PhD candidate




	[[alternative HTML version deleted]]


______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Tue 17 May 2011 - 09:15:43 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 18 May 2011 - 02:00:07 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive