# [R] Effect of data set size on calculation

Date: Thu 08 Sep 2005 - 12:21:28 EST

Dear listers,

I have a piece of code which performs an ANOVA type of analysis on 2D GC data. The code is shown below:

# ANOVA 2D GC analysis

# maxc <- number of classes

# nreps <- number of replicates per class

maxc <- 2

nreps <- 4

sscl <- NULL

cmean <- NULL

#

# Initial stat. variable

#

dftot <- nrow(mat)-1

dfcl <- maxc - 1

dferr <- dftot - dfcl

totmean <- mean(mat)

sstot <- sd(mat)^2*dftot

#

# Calculate class-to-class variance

#

for (j in 1:maxc) {

cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))

}

for (j in 1:ncol(mat)) {

cmean[,j] <- cmean[,j]-totmean[j]

}

cmean <- (cmean)^2*nreps

for (i in 1:ncol(mat)) {

sscl[i] <- sum(cmean[,i])

}

#

# sserr <- sstot-sscl

#

ratios <- (sscl/dfcl)/((sstot-sscl)/dferr)

I have tested the above on a small data set (obtained by averaging over the second dimension) and it produced a meaningful result. However, when I analyse the data with both dimensions (a much larger data set), the analysis fails. I have narrowed the problem down to the calculation of cmean: for the averaged data set cmean has one column per variable, but for the full data set it collapses to a single column, and I have no idea why. Any suggestions would be welcome. Relevant output is given below.

Many thanks, Peter.

# Averaged dataset

> ncol(mat)

[1] 636

> nrow(mat)

[1] 8

[SNIP]
> for (j in 1:maxc) {

+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))

+ }

> cmean

V2 V3 V4 V5 V6 V7 V8 V9

[1,] 27.38970 27.68816 27.80730 27.72688 27.68044 27.33749 6667.038 15537.47

[2,] 26.36001 26.72920 26.64940 26.82506 26.54539 26.30811 8029.746 13656.60

... [SNIP]            V634 V635 V636 V637

[1,] 27.51868 27.51270 27.52344 27.52127

[2,] 26.45830 26.45837 26.46089 26.46407

>

# Full dataset

> nrow(mat)

[1] 8

> ncol(mat)

[1] 390010

[SNIP]
> for (j in 1:maxc) {

+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))

+ }

> cmean

[,1]

[1,] 54.48274

[2,] 63.14705

>
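
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the column labels V2, V3, ... in the averaged output suggest that mat was a data frame in that run, while in the full run it may have been a plain numeric matrix. In R versions of that era mean() applied to a data frame returned one mean per column, but mean() applied to a matrix returns a single number, which would give the one-column cmean shown above. colMeans() returns per-column means for both kinds of object. The toy data below (8 rows, 5 columns) is hypothetical and only illustrates the difference; the variable names follow the code in the post.

```r
# Hypothetical toy data: 2 classes x 4 replicates = 8 rows, 5 columns.
set.seed(1)
maxc  <- 2
nreps <- 4
mat   <- matrix(rnorm(maxc * nreps * 5, mean = 27), nrow = maxc * nreps)

# mean() on a numeric matrix collapses all cells to one number,
# whereas colMeans() returns one mean per column.
block <- mat[1:nreps, , drop = FALSE]
length(mean(block))      # 1
length(colMeans(block))  # 5

# Degrees of freedom, as in the original code.
dftot <- nrow(mat) - 1
dfcl  <- maxc - 1
dferr <- dftot - dfcl

# Per-column grand mean and total sum of squares.
totmean <- colMeans(mat)
sstot   <- apply(mat, 2, var) * dftot

# Class means: one row per class, one column per variable.
cmean <- do.call(rbind, lapply(seq_len(maxc), function(j) {
  rows <- ((j - 1) * nreps + 1):(j * nreps)
  colMeans(mat[rows, , drop = FALSE])
}))

# Between-class sum of squares and the F ratio for every column.
sscl   <- colSums(nreps * sweep(cmean, 2, totmean)^2)
ratios <- (sscl / dfcl) / ((sstot - sscl) / dferr)
```

With this version cmean has maxc rows and ncol(mat) columns whether mat is a matrix or an all-numeric data frame, and the explicit loops over columns are no longer needed. Checking class(mat) in both runs would show whether the assumption about the two object types holds.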


R-help@stat.math.ethz.ch mailing list