[R] Cannot grasp how to apply "by" here...

From: Jonas Malmros <jonas.malmros_at_gmail.com>
Date: Mon, 17 Dec 2007 19:47:46 +0100

I have a data frame named "database" with panel data, a little piece of which looks like this:

  Symbol Name Trial Factor1 Factor2 External
1 548140    A     1   -3.87   -0.32     0.01
2 547400    B     1   12.11   -0.68     0.40
3 547173    C     1    4.50    0.71    -1.36
4 546832    D     1    2.59    0.00     0.09
5 548140    A     2    2.41    0.50    -1.04
6 547400    B     2    1.87    0.32     0.39

What I want to do is calculate the correlation between each factor and External for each Symbol, and record the correlation estimate, the p-value, the name, and the number of observations in a vector named "vector", then rbind these vectors together in "results". When there are fewer than 5 observations for a particular Symbol, I want to put NAs in each column of "vector".
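For illustration only, the computation described above could be sketched as follows. This is one possible approach, not the code from the post: the helper name cor.by.symbol, the min.obs argument, and the simulated demo data are all hypothetical, and split() with lapply() is used here in place of by(). cor.test() is called with its default (Pearson) method.

```r
factor.names  <- c("Factor1", "Factor2")
factor.pvalue <- c("SigF1", "SigF2")

# Hypothetical helper: returns one named result row for a single Symbol
cor.by.symbol <- function(x, min.obs = 5) {
  # start with NA in every column, as requested for small groups
  out <- rep(NA_real_, 1 + 2 * length(factor.names))
  names(out) <- c("No.obs", factor.names, factor.pvalue)
  if (nrow(x) >= min.obs) {
    out["No.obs"] <- nrow(x)
    for (i in seq_along(factor.names)) {
      ct <- cor.test(x$External, x[[factor.names[i]]])
      out[factor.names[i]]  <- ct$estimate
      out[factor.pvalue[i]] <- ct$p.value
    }
  }
  out
}

# Simulated data (an assumption, not the real "database"):
# two Symbols with 6 rows each and one Symbol with only 3 rows
set.seed(1)
demo <- data.frame(
  Symbol   = rep(c(548140, 547400, 547173), times = c(6, 6, 3)),
  Factor1  = rnorm(15),
  Factor2  = rnorm(15),
  External = rnorm(15)
)

# One row per Symbol, bound into a matrix; row names are the Symbols
results <- do.call(rbind, lapply(split(demo, demo$Symbol), cor.by.symbol))
print(results)
```

Because each call returns its row instead of modifying an outer "results" object, the pieces can be combined after the fact with do.call(rbind, ...). The Symbol with only 3 rows ends up as a row of NAs.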

I tried the following code, assuming that by() splits database into smaller data frames, one for each Symbol (that's the "x"):
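The assumption about by() can be checked on a toy example. The sketch below rebuilds the six rows shown in the excerpt (the object name toy is hypothetical) and asks by() to report, for each Symbol, whether the function received a data frame and how many rows it had.

```r
# Toy data copied from the six rows shown in the post
toy <- data.frame(
  Symbol   = c(548140, 547400, 547173, 546832, 548140, 547400),
  Name     = c("A", "B", "C", "D", "A", "B"),
  Trial    = c(1, 1, 1, 1, 2, 2),
  Factor1  = c(-3.87, 12.11, 4.50, 2.59, 2.41, 1.87),
  Factor2  = c(-0.32, -0.68, 0.71, 0.00, 0.50, 0.32),
  External = c(0.01, 0.40, -1.36, 0.09, -1.04, 0.39)
)

# The function is called once per Symbol with the matching rows
all.df          <- by(toy, toy$Symbol, function(x) is.data.frame(x))
rows.per.symbol <- by(toy, toy$Symbol, function(x) nrow(x))

print(all.df)
print(rows.per.symbol)
```

In this toy data, Symbols 548140 and 547400 each have two rows, so their sub-data-frames carry two values in every column, including Name.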

factor.names <- c("Factor1", "Factor2")
factor.pvalue <- c("SigF1", "SigF2")
results <- numeric()

vector <- matrix(0, ncol=(length(factor.names)*2+2), nrow=1)
colnames(vector) <- c("No.obs", factor.names, factor.pvalue)

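application <- function(x){

    # label the result row with the name belonging to this Symbol
    rownames(vector) <- x$Name

    for(i in 1:length(factor.names)){
        # compute correlations only when there are at least 5 observations
        if(dim(x)[1] >= 5){
            vector[1] <- dim(x)[1]
            vector[i+1] <- cor.test(x$External, x[,factor.names[i]])$estimate
            vector[i+3] <- cor.test(x$External, x[,factor.names[i]])$p.value
        } else {
            vector <- rep(NA, length(vector))
        }
    }

    results <- rbind(results, vector)
}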

by(database, database$Symbol, application)

This did not work. I get:
"Error in dimnames(x) <- dn :
  length of 'dimnames' [1] not equal to array extent"

I used browser() and saw that Name is not assigned as the row name of vector, and then dim(x)[1] does not work.
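One possible reading of the error, offered as a sketch and not a confirmed diagnosis: the matrix has a single row, while x$Name holds one entry per row of the sub-data-frame, so assigning it as row names fails whenever a Symbol has more than one observation. The object names v and two.names below are hypothetical.

```r
# A one-row matrix with five columns, like "vector" for two factors
v <- matrix(0, ncol = 5, nrow = 1)

# Hypothetical group with two observations: two names for a single row
two.names <- c("A", "A")

# Capture the error message instead of stopping
msg <- tryCatch({
  rownames(v) <- two.names
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Using only the first name matches the single row and succeeds
rownames(v) <- two.names[1]
print(rownames(v))
```

If this is indeed the cause, taking a single value such as x$Name[1] would be one way to label the row.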

What am I doing wrong? I do not understand. :-(

Thank you in advance for your help.



Jonas Malmros
Stockholm University
Stockholm, Sweden

