[R] Using the 'by' function within a 'for' loop

From: Judith Flores <juryef_at_yahoo.com>
Date: Tue, 22 Apr 2008 10:30:57 -0700 (PDT)


Dear R experts,

    I am sorry for sending this email again. I imagine that yesterday, and maybe today, have been very busy days with the release of R v 2.7.0. I join all the R users who are very grateful for your constant work and efforts, especially knowing that you are doing this for the sake of science, without getting any compensation for it.

    Having written that, I decided to send the email below again, in case it was forgotten; or maybe I am missing something very basic?

   I am using version 2.7.0 on Windows XP.

Start of yesterday's email:

    I am trying to optimize my script, because right now it requires a lot of memory. The goal is to generate four plots in one page. Every plot corresponds to the means and sem's calculated for a given variable at different days. In order to obtain the means and sem's I apply the 'by' function. The way I have done it so far is like this:

Read the data
Generate a summary of the mean and sem of a variable at every Day.
Plot the mean and sem of that variable.

Repeat the same process for the other 3 variables.

  I tried to optimize the code by using a for loop; the code is below.

#Reading the data

dato<-read.csv('mydata.csv')
names(dato)<-c("id","day","tx","var1","var2","var3","var4")
dato<-dato[,1:7]

#Specify variables to be plotted

variable<-c('var1','var2','var3','var4')

#Define parameters of the window and panel: margins, number of plots in the panel

windows(height=9, width=9, rescale='fixed')
par(mfrow=c(2,2),xpd=T, bty='l',
    omi=c(0.8,0.25,1.2,0.15), mai=c(1.1,0.8,0.3,0.3))

for (k in variable) {

    dat<-dato[!is.na(k),]

    summ<-by(dat,dat[,c("tx","day")], function(x) {

        mn<-mean(x$k)
        std<-sd(x$k)
        n<-length(x$k)
        se<-std/sqrt(n)
        lowb<-mn-se
        upb<-mn+se

        data.frame(tx=x$tx[1],day=x$day[1],mn=mn,std=std,lowb=lowb,upb=upb,se=se)
    })

    summ<-do.call("rbind",summ)

    #Defining x axis range
    xmax<-unique(max(summ$day,na.rm=TRUE))
    xmin<-unique(min(summ$day,na.rm=TRUE))

    yaxmin<-unique(min(summ$lowb))
    yaxmax<-unique(max(summ$upb))

    plot(1,1,type='n',xlab='Day',xlim=c(xmin,xmax),ylim=c(yaxmin,yaxmax), ylab=k,
         las=1,cex.lab=1,xaxp=c(xmin,xmax,diff(range(c(xmin,xmax)))))

    points(summ$day,summ$mn)

}
        



    Where variable is a vector that specifies all the variables I want to plot.

But I am getting the following error:

“Error in var(as.vector(x), na.rm = na.rm) : 'x' is empty
In addition: Warning message:
In mean.default(x$k) : argument is not numeric or logical: returning NA”
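
These messages are consistent with x$k returning NULL: the '$' operator looks for a column literally named "k", not for the column whose name is stored in k. In the same way, is.na(k) tests the character string itself, so no rows are dropped. A minimal sketch of the difference, assuming a made-up data frame (the names d, col_values and d_complete and the values are hypothetical, not taken from the attached file):

```r
# Hypothetical toy data, not the attached csv
d <- data.frame(var1 = c(1, NA, 3), var2 = c(4, 5, 6))
k <- "var1"

is.null(d$k)   # TRUE: there is no column literally named "k"
is.na(k)       # FALSE: this tests the string "var1", not the column

col_values <- d[[k]]                 # the var1 column, selected by the name stored in k
d_complete <- d[!is.na(d[[k]]), ]    # keeps only rows where var1 is not NA
nrow(d_complete)
```

Indexing with [[k]] (or d[, k]) is the usual way to select a column by a name held in a variable.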

   Could someone please show me how to structure my code to achieve my final goal, which is to simplify it?
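
One possible way to structure this is to move the summary step into a small function that takes the variable name as an argument and selects the column with [[k]]. This is only a sketch under stated assumptions: the helper name summarise_var and the toy data are hypothetical stand-ins for mydata.csv, and the plotting calls are left out.

```r
# Hypothetical toy data standing in for mydata.csv
set.seed(1)
dato <- data.frame(
  id   = 1:24,
  day  = rep(c(1, 3, 7), each = 8),
  tx   = rep(c("a", "b"), times = 12),
  var1 = rnorm(24),
  var2 = rnorm(24)
)
dato$var1[2] <- NA   # one missing value to exercise the filter

# Hypothetical helper: mean and SEM of column `k` for every tx/day combination
summarise_var <- function(dato, k) {
  dat <- dato[!is.na(dato[[k]]), ]     # drop rows where this variable is NA
  summ <- by(dat, dat[, c("tx", "day")], function(x) {
    v  <- x[[k]]                       # column selected by the name stored in k
    mn <- mean(v)
    se <- sd(v) / sqrt(length(v))
    data.frame(tx = x$tx[1], day = x$day[1], mn = mn,
               std = sd(v), lowb = mn - se, upb = mn + se, se = se)
  })
  do.call("rbind", summ)
}

variable <- c("var1", "var2")
for (k in variable) {
  summ <- summarise_var(dato, k)
  print(summ)   # the plot() and points() calls would go here, using summ
}
```

Each pass through the loop then yields one data frame with a row per tx/day combination, which can feed the plotting code for that panel.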

I am attaching a csv file in case you want to run my code.

Thank you very much in advance for your time and help,

Judith






R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 22 Apr 2008 - 17:40:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 22 Apr 2008 - 21:30:33 GMT.
