Re: Re: [R] RE: more on lm(y~x) question: removing NAs


From: Thomas Lumley (tlumley@u.washington.edu)
Date: Wed 05 May 2004 - 01:30:39 EST


Message-id: <Pine.A41.4.58.0405040808200.152504@homer38.u.washington.edu>

On Tue, 4 May 2004, Christoph Scherber wrote:

> it all works fine (the regression lines fit correctly to the data) as
> long as there are not both missing values in j and k.

That's very strange. The lines
 for (k in 1:length(foranalysis[93:174,i]))
     number[k]_substring(plotcode[foranalysis[k,1]],1,5)

should result in k being the scalar value 82 (the length of 93:174) after
the loop is over. In R (unlike S-PLUS), loop indices are just ordinary
variables in the environment where the loop is executed, so the loop
overwrites the vector k that was created earlier. I'd expect this code to
work in S-PLUS but not in R.
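
A minimal sketch of that behaviour in R, using an invented toy vector rather than the poster's data:

```r
# Toy data, invented for illustration; not the poster's data set.
k <- c(0.2, 0.5, 0.9)        # a data vector we would like to keep

for (k in 1:3) {             # reusing k as the loop index overwrites the vector
    invisible(k)
}

# After the loop, k holds the last index value, not the original vector.
k_after <- k
print(k_after)
```

Using a different name for the loop index (say m) leaves the data vector untouched.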

That loop is actually redundant, since substring() is vectorised:
        number <- substring(plotcode[foranalysis[93:174,1]],1,5)
should work just as well.
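
To illustrate the vectorisation, here is a small sketch with made-up plot codes and indices (codes and idx are hypothetical stand-ins for plotcode and the first column of foranalysis):

```r
# Hypothetical plot codes and index vector, invented for illustration.
codes <- c("B1A01xyz", "B2A14abc", "B4A22def")
idx   <- c(3, 1, 2)

# Element-by-element version, as in the original loop.
number_loop <- character(length(idx))
for (m in seq_along(idx)) {
    number_loop[m] <- substring(codes[idx[m]], 1, 5)
}

# Vectorised version: substring() is applied to the whole vector at once.
number_vec <- substring(codes[idx], 1, 5)

print(identical(number_loop, number_vec))
```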

It's also strange that you create a data frame df from j and k but don't
use it in the lm() call (or AFAICS anywhere else).
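
As a rough sketch of what using the data frame in the lm() call could look like: the first six rows below are the x/y example from earlier in this thread, and the last two rows are invented so that the slope can be estimated.

```r
# First six rows: the example from this thread. Last two rows: invented.
df <- data.frame(x = c(1, NA, NA, 4, 1, NA, 2, 3),
                 y = c(2, 3, 4, NA, 5, NA, 3, 4))

# Pass the data frame via 'data'; na.exclude drops incomplete rows for the
# fit but pads residuals and fitted values with NA to the original length.
fit <- lm(y ~ x, data = df, na.action = na.exclude)

# Equivalent fit on the complete cases only, done "by hand".
fit_cc <- lm(y ~ x, data = df[complete.cases(df), ])

n_used  <- nobs(fit)               # number of rows actually used in the fit
n_resid <- length(residuals(fit))  # padded back to nrow(df)
same    <- isTRUE(all.equal(coef(fit), coef(fit_cc)))
```

Both calls fit the same model; they differ only in how residuals and fitted values line up with the original rows.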

>
> What suggestions would you have for this? Or, more precisely, how would
> you create multiple graphs from subsequent columns of a data.frame?

I'd probably use lsfit. The following is obviously not tested, since I
don't have the data (or even understand fully the data layout).

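L <- length(93:174)     # number of rows used (82)
for (i in p) {
        X <- foranalysis[93:174, i]
        Y <- foranalysis[93:174, i+1]
        # use="complete.obs" drops any pair with an NA in X or Y
        corr <- cor(X, Y, use="complete.obs")
        corrtrunc <- cor(X[X<0.9], Y[X<0.9], use="complete.obs")
        mainlab <- paste(substring(names(foranalysis[i]), 2, 8),
                        "; corr.:", corr,
                        "; excl.Mono:", corrtrunc)
        plot(X, Y, main=mainlab,
                xlab="% of total biomass", ylab="% of total cover", pch="n")
        number <- substring(plotcode[foranalysis[1:L,1]], 1, 5)
        text(X, Y, number)
        # lsfit() removes observations with missing values (with a warning)
        model <- lsfit(X, Y)
        abline(model)
        abline(0, 1, lty=2)     # dashed 1:1 reference line
}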
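
A small sketch of how lsfit() and cor() behave with missing values, using invented numbers rather than the real data, and assuming the default na.action of na.omit for lm():

```r
# Invented data with missing values in both variables.
x <- c(0.10, 0.25, NA, 0.40, 0.95, 0.60, NA, 0.80)
y <- c(0.15, 0.20, 0.30, NA, 0.90, 0.55, NA, 0.85)

# cor() has to be told how to treat NAs; "complete.obs" uses complete pairs.
r <- cor(x, y, use = "complete.obs")

# lsfit() drops incomplete observations itself and warns about it.
fit_ls <- suppressWarnings(lsfit(x, y))

# lm() with the default na.action (na.omit) fits the same complete pairs.
fit_lm <- lm(y ~ x)

same <- isTRUE(all.equal(unname(fit_ls$coefficients), unname(coef(fit_lm))))
print(same)
```

Either fitted object can be handed to abline() to draw the regression line.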

        -thomas

> >>>
> >>>par(mfrow=c(5,5))
> >>>p_seq(3,122,2)
> >>>i_0
> >>>k_0
> >>>number_0
> >>>for (i in p) {
> >>> j_foranalysis[93:174,i+1]
> >>> k_foranalysis[93:174,i]
> >>> df_data.frame(j,k)
> >>> mainlab1_substring(names(foranalysis[i]),2,8)
> >>> mainlab2_"; corr.:"
> >>> mainlab3_round(cor(j,k,na.method="available"),4)
> >>> mainlab4_"; excl.Mono:"
> >>> mainlab5_round(cor(j[j<0.9],k[j<0.9],na.method="available"),4)
> >>> mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5)
> >>> plot(k,j,main=mainlab,xlab="% of total biomass",ylab="% of total cover",pch="n")
> >>> for (k in 1:length(foranalysis[93:174,i]))
> >>>number[k]_substring(plotcode[foranalysis[k,1]],1,5)
> >>> text(foranalysis[93:174,i],foranalysis[93:174,i+1],number)
> >>>**********************************
> >>> model_lm(j~k,na.action=na.exclude)
> >>>**********************************
> >>> abline(model)
> >>> abline(0,1,lty=2)
> >>> }
> >>>
> >>>Does anyone have any suggestions on this?
> >>>
> >>>Best regards
> >>>Chris.,
> >>>
> >>>
> >>>
> >>>
> >>>Liaw, Andy wrote:
> >>>
> >>>
> >>>
> >>>>By (`factory') default that's done for you automagically, because
> >>>>options("na.action") is `na.omit'.
> >>>>
> >>>>If you really want to do it `by hand', and have the data in a data frame,
> >>>>you can use something like:
> >>>>
> >>>>lm(y ~ x, df[complete.cases(df),])
> >>>>
> >>>>HTH,
> >>>>Andy
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>From: Christoph Scherber
> >>>>>
> >>>>>Dear all,
> >>>>>
> >>>>>I have a data frame with different numbers of NAs in each
> >>>>>column, e.g.:
> >>>>>
> >>>>>x y
> >>>>>1 2
> >>>>>NA 3
> >>>>>NA 4
> >>>>>4 NA
> >>>>>1 5
> >>>>>NA NA
> >>>>>
> >>>>>
> >>>>>I now want to do a linear regression on y~x with all the NAs
> >>>>>removed.
> >>>>>The problem now is that is.na(x) (and is.na(y), obviously) gives
> >>>>>vectors with different lengths. How could I solve this problem?
> >>>>>
> >>>>>Thank you very much for any help.
> >>>>>
> >>>>>Best regards
> >>>>>Chris
> >>>>>
> >>>>>
> >>>>>

Thomas Lumley Assoc. Professor, Biostatistics
tlumley@u.washington.edu University of Washington, Seattle

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

