RE: [R] problem with predict()


From: Czerminski, Ryszard (ryszard@arqule.com)
Date: Fri 28 Jun 2002 - 23:27:39 EST


Message-id: <EDE2901A3801144496F3485FCFEC8D3027C4D8@EPOCH.arqule.com>

This time I used the same file for both train.data and test.data, and threw
in "names(test) <- names(train)" before predict() for double protection (:-),
but it still fails...

Is this some weird problem with this particular data set, or a bug?
(And why "subscript out of bounds"?)

> rm(list=ls())
> train.data <- read.csv("train.csv", header = TRUE, row.names = "mol", comment.char="")
> test.data <- read.csv("train.csv", header = TRUE, row.names = "mol", comment.char="")
> yr <- train.data[,1]; xr <- train.data[,-1]
> xr <- scale(xr) # matrix <- scale(data.frame)
> x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> mask <- apply(xr, 2, function(x) any(is.na(x)))
> xr <- xr[,!mask] # rm NA's
> ys <- test.data[,1]; xs <- test.data[,-1]
> xs <- scale(xs, center = x.center, scale = x.scale)
> xs <- xs[,!mask]
> train <- data.frame(y = yr, x = xr)
> test <- data.frame(y = ys, x = xs)
> model <- lm(y~., train)
> cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
dim(train) = 164 119 ; dim(test) = 164 119
> names(test) <- names(train)
> length(predict(model, test))
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
        subscript out of bounds
>

Ryszard Czerminski phone: (781)994-0479
ArQule, Inc. email:ryszard@arqule.com
19 Presidential Way http://www.arqule.com
Woburn, MA 01801 fax: (781)994-0679
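
The failing expression in the error message indexes the new model matrix by
the QR pivot (piv) of the fit, so two things seem worth ruling out before
blaming the data: predictor names that are missing from the new data, and a
rank-deficient fit (coefficients reported as NA). Below is a minimal sketch of
such a check, not a diagnosis of this thread. check_newdata() is a
hypothetical helper, and the small collinear data set is simulated because
the original train.csv is not available.

```r
# check_newdata() is a hypothetical helper, not part of R or of this thread.
check_newdata <- function(model, newdata) {
  # Predictor names used by the fit; terms(model) already has `.` expanded.
  vars <- all.vars(delete.response(terms(model)))
  cf <- coef(model)
  list(
    missing_vars = setdiff(vars, names(newdata)),  # should be character(0)
    n_coef       = length(cf),
    rank         = model$rank,                     # rank < n_coef: rank deficient
    aliased      = names(cf)[is.na(cf)]            # coefficients lm() set to NA
  )
}

# Simulated stand-in for the real data (assumption): x.c is an exact
# multiple of x.a, so lm() cannot estimate a separate coefficient for it.
set.seed(1)
train <- data.frame(y = rnorm(10), x.a = rnorm(10), x.b = rnorm(10))
train$x.c <- 2 * train$x.a
model <- lm(y ~ ., train)

chk <- check_newdata(model, train)
str(chk)
```

A name in missing_vars, or a non-empty aliased, is a lead worth following; an
empty result does not prove the data are fine.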

-----Original Message-----
From: Liaw, Andy [mailto:andy_liaw@merck.com]
Sent: Friday, June 28, 2002 8:46 AM
To: 'Czerminski, Ryszard'
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] problem with predict()

You can try:

  names(test) <- names(train)

before calling predict() to make sure that the variable names match.
Without your data files, it's hard to tell why your first example worked.

Andy
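
Andy's suggestion can be illustrated on a small simulated example (assumed
data, not the thread's files). data.frame() names the columns of a matrix
argument x after the argument, so a test matrix with different (or no) column
names yields different variable names, and predict() cannot find the
predictors the model was fitted with. Note that names(test) <- names(train)
relabels columns by position: it only gives correct predictions when both
frames have the same columns in the same order.

```r
set.seed(42)
xr <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("a", "b")))
train <- data.frame(y = rnorm(10), x = xr)   # columns: y, x.a, x.b
model <- lm(y ~ ., train)

xs <- matrix(rnorm(6), ncol = 2)              # same columns, but no names
test <- data.frame(y = rnorm(3), x = xs)      # columns: y, x.1, x.2

# predict() looks for x.a and x.b in test and fails to find them.
res <- tryCatch(predict(model, test), error = function(e) e)
inherits(res, "error")                        # TRUE

# Relabel by position; valid only because the column order matches.
names(test) <- names(train)
length(predict(model, test))                  # 3
```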

> -----Original Message-----
> From: Czerminski, Ryszard [mailto:ryszard@arqule.com]
> Sent: Thursday, June 27, 2002 3:29 PM
> To: 'ripley@stats.ox.ac.uk'; Czerminski, Ryszard
> Cc: r-help@stat.math.ethz.ch
> Subject: RE: [R] problem with predict()
>
>
>
> # Yes. You are *still* using a matrix in a data frame. Please do read more carefully.
>
> I have read more of the R documentation, trying to understand the difference
> between matrices and data frames, etc. Now I repeat my example, this time
> executing EXACTLY the same code, the only difference being that in one case
> I use smaller data sets ({train,test}-small.csv) and in the other I use
> larger data sets ({train,test}.csv), and I get different behaviour.
>
> The small case (10*4) runs fine; the larger case (164*119) fails.
>
> Any ideas why this happens?
>
> R
>
> > rm(list=ls())
> > train.data <- read.csv("train-small.csv", header = TRUE, row.names = "mol", comment.char="")
> > test.data <- read.csv("test-small.csv", header = TRUE, row.names = "mol", comment.char="")
> > yr <- train.data[,1]; xr <- train.data[,-1]
> > xr <- scale(xr)
> > x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> > mask <- apply(xr, 2, function(x) any(is.na(x)))
> > xr <- xr[,!mask] # rm NA's
> > ys <- test.data[,1]; xs <- test.data[,-1]
> > xs <- scale(xs, center = x.center, scale = x.scale)
> > xs <- xs[,!mask]
> > train <- data.frame(y = yr, x = xr)
> > test <- data.frame(y = ys, x = xs)
> > model <- lm(y~., train)
> > cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
> dim(train) = 10 4 ; dim(test) = 10 4
> > length(predict(model, test))
> [1] 10
> > train.data <- read.csv("train.csv", header = TRUE, row.names = "mol", comment.char="")
> > test.data <- read.csv("test.csv", header = TRUE, row.names = "mol", comment.char="")
> [snip...]
> > cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
> subscript out of bounds
> >
>
> Ryszard Czerminski phone: (781)994-0479
> ArQule, Inc. email:ryszard@arqule.com
> 19 Presidential Way http://www.arqule.com
> Woburn, MA 01801 fax: (781)994-0679
>
>
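
The preprocessing in both transcripts follows a common pattern: scale the
training predictors, reuse the training centre and scale for the test set,
and drop the columns that became NA. A minimal sketch on simulated data (an
assumption; the original CSV files are not available) shows where those NA
columns come from: a predictor that is constant in the training set has
standard deviation 0, so scaling turns it into 0/0 = NaN. In the test set the
same column usually becomes Inf instead, which is.na() does not flag, so
reusing the training mask rather than recomputing it on the test set matters.

```r
set.seed(7)
# Simulated predictors (assumption); column c is constant in the training set.
xr <- data.frame(a = rnorm(8), b = rnorm(8), c = rep(1, 8))
xs <- data.frame(a = rnorm(3), b = rnorm(3), c = rnorm(3))

xr <- scale(xr)                          # data frame in, matrix out
x.center <- attr(xr, "scaled:center")
x.scale  <- attr(xr, "scaled:scale")     # x.scale["c"] is 0

# Column c is now NaN (0/0); flag every column containing NA/NaN.
mask <- apply(xr, 2, function(x) any(is.na(x)))
xr <- xr[, !mask, drop = FALSE]

# Test set: training centre and scale, then the *training* mask.
xs <- scale(xs, center = x.center, scale = x.scale)
is.infinite(xs[, "c"])                   # c is +/-Inf here, not NA
xs <- xs[, !mask, drop = FALSE]
```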
> -----Original Message-----
> From: ripley@stats.ox.ac.uk [mailto:ripley@stats.ox.ac.uk]
> Sent: Friday, June 21, 2002 3:41 PM
> To: Czerminski, Ryszard
> Cc: r-help@stat.math.ethz.ch
> Subject: RE: [R] problem with predict()
>
>
> On Fri, 21 Jun 2002, Czerminski, Ryszard wrote:
>
> > --- first problem
> >
> > If I store 'simulated' data in data frames:
> > # train.data <- data.frame(matrix(rnorm(164*119), nrow = 164))
> > # test.data <- data.frame(matrix(rnorm(35*119), nrow = 35))
> > it still behaves the same way, i.e. the code below works fine
> > for simulated data and fails for 'real' data, the only
> > difference being the actual numeric values stored in data
> > structures of the same shape and type.
> >
> > Any suggestions as to why this happens?
>
> Yes. You are *still* using a matrix in a data frame. Please do read more
> carefully.
>
> > --- second problem
> >
> > > As Andy Liaw pointed out, xr is a matrix. Take a look at the names of
> > > train. Hint: they do not contain `x'.
> >
> > Following your hint, I am guessing that the fact that the names do not
> > contain 'x' explains why the lm(y~., train) form works and lm(y~x, train)
> > fails, and that "lm(y~., train)" means roughly: correlate column "y" to all
> > other columns.
>
> No, it means regress y on all the remaining columns in the data argument.
>
> >
> > Where can I find a more detailed specification of this syntax?
> > In help(lm) I find this paragraph:
> >
> >     Models for `lm' are specified symbolically. A typical model has
> >     the form `response ~ terms' where `response' is the (numeric)...
> >
> > which does not quite cover this case.
>
> In any good book on the subject.
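
Both of Ripley's points can be checked on a small simulated example (assumed
data, not the thread's files). Built this way, the data frame has no column
called x, only x.a, x.b, ..., and y ~ . expands to y regressed on every
remaining column of the data argument. reformulate() from the stats package
builds the same formula explicitly; the `.` operator itself is described in
R's own ?formula help page.

```r
set.seed(3)
xr <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
train <- data.frame(y = rnorm(10), x = xr)
names(train)                 # "y" "x.a" "x.b" "x.c": no column named x

# y ~ . means: regress y on all remaining columns of the data argument.
f <- reformulate(setdiff(names(train), "y"), response = "y")
f                            # y ~ x.a + x.b + x.c

fit.dot      <- lm(y ~ ., data = train)
fit.explicit <- lm(f, data = train)
all.equal(coef(fit.dot), coef(fit.explicit))

# By contrast, lm(y ~ x, data = train) looks for a variable called x, which is
# not a column of train, so it fails unless some other object named x happens
# to be visible from the formula's environment.
```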
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
>

----------------------------------------------------------------------------

--
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that
may be confidential, proprietary copyrighted and/or legally privileged, and
is intended solely for the use of the individual or entity named on this
message.  If you are not the intended recipient, and have received this
message in error, please immediately return this by e-mail and then delete
it.




This archive was generated by hypermail 2.1.3 : Wed 16 Oct 2002 - 11:57:34 EST