Re: [R] Help in using PCR

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Tue, 01 Jul 2008 16:33:07 +0100

On Wed, 2008-07-02 at 00:58 +1000, Jason Lee wrote:
> Hi,
>
> Thanks for the reply.
>
> Basically I don't have any labels for my data except column 1, which is labeled
> with Sample1, Sample2, etc. (down the rows).

Well, that is going to cause you problems. Not the one you report below, but it will bite you in the ass once you sort your R usage problems out. How did you read in your data? If you used read.table or its stablemates (read.csv etc.), then read ?read.table and look at the row.names argument to see how you can get those sample labels as the row names of the data frame.
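A minimal sketch of what row.names does, assuming a CSV laid out like yours with the sample labels in the first column. The tiny data set and the column names id, x1, x2 and y are made up for illustration; with a real file you would pass file = "yourfile.csv" instead of text = csv_text.

```r
## Hypothetical toy data standing in for the real file:
## first column holds the sample labels
csv_text <- "id,x1,x2,y
Sample1,1.2,3.4,0
Sample2,2.1,0.5,1
Sample3,0.7,1.9,0"

## Default: the labels are read in as an ordinary data column
d_plain <- read.csv(text = csv_text)

## row.names = 1: the first column becomes the row names instead
d_named <- read.csv(text = csv_text, row.names = 1)

dim(d_plain)       # 3 4
dim(d_named)       # 3 3
rownames(d_named)  # "Sample1" "Sample2" "Sample3"
```

Note that every column index after the label column shifts down by one once the labels become row names.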

What does names(cancerv1) tell you? Unless you specifically deleted the column names, there will be some, as R will have generated them when reading your data in.
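For example, when a file without a header line is read with header = FALSE, read.table() makes up names V1, V2, ... for the columns. A small sketch with made-up data:

```r
## Hypothetical headerless data: sample labels plus two numeric columns
txt <- "Sample1 1.2 3.4
Sample2 2.1 0.5"

d <- read.table(text = txt, header = FALSE)

## read.table() generated the column names itself
names(d)   # "V1" "V2" "V3"
```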

> My cancerv1 data is data.frame.
>
> So, I used df <- data.frame( x=I(coef(cancerv1(,2:407))),
> y=cancerv1[,408])before feeding to PCR.

Why? What are you trying to achieve? Why apply I() and coef() to these?

>
> However, I get the error below.
>
> Error in coef(cancerv1(, 2:407)) : could not find function "cancerv1".

cancerv1(, 2:407) - you have the wrong "brackets" there. Those are parentheses, which denote a function call, so R goes looking for a function called cancerv1. You want square brackets "[" "]" instead: cancerv1[, 2:407].
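A small illustration of the difference, using a made-up data frame called toy in place of cancerv1: square brackets subset it, while parentheses make R look for a function of that name and fail with the same kind of error you saw.

```r
## Hypothetical small data frame standing in for cancerv1
toy <- data.frame(id = c("S1", "S2"), a = 1:2, b = 3:4, y = c(0.5, 1.5))

## Square brackets index the data frame: all rows, columns 2 and 3
sub <- toy[, 2:3]
names(sub)   # "a" "b"

## Parentheses make R look for a *function* called toy, which doesn't exist
msg <- tryCatch(toy(, 2:3), error = function(e) conditionMessage(e))
msg          # could not find function "toy"
```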

>
> I wonder what mistake I made in this case. My response variable is in
> column 408 and my predictors are in columns 2 to 407.

OK, let's sort this out. [Not tested as I don't have your data]

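library(pls)  # provides pcr()

## response is in column 408, predictors in columns 2:407
df <- data.frame(resp = cancerv1[, 408])
## store all predictors as a single matrix column, as in the yarn example
df$VARS <- as.matrix(cancerv1[, 2:407])
mod <- pcr(resp ~ VARS, ncomp = 6, data = df, validation = "CV")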

This will get your data into a format similar to the yarn example that you mentioned in your original post.
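Why assign the matrix in a separate step? Passed straight to data.frame(), a matrix is split into separate columns; assigning it afterwards keeps it as one matrix column that a formula can refer to by a single name (which, as I understand it, is how the NIR matrix is stored in the pls yarn data). A sketch in base R only: lm() stands in for pcr() so it runs without pls, and the toy data are random.

```r
set.seed(1)
## Hypothetical toy data: 5 samples, 3 predictors
X <- matrix(rnorm(15), nrow = 5, dimnames = list(NULL, c("p1", "p2", "p3")))
y <- rnorm(5)

## Passing a matrix straight to data.frame() splits it into separate columns
split_df <- data.frame(resp = y, VARS = X)
names(split_df)     # "resp" "VARS.p1" "VARS.p2" "VARS.p3"

## Assigning it afterwards keeps VARS as a single matrix column
mat_df <- data.frame(resp = y)
mat_df$VARS <- X
ncol(mat_df)        # 2
dim(mat_df$VARS)    # 5 3

## A formula can then refer to the whole block of predictors by one name
fit <- lm(resp ~ VARS, data = mat_df)
length(coef(fit))   # 4: intercept + 3 predictor coefficients
```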

Now, if you sort out your data import issue (see above), you'll need to change the numbers in the square brackets above: each will be 1 less than I have them up there (i.e. cancerv1[, 407] and cancerv1[, 1:406]).

An alternative, along the lines of my earlier response:

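## drop the sample-label column; the response (old column 408) is now last
df <- cancerv1[, -1]
## add some (col)names: Var1, Var2, ... for the predictors, resp for the response
names(df) <- c(paste("Var", 1:(ncol(df) - 1), sep = ""), "resp")
names(df)  # check the result
mod2 <- pcr(resp ~ ., ncomp = 6, data = df, validation = "CV")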

Note Bjørn-Helge's comment (quoted below) that this latter approach can take a long time to process the formula if you use it on data sets with many more than 1000 predictor variables.
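One way to see why the resp ~ . version scales with the number of predictors: R expands the dot into one formula term per column, whereas the matrix-column version has a single term regardless of width. This sketch uses random, made-up data and only counts terms; it does not time anything.

```r
set.seed(2)
## Hypothetical wide data: 50 predictors plus a response
n <- 10
p <- 50
wide <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(wide) <- paste("Var", 1:p, sep = "")
wide$resp <- rnorm(n)

## 'resp ~ .' is expanded into one term per predictor column
tl_dot <- attr(terms(resp ~ ., data = wide), "term.labels")
length(tl_dot)   # 50

## The matrix-column version has a single term, however wide the matrix
mat <- data.frame(resp = wide$resp)
mat$VARS <- as.matrix(wide[, 1:p])
tl_mat <- attr(terms(resp ~ VARS, data = mat), "term.labels")
length(tl_mat)   # 1
```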

Does this help you any?

>
> Please advise. Thanks.

You do seem to be blundering about with R a bit ;-) Randomly trying functions and other R code is just going to frustrate you. Do help yourself and read some of the introductory documentation.

HTH

G

>
> On Tue, Jul 1, 2008 at 6:41 PM, Bjørn-Helge Mevik <b.h.mevik_at_usit.uio.no>
> wrote:
>
> > Gavin Simpson <gavin.simpson_at_ucl.ac.uk> writes:
> >
> > > You can do this another way though, which I feel is more natural. So let's
> > > assume that your data frame contains columns that are named, and that
> > > one of these is the response variable, the remaining columns are the
> > > predictors. Further assume that this response is called 'myresp', then
> > > you can proceed by the following:
> > >
> > > cancerv1.pcr <- pcr(myresp ~ . , ncomp = 6, data = cancerv1,
> > > validation = "CV")
> >
> > This works fine as long as the number of (predictor) variables is not
> > too large. With many variables (>> 1000), R will spend a very long time
> > dealing with the formula.
> >
> > --
> > Bjørn-Helge Mevik
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Received on Tue 01 Jul 2008 - 17:42:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 02 Jul 2008 - 09:31:00 GMT.
