From: Andres Legarra <legarra_at_gmail.com>

Date: Thu, 20 Mar 2008 09:25:00 +0100

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 20 Mar 2008 - 08:28:40 GMT

Hi,

I am afraid you misunderstood. I do not have repeated records; rather,
every record carries two simultaneously present, possibly different,
instantiations of the same explanatory variable.

My data are as follows:

yield  haplo1  haplo2
  100  A       B
  151  B       A
  212  A       A

So I have a single effect (haplo), but each record carries two copies of
it, and both affect "yield". If I use lm() I get:

> a=data.frame(yield=c(100,151,212),haplo1=c("A","B","A"),haplo2=c("B","A","A"))
> lm(yield ~ -1 + haplo1 + haplo2, data = a)

Call:
lm(formula = yield ~ -1 + haplo1 + haplo2, data = a)

Coefficients:
haplo1A  haplo1B  haplo2B
    212      151     -112

But I get different coefficients for the two "A"s (in fact one was set to 0) and for the two "B"s. That is, the model has four unknowns, but in my example I have just two!
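One way to see the two unknowns, sketched here under the assumption of an additive model without intercept: code both columns against a common set of levels and add the two indicator matrices, so that each column counts the copies of one haplotype in a record. The names lev, f1, f2, Z1, Z2 and Z are illustrative and do not come from the message.

```r
# Data from the message
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))

# Common level set, so both columns are coded identically
lev <- sort(unique(c(as.character(a$haplo1), as.character(a$haplo2))))
f1  <- factor(as.character(a$haplo1), levels = lev)
f2  <- factor(as.character(a$haplo2), levels = lev)

# Indicator matrices: one column per level, no intercept
Z1 <- model.matrix(~ f1 - 1)
Z2 <- model.matrix(~ f2 - 1)

# Their sum counts the copies (0, 1 or 2) of each haplotype per record
Z <- Z1 + Z2
colnames(Z) <- lev
Z
```

Z has one column per haplotype, so a model built on it has two unknowns rather than four.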

A least-squares solution is simple to do by hand:

> X=matrix(c(1,1,1,1,2,0),ncol=2) #the incidence matrix
> X
     [,1] [,2]
[1,]    1    1
[2,]    1    2
[3,]    1    0
> solve(crossprod(X,X),crossprod(X,a$yield))
         [,1]
[1,] 184.8333
[2,] -30.5000

where [1,] is the solution for A and [2,] is the solution for B

This is not difficult to do by hand, but only for a simple case, and I lose all the machinery in lm().
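A possible way to keep the lm() machinery, assuming each copy of a haplotype contributes additively, is to put the copy counts in the data frame and use them as ordinary numeric covariates. The names nA, nB and fit are illustrative. The counts are taken from the data table (rows 1 1, 1 1, 2 0), which is not the X typed in the message, so the estimates are not expected to match the solution printed there.

```r
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))

# Copies of each haplotype per record (0, 1 or 2)
a$nA <- (a$haplo1 == "A") + (a$haplo2 == "A")
a$nB <- (a$haplo1 == "B") + (a$haplo2 == "B")

# One coefficient per haplotype, shared by both columns; no intercept
fit <- lm(yield ~ -1 + nA + nB, data = a)
coef(fit)

# The normal equations give the same estimates
X <- cbind(nA = a$nA, nB = a$nB)
solve(crossprod(X), crossprod(X, a$yield))
```

Since fit is an ordinary lm object, summary(fit), anova(fit) and predict(fit) are available as usual.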

Thank you

Andres

On Wed, Mar 19, 2008 at 6:57 PM, Michael Dewey <info_at_aghmed.fsnet.co.uk> wrote:

> At 09:11 18/03/2008, Andres Legarra wrote:
> >Dear all,
> >I have a data set (QTL detection) where I have two cols of factors in
> >the data frame that correspond logically (in my model) to the same
> >factor. In fact these are haplotype classes.
> >Another real-life example would be family gas consumption as a
> >function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
> >family).
>
> Unless I completely misunderstand this it looks like you have the
> dataset in wide format when you really wanted it in long format (to
> use the terminology of ?reshape). Then you would fit a model allowing
> for the clustering by family.
>
> >An artificial example follows:
> >set.seed(1234)
> >L3 <- LETTERS[1:3]
> >(d <- data.frame( y=rnorm(10), fac=sample(L3, 10,
> >repl=TRUE),fac1=sample(L3,10,repl=T)))
> >
> > lm(y ~ fac+fac1,data=d)
> >
> >and I get:
> >
> >Coefficients:
> >(Intercept)         facB         facC        fac1B        fac1C
> >     0.3612      -0.9359      -0.2004      -2.1376      -0.5438
> >
> >However, to respect my model, I need to constrain effects in fac and
> >fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
> >logically just 4 unknowns (average,A,B,C).
> >With continuous covariates one might do y ~ I(cov1+cov2), but this is
> >not the case.
> >
> >Is there any trick to do that?
> >Thanks,
> >
> >Andres Legarra
> >INRA-SAGA
> >Toulouse, France
>
> Michael Dewey
> http://www.aghmed.fsnet.co.uk
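For the three-level example in the quoted message, the constraint facB=fac1B and facC=fac1C might be imposed as in the sketch below. It assumes additive, shared effects plus an intercept. Because the copy counts of A, B and C sum to 2 in every record, they are collinear with the intercept, so one level (A here, an arbitrary choice) is dropped as the reference and three parameters remain estimable. The helper count_copies is hypothetical, and since sample() has changed across R versions, the simulated data need not reproduce the quoted coefficients.

```r
set.seed(1234)
L3 <- LETTERS[1:3]
d <- data.frame(y    = rnorm(10),
                fac  = sample(L3, 10, replace = TRUE),
                fac1 = sample(L3, 10, replace = TRUE))

# Hypothetical helper: copies of `level` across the two factor columns
count_copies <- function(level) (d$fac == level) + (d$fac1 == level)

d$nB <- count_copies("B")
d$nC <- count_copies("C")

# Intercept plus one shared effect each for B and C (A is the reference)
fit <- lm(y ~ nB + nC, data = d)
coef(fit)
```

The coefficients on nB and nC are then the effects of replacing one copy of A by one copy of B or C, whichever column it sits in.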

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 20 Mar 2008 - 18:30:24 GMT.
