From: Greg Snow <Greg.Snow_at_imail.org>

Date: Thu, 20 Mar 2008 11:21:33 -0600

First run a regular lm command without the restrictions, but specify y=TRUE, x=TRUE.
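Here is a minimal sketch of how the x=TRUE / y=TRUE route might continue, using the haplotype data from the quoted message. Two steps are assumptions, not part of the message above: adding together the dummy columns that should share a coefficient, and refitting with lm.fit(). The intercept is kept, so only the haplo1B and haplo2B columns are merged. The merged column counts B copies per record. The intercept is then the expected yield with two A copies, and the B coefficient is the change from swapping one A copy for a B copy.

```r
# Haplotype data from the quoted message: two copies of one factor per record.
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = factor(c("A", "B", "A")),
                haplo2 = factor(c("B", "A", "A")))

# Unrestricted fit; x = TRUE and y = TRUE keep the model matrix and response.
fit0 <- lm(yield ~ haplo1 + haplo2, data = a, x = TRUE, y = TRUE)
X <- fit0$x  # columns: "(Intercept)", "haplo1B", "haplo2B"

# Assumed restriction step: add the two B dummies so both copies share one
# coefficient (the merged column counts B copies per record: 1, 1, 0).
Xr <- cbind("(Intercept)" = X[, "(Intercept)"],
            B = X[, "haplo1B"] + X[, "haplo2B"])

# Refit the restricted model on the modified matrix.
fit_r <- lm.fit(Xr, fit0$y)
fit_r$coefficients  # (Intercept) = 212, B = -86.5
```

lm.fit() returns only the bare fit. lm(fit0$y ~ 0 + Xr) fits the same model and keeps summary() and the rest of lm()'s machinery, with coefficient names prefixed by Xr.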

You could also accomplish the same idea in the original regression using a formula like:

Y ~ I( (fac1=='A') + (fac2=='A') ) + I( (fac1=='B') + (fac2=='B') ) + ...
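Applied to the haplotype data in the quoted message, the formula might look like the sketch below. The parentheses around each comparison matter. In R, `+` binds more tightly than `==`, so without them R groups `'A' + fac2` first and never evaluates the intended sum of two indicators. Each I() term counts the copies of one level in a record (0, 1 or 2). The intercept is dropped with `0 +` because the counts in every record sum to 2. If the intercept were kept, one level's column would be aliased with it.

```r
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))

# Each I() term counts how many of the two copies in a record equal a level,
# so both copies of a level share one coefficient.
fit <- lm(yield ~ 0 + I((haplo1 == "A") + (haplo2 == "A")) +
            I((haplo1 == "B") + (haplo2 == "B")),
          data = a)
coef(fit)  # one shared coefficient per level: A = 106, B = 19.5
```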

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111

Received on Thu 20 Mar 2008 - 17:25:47 GMT

> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Andres Legarra
> Sent: Thursday, March 20, 2008 2:25 AM
> To: Michael Dewey
> Cc: R-help_at_r-project.org
> Subject: Re: [R] two cols in a data frame are the same factor
>
> Hi,
> I am afraid you misunderstood it. I do not have repeated
> records, but for every record I have two, possibly different,
> simultaneously present instantiations of an explanatory variable.
>
> My data is as follows:
>
> yield haplo1 haplo2
>   100      A      B
>   151      B      A
>   212      A      A
>
> So I have one effect (haplo), but in each record two copies of it
> affect "yield". If I use lm() I get:
>
> > a=data.frame(yield=c(100,151,212),haplo1=c("A","B","A"),haplo2=c("B","A","A"))
> Call:
> lm(formula = yield ~ -1 + haplo1 + haplo2, data = a)
>
> Coefficients:
> haplo1A  haplo1B  haplo2B
>     212      151     -112
>
> But I get different coefficients for the two "A"s (in fact one
> was set to 0) and the two "B"s. That is, the model has four
> unknowns but in my example I have just two!
>
> A least-squares solution is simple to do by hand:
>
> > X=matrix(c(1,1,1,1,2,0),ncol=2) #the incidence matrix
> > X
>      [,1] [,2]
> [1,]    1    1
> [2,]    1    2
> [3,]    1    0
> > solve(crossprod(X,X),crossprod(X,a$yield))
>          [,1]
> [1,] 184.8333
> [2,] -30.5000
>
> where [1,] is the solution for A and [2,] is the solution for B.
>
> This is not difficult to do by hand, but it is for a simple
> case and I miss all the machinery in lm().
>
> Thank you
> Andres
>
> On Wed, Mar 19, 2008 at 6:57 PM, Michael Dewey
> <info_at_aghmed.fsnet.co.uk> wrote:
> > At 09:11 18/03/2008, Andres Legarra wrote:
> > >Dear all,
> > >I have a data set (QTL detection) where I have two cols of factors
> > >in the data frame that correspond logically (in my model) to the
> > >same factor. In fact these are haplotype classes.
> > >Another real-life example would be family gas consumption as a
> > >function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars
> > >per family).
> >
> > Unless I completely misunderstand this it looks like you have the
> > dataset in wide format when you really wanted it in long format (to
> > use the terminology of ?reshape). Then you would fit a model allowing
> > for the clustering by family.
> >
> > >An artificial example follows:
> > >set.seed(1234)
> > >L3 <- LETTERS[1:3]
> > >(d <- data.frame( y=rnorm(10), fac=sample(L3, 10, repl=TRUE),fac1=sample(L3,10,repl=T)))
> > >
> > > lm(y ~ fac+fac1,data=d)
> > >
> > >and I get:
> > >
> > >Coefficients:
> > >(Intercept)         facB         facC        fac1B        fac1C
> > >     0.3612      -0.9359      -0.2004      -2.1376      -0.5438
> > >
> > >However, to respect my model, I need to constrain effects in fac and
> > >fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
> > >logically just 4 unknowns (average,A,B,C).
> > >With continuous covariates one might do y ~ I(cov1+cov2), but this
> > >is not the case.
> > >
> > >Is there any trick to do that?
> > >Thanks,
> > >
> > >Andres Legarra
> > >INRA-SAGA
> > >Toulouse, France
> >
> > Michael Dewey
> > http://www.aghmed.fsnet.co.uk
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
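When the `...` in the formula covers many levels, writing one I() term per level gets tedious. The sketch below assumes a hypothetical helper, `count_copies()`. It builds the whole count matrix (one column per level, entries 0, 1 or 2), which is then passed to lm() as a single matrix term.

For the three haplotype records, counting copies gives incidence rows (1, 1), (1, 1) and (2, 0) for (A, B). This differs from the X typed in the quoted hand calculation. With these counts, both solve()/crossprod() and lm() give A = 106 and B = 19.5.

The artificial three-level example keeps the intercept. Because each row's counts sum to 2, one count column is aliased, and lm() reports NA for one level. Only three of the four unknowns listed there (average, A, B, C) are therefore estimable. The remaining level coefficients are contrasts against the aliased level.

```r
# Hypothetical helper: one column per level, holding how many of the two
# factor columns of each record take that level (0, 1 or 2).
count_copies <- function(f1, f2) {
  f1 <- as.character(f1)
  f2 <- as.character(f2)
  lv <- sort(union(f1, f2))
  m <- vapply(lv, function(l) (f1 == l) + (f2 == l), integer(length(f1)))
  matrix(m, nrow = length(f1), dimnames = list(NULL, lv))
}

# Haplotype records from the quoted message.
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))
Z <- count_copies(a$haplo1, a$haplo2)  # rows (A, B): (1, 1), (1, 1), (2, 0)

b_hand <- solve(crossprod(Z), crossprod(Z, a$yield))  # least squares by hand
fit <- lm(a$yield ~ 0 + Z)                            # same fit via lm()
cbind(hand = b_hand[, 1], lm = coef(fit))             # A = 106, B = 19.5

# The artificial three-level example from the question.
set.seed(1234)
L3 <- LETTERS[1:3]
d <- data.frame(y    = rnorm(10),
                fac  = sample(L3, 10, replace = TRUE),
                fac1 = sample(L3, 10, replace = TRUE))
Zd <- count_copies(d$fac, d$fac1)
fit_d <- lm(d$y ~ Zd)  # counts sum to 2 per row, so one level is aliased (NA)
coef(fit_d)
```

Passing the count matrix as one term keeps summary(), anova() and the rest of lm()'s machinery. The trade-off is that coefficient names carry the matrix name as a prefix (ZA, ZB).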

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Fri 21 Mar 2008 - 08:30:23 GMT.