Re: [R] Are least-squares means useful or appropriate?

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Sat 24 Sep 2005 - 01:22:45 EST

Douglas Bates <dmbates@gmail.com> writes:

> On 9/20/05, Felipe <felipe@unileon.es> wrote:
> > Hi.
> > My question was just theoretical. I was wondering if someone who uses
> > SAS and R could give me their opinion on the topic. I was trying to use
> > least-squares means for comparison in R, but then I found some
> > indications against them, and I wanted to know if they had a good basis
> > (as I said earlier, they were not very detailed).
> > Greetings.
> >
> > Felipe
>
> As Deepayan said in his reply, the concept of least squares means is
> associated with SAS and is not generally part of the theory of linear
> models in statistics. My vague understanding of these (I too am not a
> SAS user) is that they are an attempt to estimate the "mean" response
> for a particular level of a factor in a model in which that factor has
> a non-ignorable interaction with another factor. There is no clearly
> acceptable definition of such a thing.

(PD goes and fetches the SAS manual....)

Well, yes. It'll do that too, although only if you ask for the lsmeans of A when an interaction like A*B is present in the model. This is related to the tests of main effects when an interaction is present using type III sums of squares, which has been beaten to death repeatedly on the list. In both cases, there seems to be an implicit assumption that categorical variables by nature come from an underlying fully balanced design.
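
A minimal sketch of that balanced-design assumption, on simulated and deliberately unbalanced data. All names here (A, B, y, the cell counts) are made up for illustration, and the code only mimics the averaging idea in base R; it is not SAS's implementation.

```r
# Simulated, deliberately unbalanced two-factor data (all names hypothetical)
set.seed(1)
n  <- c(40, 5, 10, 30)            # cell counts for a1b1, a1b2, a2b1, a2b2
mu <- c(10, 20, 12, 14)           # true cell means, same order
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = n)),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = n))
)
d$y <- rep(mu, times = n) + rnorm(nrow(d))

fit <- lm(y ~ A * B, data = d)

# Predicted cell means on the full A x B grid
grid <- expand.grid(A = levels(d$A), B = levels(d$B))
grid$pred <- predict(fit, newdata = grid)

# lsmeans-style value for A: average the cell predictions with equal
# weight per level of B, as if the design were balanced
ls_A <- sapply(split(grid$pred, grid$A), mean)

# Observed marginal means of A, which weight B by its actual frequencies
raw_A <- sapply(split(d$y, d$A), mean)

print(rbind(equal_weight = ls_A, observed = raw_A))
```

For level a1 the two rows differ noticeably, because only 5 of its 45 observations fall in b2 while the equal-weight average treats b1 and b2 as equally common.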

If the interaction is absent from the model, the lsmeans are somewhat more sensible in that they at least reproduce the parameter estimates as contrasts between different groups. All continuous variables in the design will be set to their mean, but the levels of categorical design variables are weighted equally, i.e. each level gets weight one over the number of levels. So if you're doing an lsmeans of lung function by smoking adjusted for age and sex, you get estimates for the mean of a population in which everyone has the same age and half are male and half are female. This makes some sense, but if you do it for sex adjusting for smoking and age, you are not only forcing the sexes to smoke equally much, but actually adjusting to smoking rates of 50%, which could be quite far from reality.
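
The lung function example can be sketched in base R as follows. The data are simulated and the names (fev, smoking, sex, age) and effect sizes are invented for illustration; the point is only to show what "mean age, equal weight per level" amounts to.

```r
# Simulated data; smoking is more common among men, so the design is unbalanced
set.seed(2)
n <- 200
d <- data.frame(
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  age = round(runif(n, 20, 70))
)
p_smoke   <- ifelse(d$sex == "M", 0.6, 0.2)
d$smoking <- factor(ifelse(runif(n) < p_smoke, "yes", "no"))
d$fev <- 4 - 0.02 * d$age + 0.5 * (d$sex == "M") -
  0.4 * (d$smoking == "yes") + rnorm(n, sd = 0.3)

fit <- lm(fev ~ smoking + age + sex, data = d)

# Reference grid: every smoking x sex combination, age fixed at its mean
grid <- expand.grid(smoking = levels(d$smoking), sex = levels(d$sex))
grid$age  <- mean(d$age)
grid$pred <- predict(fit, newdata = grid)

# lsmeans-style values for smoking: each sex gets weight 1/2
ls_smoking <- sapply(split(grid$pred, grid$smoking), mean)

# lsmeans-style values for sex: each smoking level gets weight 1/2,
# i.e. an assumed smoking rate of 50% whatever the data say
ls_sex <- sapply(split(grid$pred, grid$sex), mean)

# With no interaction, differences of these values reproduce the coefficients
diff_smoking <- unname(ls_smoking["yes"] - ls_smoking["no"])
diff_sex     <- unname(ls_sex["M"] - ls_sex["F"])
print(c(diff_smoking = diff_smoking, coef = unname(coef(fit)["smokingyes"])))
print(c(observed_smoking_rate = mean(d$smoking == "yes"), assumed = 0.5))
```

The differences agree with the fitted coefficients, while the individual values for each sex describe a population with a 50% smoking rate rather than the rate observed in the data.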

The whole operation really seems to revolve around 2 things:

(1) pairwise comparisons between factor levels. This can alternatively
    be done fairly easily using parameter estimates for the relevant
    variable and associated covariances. You don't really need all the
    mumbo-jumbo of adjusting to particular values of other variables.

(2) plotting effects of a factor with error bars as if they were
    simple group means. This has some merit since the standard
    parametrizations are misleading at times (e.g. if you choose the
    group with the least data as the reference level, std. err. for
    the other groups will seem high). However, it seems to me that
    concepts like floating variances (see float() in the Epi package)
    are more to the point.
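
As a rough illustration of point (1), here is one way to get a pairwise comparison from the parameter estimates and their covariance matrix, using only coef() and vcov(). The data are simulated and the names (g, x, y) are hypothetical; the reference level "a" is given the least data on purpose, which also shows the effect mentioned in point (2).

```r
# Simulated data: three groups, reference level "a" has only 5 observations
set.seed(3)
g <- factor(rep(c("a", "b", "c"), times = c(5, 50, 50)))
x <- rnorm(length(g))
y <- unname(c(a = 1, b = 2, c = 2.5)[as.character(g)]) + 0.5 * x + rnorm(length(g))

fit <- lm(y ~ g + x)
b <- coef(fit)
V <- vcov(fit)

# Contrast vector for "c minus b": +1 on gc, -1 on gb, 0 elsewhere
k <- setNames(numeric(length(b)), names(b))
k[c("gc", "gb")] <- c(1, -1)

est <- sum(k * b)                              # estimated difference
se  <- sqrt(drop(t(k) %*% V %*% k))            # its standard error
ci  <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se

# Cross-check: refit with "b" as reference; the g_bc row is the same contrast
g_b   <- relevel(g, ref = "b")
fit_b <- lm(y ~ g_b + x)

print(c(estimate = est, se = se, lower = ci[1], upper = ci[2]))
print(summary(fit)$coefficients[c("gb", "gc"), "Std. Error"])
```

The standard errors of gb and gc in the original fit are large only because both are measured against the small group "a"; the c versus b comparison itself is much more precise.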

> R is an interactive language where it is a simple matter to fit a
> series of models and base your analysis on a model that is
> appropriate. An approach of "give me the answer to any possible
> question about this model, whether or not it makes sense" is
> unnecessary.
>
> In many ways statistical theory and practice have not caught up with
> statistical computing. There are concepts that are regarded as part
> of established statistical theory when they are, in fact,
> approximations or compromises motivated by the fact that you can't
> compute the answer you want - except now you can compute it. However,
> that won't stop people who were trained in the old system from
> assuming that things *must* be done in that way.
>
> In short, I agree with Deepayan - the best thing to do is to ask
> someone who uses SAS and least squares means to explain to you what
> they are.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

Received on Sat Sep 24 01:26:15 2005