# Re: [R] Mimicking SPSS weighted least squares

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Tue, 11 Mar 2008 16:26:32 +0100

John Fox wrote:
> Dear JRG, Rolf, Ben, and Peter,
>
> "Frequency" weights, possibly even non-integer weights, are useful for
> surveys where observations are sampled with unequal probabilities of
> selection. The approach in SPSS gives correct point estimates in this
> situation but incorrect standard errors. The survey package, for example,
> provides a better solution.
>
> Regards,
> John
>
Actually, I count this as a 3rd variant of weighting. I believe that SPSS 's standard errors are actually OK for the case where one data line actually represents a number of identical replicates. To my mind, there are three (main) kinds of weighting:

```(1) Variance weighting (weights proportional to inverse variances)
(2) Case weights (weights identical to number of replicates)
(3) Inverse probability weights (weights inversely proportional to
```
sampling freq.)

All three give the same point estimates, beta=inv(X'WX)X'WY but the SEs and DF are different (W is the diagonal matrix of weights). I think the formulas are as follows (please correct if I goofed):

in (1) you get sigma^2=Y'(W-WX' inv(X'WX)X'W)Y/(n-rank(X)) ,

```                        VCOV= sigma^2 inv(X'WX),

in (3) it is sigma^2=Y'(I-WX inv(X'WX)X') (I- X inv(X'WX)X'W)Y/(n-rank(X)),
VCOV=sigma^2 inv(X'WX) X'WWX inv(X'WX)

```

in both these cases, the DF are n-rank(X) (glossing over complications that arise when the weights become zero) and the VCOV are stable to proportional scaling of W.

in (2) you get sigma^2=Y'(W-WX' inv(X'WX)X'W)Y/(tr(W)-rank(X)),

```                         VCOV= sigma^2 inv(X'WX),

```

This is deceptively similar to (1), but notice the denominator of sigma^2. In this case, multiplying the weights by, say, 2 will roughly halve the VCOV, which is fair enough since it means that you have twice as much data.
> --------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
>
>
>
>> -----Original Message-----
>> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-
>> project.org] On Behalf Of JRG
>> Sent: March-10-08 10:27 PM
>> To: Rolf Turner; r-help_at_r-project.org; Ben Domingue
>> Cc: r-help_at_r-project.org
>> Subject: Re: [R] Mimicking SPSS weighted least squares
>>
>> On 11 Mar 2008 at 14:09, Rolf Turner wrote:
>>
>>
>>> It would appear that the SPSS procedure would then give exactly the
>>>
>> same
>>
>>> point estimates of the parameters, and change the inference structure
>>>
>> by
>>
>>> changing the ``denominator degrees of freedom'' from n-p to sum(w) -
>>>
>> p.
>>
>> Well, if that IS what SPSS does, then it sounds like what Stata calls
>> frequency weights, the
>> general idea being that each "observation" in fact represents some non-
>> negative number (w) of
>> actual observations that have identical values. Not much more than a
>> glorified version of a
>> frequency distribution table.
>>
>> I don't see anything fundamentally wrong with frequency weights, given
>> an appropriate situation.
>>
>> ---JRG
>>
>> John R. Gleason
>>
>>
>>
>>
>>> This seems to me to make little sense ... But then, it ***is***
>>> SPSS. :-)
>>>
>>> cheers,
>>>
>>> Rolf
>>>
>>> On 11/03/2008, at 11:35 AM, Peter Dalgaard wrote:
>>>
>>>
>>>> Rolf Turner wrote:
>>>>
>>>>> On 11/03/2008, at 4:04 AM, Ben Domingue wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Howdy,
>>>>>> In SPSS, there are 2 ways to weight a least squares regression:
>>>>>> 1. You can do it from the regression menu.
>>>>>> 2. You can set a global weight switch from the data menu.
>>>>>> These two options have no, in my experience, been equivalent.
>>>>>> Now, when I run lm in R with the weights= switch set accordingly,
>>>>>>
>> I
>>
>>>>>> get the same set of results you would see with option #1 in SPSS.
>>>>>> Does anybody know how to duplicate option #2 from SPSS in R?
>>>>>>
>>>>>>
>>>>> I think it's up to you to find out what ``option #2 from SPSS''
>>>>> actually
>>>>> *does*. If you know that, then you can (with a modicum of effort)
>>>>> duplicate that option in R. The help file for lm() tells you that
>>>>> R uses the weights by minimizing sum(w*e^2) where w = weights and
>>>>> e = ``errors'' or residuals.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> I believe case weighting in SPSS effectively replicates the
>>>> relevant row (not sure if anything sensible comes out if weights
>>>> are non-integer). So
>>>>
>>>> lm(...., data=mydata[rep(1:nrow(mydata),w),])
>>>>
>>>> or thereabouts should do it. Might not be too efficient though.
>>>>
>>>> --
>>>> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
>>>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>>>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45)
>>>> 35327918
>>>> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45)
>>>> 35327907
>>>>
>>>>
>>>>
>>>
>> ######################################################################
>>
>>> Attention:
>>> This e-mail message is privileged and confidential. If you are not
>>>
>> the
>>
>>> intended recipient please delete the message and notify the sender.
>>> Any views or opinions presented are solely those of the author.
>>>
>>> This e-mail has been scanned and cleared by MailMarshal
>>> www.marshalsoftware.com
>>>
>>>
>> ######################################################################
>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>
>> guide.html
>>
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

```--
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help