Re: [R] selecting rows for inclusion in lm

From: John Sorkin <jsorkin_at_grecc.umaryland.edu>
Date: Thu 18 Jan 2007 - 22:23:21 GMT


I must express thanks to Peter Konings, Gary Collins, David Barron, Prof. Brian Ripley, Vladimir Eremeev, and Michael Dewey (I hope I did not leave anyone out), all of whom suggested I use the subset parameter of lm to restrict the subjects included in my lm. R is a special programming language and statistics package, both because of its wonderful features (thank you, R developers) and, equally importantly, because of the community of people who willingly give of their time and knowledge to help other users. Many thanks. If any R developers are out there, may I suggest that the help page for lm include more information (perhaps an example) on how one uses the subset option. The current documentation states:

subset    an optional vector specifying a subset of observations to be used in the fitting process.

Although I read the help page, I could not get subset to work until the kind people mentioned above sent me examples.  

Again, many thanks to one and all!  

John      

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin@grecc.umaryland.edu

>>> Prof Brian Ripley <ripley@stats.ox.ac.uk> 1/18/2007 3:38 AM >>>
On Thu, 18 Jan 2007, David Barron wrote:

> Why not use the subset option? Something like:
>
> lm(diff ~ Age + Race, data=data, subset=data$Meno=="PRE")
>
> should do the trick, and be much easier to read!

And

    lm(diff ~ Age + Race, data = data, subset = (Meno=="PRE"))

would be easier still.
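
As a minimal sketch of the two spellings above, the following builds a made-up data frame and checks that both calls fit the same rows. The data frame `dat`, its values, and the row counts (79 in total, 22 with Meno equal to "PRE") are assumptions chosen for illustration; they are not the poster's data.

```r
# Synthetic stand-in for the poster's data frame (all values are made up).
set.seed(1)
dat <- data.frame(
  diff = rnorm(79),
  Age  = round(runif(79, 40, 60)),
  Race = factor(rep(c("A", "B"), length.out = 79)),
  Meno = rep(c("PRE", "POST"), times = c(22, 57))
)

# Spelling 1: refer to the column through the data frame.
fit1 <- lm(diff ~ Age + Race, data = dat, subset = dat$Meno == "PRE")

# Spelling 2: bare column name; subset is evaluated inside `data`.
fit2 <- lm(diff ~ Age + Race, data = dat, subset = Meno == "PRE")

# Number of rows each fit actually used.
nobs(fit1)
nobs(fit2)

# The coefficients agree, and the residual degrees of freedom equal
# the rows used minus the number of estimated coefficients.
all.equal(coef(fit1), coef(fit2))
df.residual(fit2)
```

Checking nobs() or df.residual() after the fit is a quick way to confirm that the restriction took effect.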

>
> On 18/01/07, John Sorkin <jsorkin@grecc.umaryland.edu> wrote:
>> I am having trouble selecting rows of a dataframe that will be included
>> in a regression. I am trying to select those rows for which the variable
>> Meno equals PRE. I have used the code below:
>>
>> difffitPre<-lm(data[,"diff"]~data[,"Age"]+data[,"Race"],data=data[data[,"Meno"]=="PRE",])

You are missing a comma in data = data[<...>, ]
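
A further point that may explain the 76 residual degrees of freedom: terms written as data[,"diff"] name the full data frame explicitly, so they are evaluated against the complete object and the data= argument has no effect on them. The sketch below reproduces this with made-up data. The name `dat`, the values, and the total of 79 rows are assumptions chosen so that a three-coefficient model leaves 76 residual degrees of freedom; the real data may differ.

```r
# Made-up data: 79 rows, 22 of them with Meno == "PRE".
set.seed(1)
dat <- data.frame(
  diff = rnorm(79),
  Age  = round(runif(79, 40, 60)),
  Race = factor(rep(c("A", "B"), length.out = 79)),
  Meno = rep(c("PRE", "POST"), times = c(22, 57))
)
pre <- dat[dat[, "Meno"] == "PRE", ]

# The formula terms index `dat` directly, so lm() evaluates them on the
# full data frame and the restricted data= argument is bypassed.
fit_all <- lm(dat[, "diff"] ~ dat[, "Age"] + dat[, "Race"], data = pre)

# Bare column names are looked up in data= first, so only `pre` is used.
fit_pre <- lm(diff ~ Age + Race, data = pre)

nobs(fit_all)         # rows used by the first call
df.residual(fit_all)  # residual degrees of freedom of the first call
nobs(fit_pre)         # rows used by the second call
```

Writing the formula with bare column names keeps the model tied to whatever is passed in data= or selected by subset=.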

>> summary(difffitPre)
>>
>> The output from the summary indicates that more than 76 rows are
>> included in the regression:
>>
>> Residual standard error: 2.828 on 76 degrees of freedom
>>
>> where in fact only 22 rows should be included as can be seen from the
>> following:
>>
>> print(length(data[data[,"Meno"]=="PRE","Meno"]))
>> [1] 22
>>
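
For counting the rows that satisfy a condition, a shorter idiom is to sum the logical vector. This is a sketch on a hypothetical data frame `dat` with made-up counts, not the poster's data.

```r
# Hypothetical data: 22 "PRE" rows and 57 "POST" rows.
dat <- data.frame(Meno = rep(c("PRE", "POST"), times = c(22, 57)))

# TRUE counts as 1, so the sum is the number of matching rows;
# na.rm = TRUE guards against missing values in Meno.
n_pre <- sum(dat$Meno == "PRE", na.rm = TRUE)
n_pre

# Counts for every level at once, including NA if any are present.
counts <- table(dat$Meno, useNA = "ifany")
counts
```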
>> I would appreciate any help in modifying the data= parameter of the lm
>> so that I include only those subjects for which Meno=PRE.
>>
>> R 2.3.1
>> Windows XP
>>
>> Thanks,
>> John
>>
>> John Sorkin M.D., Ph.D.
>> Chief, Biostatistics and Informatics
>> Baltimore VA Medical Center GRECC,
>> University of Maryland School of Medicine Claude D. Pepper OAIC,
>> University of Maryland Clinical Nutrition Research Unit, and
>> Baltimore VA Center Stroke of Excellence
>>
>> University of Maryland School of Medicine
>> Division of Gerontology
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>>
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> jsorkin@grecc.umaryland.edu
>>
>> Confidentiality Statement:
>> This email message, including any attachments, is for the so...{{dropped}}
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk 
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/ 
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri Jan 19 09:34:14 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 18 Jan 2007 - 23:30:24 GMT.