Re: [R] using subset() in data frame

From: Chuck Cleland <>
Date: Sat, 23 Feb 2008 06:09:31 -0500

On 2/22/2008 8:01 PM, Robert Walters wrote:
> R folks,
> As an R novice, I struggle with the mystery of subsetting. Textbook and
> online examples of this seem quite straightforward yet I cannot get my
> mind around it. For practice, I'm using the code in MASS Ch. 6,
> "whiteside data" to analyze a different data set with similar variables
> and structure.
> Here is my data frame:
> ###subset one of three cases for the variable 'position'
> >data.b<-data.a[data.a$position=="inrow",]
> > print(data.b)
> position porosity x y
> 1 inrow macro 1.40 16.5
> 2 inrow macro . .
> . . . .
> . . . .
> 7 inrow micro
> 8 inrow micro
> Now I want to do separate lm's for each case of porosity, macro and
> micro. The code as given in MASS, p.141, slightly modified would be:
> fit1 <- lm(y ~ x, data=data.b, subset = porosity == "macro")
> fit2 <- update(fit1, subset = porosity == "micro")
> ###simplest code with subscripting
> fit1 <- lm(y ~ x, data.b[porosity=="macro"])

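   Because data.b is a data frame (it has two dimensions), you need a comma after the condition to indicate that you are selecting a subset of rows. Also, the [ operator does not look up bare column names inside the data frame, so the condition has to refer to the column as data.b$porosity:

fit1 <- lm(y ~ x, data.b[data.b$porosity == "macro", ])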
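A minimal sketch of the corrected call. Robert's data are not in the post, so the data frame below is invented: only the column names (position, porosity, x, y) come from his printout, and the values are made up for illustration.

```r
# Hypothetical stand-in for data.b: column names from the post, values invented.
data.b <- data.frame(
  position = "inrow",
  porosity = factor(rep(c("macro", "micro"), each = 4)),
  x = c(1.4, 2.1, 2.9, 3.6, 1.2, 2.0, 2.7, 3.5),
  y = c(16.5, 15.1, 13.8, 12.0, 14.2, 13.0, 12.1, 10.9)
)

# data.b[rows, columns]: the logical condition goes before the comma (rows);
# leaving the slot after the comma empty keeps all columns.
macro.rows <- data.b[data.b$porosity == "macro", ]
nrow(macro.rows)   # 4

fit1 <- lm(y ~ x, data = macro.rows)
coef(fit1)
```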

> ###following example in ?subset
> fit1 <- lm(y ~ x, data.b, subset(data.b, porosity, select=macro))

   The select argument to subset is meant to select variables (i.e., it indicates "columns to select from a data frame"), whereas the second argument must be a logical condition on the rows. Your call misuses both: porosity on its own is not a logical condition, and macro is a level of the factor porosity, not a column. If you make your call to subset by itself (a good idea when you are learning how a function works), you should get an error like this:

 > subset(whiteside, Insul, select=Before)
 Error in subset.data.frame(whiteside, Insul, select = Before) :
   'subset' must evaluate to logical
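The same error can be reproduced on data shaped like Robert's. This sketch uses a small invented data frame in place of his data.b and catches the error so the script keeps running; the exact wording of the message may differ between R versions.

```r
# Hypothetical stand-in for data.b (invented values).
data.b <- data.frame(
  porosity = factor(rep(c("macro", "micro"), each = 4)),
  x = c(1.4, 2.1, 2.9, 3.6, 1.2, 2.0, 2.7, 3.5),
  y = c(16.5, 15.1, 13.8, 12.0, 14.2, 13.0, 12.1, 10.9)
)

# The second argument of subset() is a row condition and must be logical.
# A bare factor column such as porosity is not, so subset() stops with an
# error before it ever looks at select = macro.
bad <- tryCatch(subset(data.b, porosity, select = macro),
                error = function(e) e)
inherits(bad, "error")   # TRUE
conditionMessage(bad)
```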

  What I think you intended was this:

subset(data.b, porosity == "macro")
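A quick check, again on an invented stand-in for data.b, that this call returns an ordinary data frame holding only the "macro" rows:

```r
# Hypothetical stand-in for data.b (invented values).
data.b <- data.frame(
  porosity = factor(rep(c("macro", "micro"), each = 4)),
  x = c(1.4, 2.1, 2.9, 3.6, 1.2, 2.0, 2.7, 3.5),
  y = c(16.5, 15.1, 13.8, 12.0, 14.2, 13.0, 12.1, 10.9)
)

macro.rows <- subset(data.b, porosity == "macro")
is.data.frame(macro.rows)   # TRUE
nrow(macro.rows)            # 4

# The factor keeps its now-unused level, so the counts are macro 4, micro 0;
# droplevels() removes the empty level if that matters downstream.
table(macro.rows$porosity)
```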

   Even with the correct call to subset, you don't want both data.b and the subset piece, because subset returns a data frame. The third positional argument of lm() is its subset argument, so you would be handing lm() a second data frame where it expects a logical or index vector selecting rows. So try this instead:

fit1 <- lm(y ~ x, subset(data.b, porosity == "macro"))
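A sketch comparing this call with the MASS-style version from the question (lm()'s own subset argument), plus one way to get a separate fit for every porosity level at once. The data frame is invented; in this sketch both routes use the same rows and give the same coefficients.

```r
# Hypothetical stand-in for data.b (invented values).
data.b <- data.frame(
  porosity = factor(rep(c("macro", "micro"), each = 4)),
  x = c(1.4, 2.1, 2.9, 3.6, 1.2, 2.0, 2.7, 3.5),
  y = c(16.5, 15.1, 13.8, 12.0, 14.2, 13.0, 12.1, 10.9)
)

# Pass lm() the already-subsetted data frame.
fit1 <- lm(y ~ x, subset(data.b, porosity == "macro"))

# MASS-style: keep data = data.b and let lm()'s subset argument pick the rows.
fit1.alt <- lm(y ~ x, data = data.b, subset = porosity == "macro")
all.equal(coef(fit1), coef(fit1.alt))   # TRUE
fit2 <- update(fit1.alt, subset = porosity == "micro")

# One fit per porosity level: split() gives a list of data frames named by
# level, and lapply() fits a model to each.
fits <- lapply(split(data.b, data.b$porosity), function(d) lm(y ~ x, data = d))
sapply(fits, coef)   # 2 x 2 matrix: rows (Intercept), x; columns macro, micro
```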

> None of the above, plus many permutations thereof, works.
> Can anyone educate me?
> Thanks,
> Robert Walters
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

Received on Sat 23 Feb 2008 - 11:18:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.