# Re: [R] Getting subsets of a data frame

From: Fernando Saldanha <fsaldan1_at_gmail.com>
Date: Sun 17 Apr 2005 - 03:49:45 EST

I am reading as fast as I can! Just started with R five days ago.

I found the following in the documentation:

"Although the default for 'drop' is 'TRUE', the default behaviour when only one _row_ is left is equivalent to specifying 'drop = FALSE'. To drop from a data frame to a list, 'drop = FALSE' has to (sic) specified explicitly."

I think the exception mentioned in the first sentence is the reason for my confusion.

I also think the second sentence is wrong and should have 'TRUE' instead of 'FALSE'.

While it is true that a data frame is a list, it is not a list of numbers, but rather a list of columns, which, if I understand correctly, can be either vectors or matrices. So regardless of the value assigned to 'drop', the returned object is a list.

When I asked "why isn't sw[1, ] a list?" I should have asked instead "why isn't sw[1, ] a list of vectors?"
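The exception in the first sentence can be checked directly. By default, `[.data.frame` only simplifies its result when a single column is selected. A one-row, multi-column subset therefore stays a data frame, and you must give `drop = TRUE` explicitly to get a plain list. The sketch below assumes a small made-up data frame `dat`, which is not the `a` used in the experiments below:

```r
# Hypothetical example data: two columns of different types.
dat <- data.frame(x = c(10, 20, 30), y = c("a", "b", "c"))

r_default <- dat[3, ]                  # one row, both columns: not dropped
r_single  <- dat[3, "x"]               # one row, one column: dropped to a bare value
r_kept    <- dat[3, "x", drop = FALSE] # explicitly keep the data frame
r_list    <- dat[3, , drop = TRUE]     # explicitly drop: a plain list

is.data.frame(r_default)  # TRUE
is.list(r_default)        # TRUE as well: a data frame is a list of columns
r_single                  # 30
is.data.frame(r_kept)     # TRUE
is.data.frame(r_list)     # FALSE
is.list(r_list)           # TRUE
```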

I did some experiments with a data frame called a, whose columns are all vectors (no matrix columns):

```
> is.data.frame(a)  # just checking
[1] TRUE

> a1 <- a[3, ]
> (is.data.frame(a1))
[1] TRUE                     (did not stop being a data frame)
> (is.list(a1))
[1] TRUE                     (but it is a list)

> a2 <- a[3, , drop=T]
> (is.data.frame(a2))
[1] FALSE                    (no longer a data frame)
> (is.list(a2))
[1] TRUE                     (but it is a list)

> a3 <- a[3, , drop=F]
> (is.data.frame(a3))
[1] TRUE                     (still a data frame)
> (is.list(a3))
[1] TRUE                     (but it is a list)
```

I also tried:

```
> a2[1]
$dates.num
[1] 477032400

> a3[1]
  dates.num
3 477032400                  (notice the row name)

> attributes(a3[1])
$names
[1] "dates.num"

$class
[1] "data.frame"

$row.names
[1] "3"

> attributes(a2[1])
$names
[1] "dates.num"
```
FS

On 4/16/05, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:
> On Sat, 16 Apr 2005, Prof Brian Ripley wrote:
>
> > Perhaps Fernando will also note that this is documented in ?"[.data.frame",
> > a slightly more appropriate reference than Bill's.
> >
> > It would be a good idea to read a good account of R's indexing: Bill Venables
> > and I know of a couple you will find in the R FAQ.
>
> BTW,
>
> sw <- swiss
> sw[1,,drop=TRUE] *is* a list (not as claimed, but as documented)
> sw[1, ] is a data frame
> sw[, 1] is a numeric vector.
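You can check these three cases, plus the `drop = FALSE` column case from Bill's message, on the `swiss` data set that ships with R's datasets package. This is a short sketch; `class()` is used only as a convenient way to inspect the results:

```r
sw <- datasets::swiss

row_dropped <- sw[1, , drop = TRUE]   # explicit drop on one row: a plain list
row_df      <- sw[1, ]                # one row, several columns: still a data frame
col_vec     <- sw[, 1]                # one column: dropped to a numeric vector
col_df      <- sw[, 1, drop = FALSE]  # one column kept as a data frame

class(row_dropped)  # "list"
class(row_df)       # "data.frame"
class(col_vec)      # "numeric"
class(col_df)       # "data.frame"
```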
>
> I should have pointed out that "[.data.frame" is in the See Also of Bill's
> reference.
>
> BTW to Andy: a list is a vector, and Kurt and I recently have been trying
> to correct documentation that means `atomic vector' when it says `vector'.
> (Long ago lists in R were pairlists and not vectors.)
>
> > is.vector(list(a=1))
> [1] TRUE
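The distinction between vectors in general and atomic vectors shows up when you compare `is.vector()` with `is.atomic()`. Note that `is.vector()` is FALSE for a data frame, even though a data frame is a list, because it carries attributes other than names. A brief sketch:

```r
l <- list(a = 1)
v <- c(a = 1)

is.vector(l)   # TRUE: a list is a (generic) vector
is.atomic(l)   # FALSE: but not an atomic vector
is.vector(v)   # TRUE
is.atomic(v)   # TRUE

is.list(datasets::swiss)    # TRUE: a data frame is a list
is.vector(datasets::swiss)  # FALSE: it also has class and row.names attributes
```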
>
>
> > On Sat, 16 Apr 2005, Liaw, Andy wrote:
> >
> >> Because a data frame can hold different data types (even matrices) in
> >> different variables, one row of it cannot be converted to a vector in
> >> general (where all elements need to be of the same type).
> >>
> >> Andy
> >>
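You can see Andy's point by flattening a single row with `unlist()`. When the columns have different types, the values are coerced to one common type (here character), so the row loses its original types. A list obtained with `drop = TRUE` keeps each column's type. The sketch below assumes a hypothetical two-column data frame `mixed`:

```r
# Hypothetical data frame with a numeric and a character column.
mixed <- data.frame(n = c(1.5, 2.5), s = c("x", "y"), stringsAsFactors = FALSE)

row_list <- mixed[1, , drop = TRUE]  # a list keeps each column's own type
row_vec  <- unlist(mixed[1, ])       # an atomic vector must have a single type

class(row_list$n)  # "numeric"
row_vec            # named character vector: n = "1.5", s = "x"
```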
> >>> From: Fernando Saldanha
> >>>
> >>> Thanks, it's interesting reading.
> >>>
> >>> I also noticed that
> >>>
> >>> sw[, 1, drop = TRUE] is a vector (coerces to the lowest dimension)
> >>>
> >>> but
> >>>
> >>> sw[1, , drop = TRUE] is a one-row data frame (does not convert it into
> >>> a list or vector)
> >>>
> >>> FS
> >>>
> >>>
> >>> On 4/16/05, Bill.Venables@csiro.au <Bill.Venables@csiro.au> wrote:
> >>>> You should look at
> >>>>
> >>>>> ?"["
> >>>>
> >>>> and look very carefully at the "drop" argument. For your example
> >>>>
> >>>>> sw[, 1]
> >>>>
> >>>> is the first component of the data frame, but
> >>>>
> >>>>> sw[, 1, drop = FALSE]
> >>>>
> >>>> is a data frame consisting of just the first component, as
> >>>> mathematically fastidious people would expect.
> >>>>
> >>>> This is a convention, and like most arbitrary conventions it can be very
> >>>> useful most of the time, but some of the time it can be a very nasty
> >>>> trap. Caveat emptor.
> >>>>
> >>>> Bill Venables.
> >>>>
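One common form of this trap is code that selects a variable number of columns. Such code works when several columns are chosen but silently gets a vector when only one is. The sketch below uses two hypothetical helpers, `unsafe_sums()` and `safe_sums()`, to show how `drop = FALSE` guards against it:

```r
sw <- datasets::swiss

# Hypothetical helpers: row sums over a chosen set of columns.
unsafe_sums <- function(d, cols) rowSums(d[, cols])
safe_sums   <- function(d, cols) rowSums(d[, cols, drop = FALSE])

head(unsafe_sums(sw, 1:2))  # fine with two columns
head(safe_sums(sw, 1))      # still fine: d[, 1, drop = FALSE] is a data frame
# unsafe_sums(sw, 1) fails: sw[, 1] is a plain vector, and rowSums()
# requires an array of at least two dimensions.
```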
> >>>> -----Original Message-----
> >>>> From: r-help-bounces@stat.math.ethz.ch
> >>>> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Fernando Saldanha
> >>>> Sent: Saturday, 16 April 2005 1:07 PM
> >>>> To: Submissions to R help
> >>>> Subject: [R] Getting subsets of a data frame
> >>>>
> >>>>
> >>>> There is a list of examples of expressions using [ and [[, with the
> >>>> outcomes. I was puzzled by the fact that, if sw is a data frame, then
> >>>>
> >>>> sw[, 1:3]
> >>>>
> >>>> is also a data frame,
> >>>>
> >>>> but
> >>>>
> >>>> sw[, 1]
> >>>>
> >>>> is just a vector.
> >>>>
> >>>> Since R has no scalars, it must be the case that 1 and 1:1 are the same:
> >>>>
> >>>>> 1 == 1:1
> >>>> [1] TRUE
> >>>>
> >>>> Then why isn't sw[,1] = sw[, 1:1] a data frame?
> >>>>
> >>>> FS
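On the original question: `1` and `1:1` compare equal with `==`, but they are not identical, because one is double and the other is integer. Either way, the form of the index does not decide whether the result is dropped. What matters, with the default `drop`, is that exactly one column ends up selected, so `sw[, 1]` and `sw[, 1:1]` return the same vector. A quick sketch:

```r
sw <- datasets::swiss

1 == 1:1                       # TRUE: equal values
identical(1, 1:1)              # FALSE: double versus integer
typeof(1:1)                    # "integer"
identical(sw[, 1], sw[, 1:1])  # TRUE: both select one column, so both drop
is.data.frame(sw[, 1:2])       # TRUE: a two-column result is not dropped
```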
> >>>>
> >>>> ______________________________________________
> >>>> R-help@stat.math.ethz.ch mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> http://www.R-project.org/posting-guide.html
> >>>>
> >>>
> >>
> >
> > --
> > Brian D. Ripley, ripley@stats.ox.ac.uk
> > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford, Tel: +44 1865 272861 (self)
> > 1 South Parks Road, +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK Fax: +44 1865 272595
> >
>