Re: [R] parts of data frames: subset vs. [-c()]

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sat 27 Aug 2005 - 03:40:26 EST

Are there NAs in the variable?

SYNTAX=="Ditrans" and SYNTAX!="Ditrans" are mutually exclusive but not exhaustive: where SYNTAX is NA, both comparisons give NA rather than TRUE or FALSE.
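
A minimal sketch of the effect, using a small made-up factor rather than the real ReVerb data (the names d, by_index and by_subset are illustrative only). which() skips NA, so negative indexing leaves the NA rows in place, whereas subset() treats an NA condition as FALSE and drops those rows:

```r
# Toy data: 'd' is hypothetical and merely stands in for ReVerb.
d <- data.frame(
  SYNTAX = factor(c("Ditrans", "Acmp", NA, "Ditrans", "Amats", NA)),
  id     = 1:6
)

# Both comparisons are NA wherever SYNTAX is NA.
d$SYNTAX == "Ditrans"
d$SYNTAX != "Ditrans"

# which() ignores NA, so the NA rows are not in the index and survive removal.
by_index <- d[-which(d$SYNTAX == "Ditrans"), ]

# subset() keeps only rows where the condition is TRUE, so NA rows are dropped.
by_subset <- subset(d, SYNTAX != "Ditrans")

nrow(by_index)        # 4: two non-Ditrans rows plus two NA rows
nrow(by_subset)       # 2: the NA rows are gone
sum(is.na(d$SYNTAX))  # 2: the NAs account for the difference
```

In the quoted output, the table() counts add up to 92709 while str() reports 92713 rows, which would be consistent with four NAs in SYNTAX and with the gap 91532 - 91528 = 4. Checking sum(is.na(ReVerb$SYNTAX)) would confirm it.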

On Fri, 26 Aug 2005, Stefan Th. Gries wrote:

> Dear all
>
> I have a problem with splitting up a data frame called ReVerb:
>
> str(ReVerb)
> `data.frame': 92713 obs. of 16 variables:
> $ CHILD : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ AGE : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 99 99 99 ...
> $ AGE_Q : num 2.0 2.0 2.0 2.4 2.4 ...
> $ INTERVALS: num 2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
> $ RND : int 34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 ...
> $ SYNTAX : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 ...
> $ LEXICAL : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 803 1562 299 679 1562 ...
> $ MORPH : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 231 57 67 231 39 ...
> $ COMPLEM : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 1101 368 1834 1667 368 1834 ...
> $ MATRIX : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 5 5 856 308 ...
> $ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I said]",..: 2 2 2 2 2 2 2 2 2 2 ...
> $ V_ANN : int 1 1 1 4 4 4 4 3 3 3 ...
> $ QUEST : int 0 0 0 0 0 0 0 0 0 0 ...
> $ EXCL : int 0 0 0 1 1 1 1 0 0 0 ...
> $ U_LEN : int 3 4 5 13 13 13 13 8 8 8 ...
> $ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 39091 52180 2262 2262 2262 2262 3593 3593 3593 ...
>
> The level causing the problem is SYNTAX:
>
> as.data.frame(sort(table(SYNTAX)))
> sort(table(SYNTAX))
> Particles 100
> PR=N1 144
> Amats 271
> Trans_PR=A2 787
> Ditrans 1181
> Intrans_PR=A1 1399
> Acmp 2402
> Trans_PR=V2 2433
> CPcmps 2769
> Vpreps 4896
> Intrans_V0 5182
> Trans_PR=L2 7653
> Trans_V02 8117
> Intrans_PR=L1 8457
> Intrans_V1 9643
> Intrans_PR=V1 14987
> Trans_V12 22288
>
>
> I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store that in a file, and then generate ReVerb again without these cases and factor levels. My problem is probably obvious from the following lines of code:
>
> ditrans<-which(SYNTAX=="Ditrans")
> ReVerb1<-ReVerb[-c(ditrans),]; dim(ReVerb1)
>
> [1] 91532 16
>
> # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
>
> ReVerb1<-subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
>
> [1] 91528 16
>
> # ... so why don't I get 91532 again as the number of rows?
>
> Any ideas??
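
One possible way to do the split so that the two pieces account for every row, sketched on toy data. The names d, is_ditrans, ditrans, rest and the temporary output file are assumptions for illustration, and treating NA as "not Ditrans" is a choice you may not want. droplevels() exists in current versions of R; re-applying factor() to the column has the same effect.

```r
# Toy data: 'd' is hypothetical and merely stands in for ReVerb.
d <- data.frame(
  SYNTAX = factor(c("Ditrans", "Acmp", NA, "Ditrans", "Amats")),
  U_LEN  = c(3, 4, 5, 13, 8)
)

# Logical index that is TRUE only for definite matches;
# rows with NA in SYNTAX are counted as "not Ditrans" here.
is_ditrans <- !is.na(d$SYNTAX) & d$SYNTAX == "Ditrans"

ditrans <- d[is_ditrans, ]
rest    <- d[!is_ditrans, ]

# Store the extracted cases in a tab-separated file (temporary file for the sketch).
out_file <- tempfile(fileext = ".txt")
write.table(ditrans, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Remove the now-unused "Ditrans" level from the remaining data.
rest <- droplevels(rest)
levels(rest$SYNTAX)

# The two parts together cover every row of the original.
nrow(ditrans) + nrow(rest) == nrow(d)
```

Building both parts from one logical vector and its negation avoids the mismatch, because !is_ditrans is never NA.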

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Received on Sat Aug 27 03:46:06 2005
