Re: [R] parts of data frames: subset vs. [-c()]

From: Stefan Th. Gries <stgries_lists_at_arcor.de>
Date: Sat 27 Aug 2005 - 03:42:13 EST

>> "Stefan Th. Gries" <stgries_lists@arcor.de> writes:
>> I have a problem with splitting up a data frame called ReVerb: I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store them in a file, and then generate ReVerb again without these cases and factor levels. My problem is probably obvious from the following lines of code:

> ditrans<-which(SYNTAX=="Ditrans")
> ReVerb1<-ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532 16
# ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
> ReVerb1<-subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528 16
# ... so why don't I get 91532 again as the number of rows? # Any ideas??

> From: Peter Dalgaard <p.dalgaard@biostat.ku.dk>
> The SYNTAX variable is not necessarily the same. Could you retry the first case with
> ditrans <- which(ReVerb$SYNTAX=="Ditrans")
> ?

The results were the same as with 'ditrans<-which(SYNTAX=="Ditrans")'.

> Otherwise, try doing a setdiff() on the rownames of the two discrepant results and see which are the four cases that differ.

This solved the issue: using setdiff(), I found that the four cases the second approach with subset() fails to include are the ones where SYNTAX is NA. I was not aware of how subset() treats NA, sorry.
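
For the archive, here is a minimal sketch of the discrepancy on a toy data frame. The data below are made up and only stand in for ReVerb; the object names by_index, by_subset, missing_rows and by_subset_na are my own. As far as I understand it, which() leaves out positions where the comparison is NA, so negative indexing keeps those rows, whereas subset() takes an NA condition as FALSE and drops them.

```r
# Toy stand-in for ReVerb (hypothetical data, not the original corpus).
ReVerb <- data.frame(
  SYNTAX = c("Ditrans", "Intrans", NA, "Trans", "Ditrans", NA),
  stringsAsFactors = TRUE
)

# which() omits NA comparisons, so only true "Ditrans" rows are removed.
ditrans <- which(ReVerb$SYNTAX == "Ditrans")
by_index <- ReVerb[-ditrans, , drop = FALSE]

# subset() treats an NA condition as FALSE, so the NA rows are dropped too.
by_subset <- subset(ReVerb, SYNTAX != "Ditrans")

# Row names present in the first result but not in the second: the NA cases.
missing_rows <- setdiff(rownames(by_index), rownames(by_subset))
ReVerb[missing_rows, , drop = FALSE]

# To keep the NA rows with subset(), test for NA explicitly.
by_subset_na <- subset(ReVerb, is.na(SYNTAX) | SYNTAX != "Ditrans")
```

One caveat with the indexing variant: if which() finds no match, ReVerb[-ditrans, ] returns zero rows rather than the full data frame, so the subset() form with an explicit is.na() test is probably the safer of the two.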

Thanks a lot,
STG

--
Stefan Th. Gries
----------------------------------------
Max Planck Inst. for Evol. Anthropology
http://people.freenet.de/Stefan_Th_Gries
----------------------------------------

Received on Sat Aug 27 19:14:04 2005