Re: [R] subsetting data-frame by vector of characters

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Fri, 13 Jun 2008 16:51:21 +0200

james perkins wrote:
> Thanks a lot for that. It's the %in% I needed to work out mainly
>
> large didn't mean anything in particular, just that it gets quite long
> with the real data.
> I did mean: names = c("John", "Phil", "Robert")
>
> The only problem with the method you suggest is that I lose
> the indexing, i.e. in the example, instead of:
>
> (index) Name Fave.Number
> 1 John 7
> 2 Phil 14
> 3 Robert 23
>
>
> I end up with
>
>
> (index) Name Fave.Number
> 1 John 7
> 3 Phil 14
> 5 Robert 23
>
> This isn't a problem at the moment but I guess it could be if I used
> the table later in loops. Is there an easy way to re-index the table?
>
Notice that these are names, not numbers: result[2,1] is "Phil" in both cases. If it bothers you, just set rownames(result) <- NULL
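
To make the point about row names concrete, here is a small sketch. The data frame is rebuilt by hand from the example in this thread, so its construction (and the name `result`) is an assumption rather than the poster's actual code:

```r
# Example data rebuilt from the thread (construction is assumed)
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("John", "Phil", "Robert")

result <- names.and.numbers[names.and.numbers$Name %in% names, ]
rownames(result)           # "1" "3" "5": labels carried over from the original rows
result[2, 1]               # "Phil": positional indexing ignores the labels

rownames(result) <- NULL   # reset the labels to the default 1, 2, 3
rownames(result)           # "1" "2" "3"
```

So a loop over seq_len(nrow(result)) works the same before and after the reset; only the printed labels change.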

(BTW, are your names unique? In that case you could set them as rownames and use them for indexing, as in the following two lines.)

rownames(names.and.numbers) <- names.and.numbers$Name
names.and.numbers[names, ]
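
A short sketch of how row-name indexing compares with %in%, again on a hand-built copy of the example data. The lookup vector `wanted` and the result names `by.name` and `by.in` are made up for illustration:

```r
# Example data rebuilt from the thread (construction is assumed)
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
wanted <- c("Robert", "John")   # hypothetical lookup vector, deliberately out of order

# Assigning row names fails with an error if the names are not unique
rownames(names.and.numbers) <- names.and.numbers$Name

by.name <- names.and.numbers[wanted, ]
by.name$Fave.Number   # 23 7: rows come back in the order of `wanted`

by.in <- names.and.numbers[names.and.numbers$Name %in% wanted, ]
by.in$Fave.Number     # 7 23: rows keep the data frame's original order
```

The practical difference is ordering: indexing by row name follows the lookup vector, while the %in% filter follows the data frame.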

> Kind regards
>
> Jim
>
> Wacek Kusnierczyk wrote:
>> james perkins wrote:
>>
>>> Hi,
>>>
>>> I have a very simple problem but I can't think how to solve it without
>>> using a for loop and creating a large logical vector. However given
>>> the nature of the problem I am sure there is a "1-liner" that could do
>>> the same thing much more efficiently.
>>>
>>> Basically I have a dataframe with characters in, e.g.
>>>
>>>
>>>> names.and.numbers
>>>>
>>> (index) Name Fave.Number
>>> 1 John 7
>>> 2 Tony 12
>>> 3 Phil 14
>>> 4 Adam 22
>>> 5 Robert 23
>>>
>>>
>>> Now, imagine I have a vector of names, ie:
>>>
>>>
>>>> names = c("John,Phil,Robert")
>>>>
>>
>> this is a one-element vector: a single string containing the
>> comma-separated names.
>> or do you mean: names = c("John", "Phil", "Robert")
>>
>>
>>
>>> All I want to do is get the subset of the dataframe which corresponds
>>> to the names in the vector "names", i.e.
>>>
>>> (index) Name Fave.Number
>>> 1 John 7
>>> 2 Phil 14
>>> 3 Robert 23
>>>
>>
>> this should do:
>> names.and.numbers[names.and.numbers$Name %in% names,]
>>
>> if names is as you say above, do
>> names.and.numbers[names.and.numbers$Name %in% strsplit(names,",")[[1]], ]
>>
>> you do create a logical vector here (what does 'large' mean?), but no
>> loop is involved at the surface.
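
For the single-string case, a minimal sketch of the steps involved, on a hand-built copy of the example data. The intermediate names `one.string`, `parts`, `wanted` and `keep` are invented for illustration. Note that strsplit() returns a list, so the character vector has to be pulled out with [[1]] (or unlist()) before it goes to %in%:

```r
# Example data rebuilt from the thread (construction is assumed)
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)

one.string <- c("John,Phil,Robert")   # one string, as in the first post
length(one.string)                    # 1

parts <- strsplit(one.string, ",")    # a list holding one character vector
wanted <- parts[[1]]                  # "John" "Phil" "Robert"

# %in% builds the logical vector in one vectorised call, no explicit loop
keep <- names.and.numbers$Name %in% wanted   # TRUE FALSE TRUE FALSE TRUE
names.and.numbers[keep, ]
```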
>>
>> vQ
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

Received on Fri 13 Jun 2008 - 17:59:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 13 Jun 2008 - 18:30:45 GMT.
