Re: [R] extracting data into different subsets

From: jim holtman <jholtman_at_gmail.com>
Date: Sun 14 Jan 2007 - 21:32:07 GMT

try this:

> input <- " y Slide Block ID
+  441068 -0.020464103 1 15 AAAAGFASKTPANQA
+  448844 0.061400545 1 41 AAAAGFASKTPANQA
+  456620 -0.031026896 10 15 AAAAGFASKTPANQA
+  464396 -0.033166864 10 41 AAAAGFASKTPANQA
+  472172 -0.108148804 11 15 AAAAGFASKTPANQA
+  479948 -0.397759508 11 41 AAAAGFASKTPANQA
+  4167 -0.67283526 1 13 AAAAALPAFSPPAQA
+  11943 -0.23982701 1 37 AAAAALPAFSPPAQA
+  19719 -0.10169540 10 13 AAAAALPAFSPPAQA
+  27495 0.70043972 10 37 AAAAALPAFSPPAQA
+  35271 -0.18807235 11 13 AAAAALPAFSPPAQA
+  43047 -0.17982104 11 37 AAAAALPAFSPPAQA
+  5264 -0.011681805 1 17 AAAAATQAAGAGAVA
+  13040 -0.073063462 1 41 AAAAATQAAGAGAVA
+  20816 -0.017996429 10 17 AAAAATQAAGAGAVA
+  28592 0.010159866 10 41 AAAAATQAAGAGAVA
+  36368 -0.056034035 11 17 AAAAATQAAGAGAVA
+  44144 -0.346175641 11 41 AAAAATQAAGAGAVA
+  5612 -0.7121977 1 18 AAAAGFASKTPANQA
+  13388 -0.4076580 1 42 AAAAGFASKTPANQA
+  21164 -0.1864131 10 18 AAAAGFASKTPANQA
+  28940 -0.1140163 10 42 AAAAGFASKTPANQA
+  36716 -0.3246222 11 18 AAAAGFASKTPANQA
+  44492 -0.4355016 11 42 AAAAGFASKTPANQA
+ "

> x <- read.table(textConnection(input), header=TRUE)
>
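> # find the breaks in the ID: TRUE marks the first row of each run of identical IDs
> # (factor() keeps this working when read.table returns ID as character, as in R >= 4.0)
> ID.breaks <- c(TRUE, diff(as.integer(factor(x$ID))) != 0)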
> group.1 <- x[ID.breaks,]
> group.2 <- x[!ID.breaks,]
>
> group.1
                 y Slide Block              ID
441068 -0.02046410     1    15 AAAAGFASKTPANQA
4167   -0.67283526     1    13 AAAAALPAFSPPAQA
5264   -0.01168180     1    17 AAAAATQAAGAGAVA
5612   -0.71219770     1    18 AAAAGFASKTPANQA

> group.2
                 y Slide Block              ID
448844  0.06140055     1    41 AAAAGFASKTPANQA
456620 -0.03102690    10    15 AAAAGFASKTPANQA
464396 -0.03316686    10    41 AAAAGFASKTPANQA
472172 -0.10814880    11    15 AAAAGFASKTPANQA
479948 -0.39775951    11    41 AAAAGFASKTPANQA
11943  -0.23982701     1    37 AAAAALPAFSPPAQA
19719  -0.10169540    10    13 AAAAALPAFSPPAQA
27495   0.70043972    10    37 AAAAALPAFSPPAQA
35271  -0.18807235    11    13 AAAAALPAFSPPAQA
43047  -0.17982104    11    37 AAAAALPAFSPPAQA
13040  -0.07306346     1    41 AAAAATQAAGAGAVA
20816  -0.01799643    10    17 AAAAATQAAGAGAVA
28592   0.01015987    10    41 AAAAATQAAGAGAVA
36368  -0.05603404    11    17 AAAAATQAAGAGAVA
44144  -0.34617564    11    41 AAAAATQAAGAGAVA
13388  -0.40765800     1    42 AAAAGFASKTPANQA
21164  -0.18641310    10    18 AAAAGFASKTPANQA
28940  -0.11401630    10    42 AAAAGFASKTPANQA
36716  -0.32462220    11    18 AAAAGFASKTPANQA
44492  -0.43550160    11    42 AAAAGFASKTPANQA
>
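
Note that the transcript above puts only the first row of each run of identical IDs into group.1 (4 rows). If the question is read as asking for the first block of every ID within each slide (blocks 15, 13, 17, 18 for all three slides, 12 rows), a minimal sketch along the following lines might be closer. It assumes the rows of one ID are contiguous and that, within a slide, the first block comes first, as in the posted data. The names `run` and `first.in.slide` are introduced here for illustration only.

```r
input <- "y Slide Block ID
441068 -0.020464103 1 15 AAAAGFASKTPANQA
448844 0.061400545 1 41 AAAAGFASKTPANQA
456620 -0.031026896 10 15 AAAAGFASKTPANQA
464396 -0.033166864 10 41 AAAAGFASKTPANQA
472172 -0.108148804 11 15 AAAAGFASKTPANQA
479948 -0.397759508 11 41 AAAAGFASKTPANQA
4167 -0.67283526 1 13 AAAAALPAFSPPAQA
11943 -0.23982701 1 37 AAAAALPAFSPPAQA
19719 -0.10169540 10 13 AAAAALPAFSPPAQA
27495 0.70043972 10 37 AAAAALPAFSPPAQA
35271 -0.18807235 11 13 AAAAALPAFSPPAQA
43047 -0.17982104 11 37 AAAAALPAFSPPAQA
5264 -0.011681805 1 17 AAAAATQAAGAGAVA
13040 -0.073063462 1 41 AAAAATQAAGAGAVA
20816 -0.017996429 10 17 AAAAATQAAGAGAVA
28592 0.010159866 10 41 AAAAATQAAGAGAVA
36368 -0.056034035 11 17 AAAAATQAAGAGAVA
44144 -0.346175641 11 41 AAAAATQAAGAGAVA
5612 -0.7121977 1 18 AAAAGFASKTPANQA
13388 -0.4076580 1 42 AAAAGFASKTPANQA
21164 -0.1864131 10 18 AAAAGFASKTPANQA
28940 -0.1140163 10 42 AAAAGFASKTPANQA
36716 -0.3246222 11 18 AAAAGFASKTPANQA
44492 -0.4355016 11 42 AAAAGFASKTPANQA
"
# The header has one field fewer than the data rows, so read.table
# uses the first column as row names.
x <- read.table(text = input, header = TRUE, stringsAsFactors = FALSE)

# Number each run of consecutive identical IDs. The ID alone is not
# enough, because AAAAGFASKTPANQA occurs in two separate runs.
id.rle <- rle(as.character(x$ID))
run <- rep(seq_along(id.rle$lengths), id.rle$lengths)

# Within each (run, Slide) pair, the first row is taken as the first block.
first.in.slide <- !duplicated(data.frame(run = run, Slide = x$Slide))

group.1 <- x[first.in.slide, ]
group.2 <- x[!first.in.slide, ]
```

If the rows were not already ordered so that the first block precedes the second within each slide, the data would need to be sorted first; that case is not handled here.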

On 1/14/07, Jenny persson <jenny197806@yahoo.se> wrote:
>
> y Slide Block ID
> 441068 -0.020464103 1 15 AAAAGFASKTPANQA
> 448844 0.061400545 1 41 AAAAGFASKTPANQA
> 456620 -0.031026896 10 15 AAAAGFASKTPANQA
> 464396 -0.033166864 10 41 AAAAGFASKTPANQA
> 472172 -0.108148804 11 15 AAAAGFASKTPANQA
> 479948 -0.397759508 11 41 AAAAGFASKTPANQA
> 4167 -0.67283526 1 13 AAAAALPAFSPPAQA
> 11943 -0.23982701 1 37 AAAAALPAFSPPAQA
> 19719 -0.10169540 10 13 AAAAALPAFSPPAQA
> 27495 0.70043972 10 37 AAAAALPAFSPPAQA
> 35271 -0.18807235 11 13 AAAAALPAFSPPAQA
> 43047 -0.17982104 11 37 AAAAALPAFSPPAQA
> 5264 -0.011681805 1 17 AAAAATQAAGAGAVA
> 13040 -0.073063462 1 41 AAAAATQAAGAGAVA
> 20816 -0.017996429 10 17 AAAAATQAAGAGAVA
> 28592 0.010159866 10 41 AAAAATQAAGAGAVA
> 36368 -0.056034035 11 17 AAAAATQAAGAGAVA
> 44144 -0.346175641 11 41 AAAAATQAAGAGAVA
> 5612 -0.7121977 1 18 AAAAGFASKTPANQA
> 13388 -0.4076580 1 42 AAAAGFASKTPANQA
> 21164 -0.1864131 10 18 AAAAGFASKTPANQA
> 28940 -0.1140163 10 42 AAAAGFASKTPANQA
> 36716 -0.3246222 11 18 AAAAGFASKTPANQA
> 44492 -0.4355016 11 42 AAAAGFASKTPANQA
>
> where there are 4 different IDs and each ID appears twice, in two blocks,
> for each of 3 slides. I want to extract the data
> so that the first occurrence of every ID goes to
> group 1 and the second occurrence to group 2.
> For the data above, this means that the rows with response y that are in
> blocks 15, 13, 17, 18 for each slide will be in group 1 and the rest in
> group 2. How can I do this in R?
> Thanks for your help,
> Jenny
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Mon Jan 15 08:38:14 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sun 14 Jan 2007 - 22:30:28 GMT.
