Re: [R] Parsing question, partly comma separated partly underscore separated string

From: Don McKenzie <dmck_at_u.washington.edu>
Date: Sun, 06 Mar 2011 20:39:01 -0800

On 6-Mar-11, at 7:13 PM, Eric Fail wrote:

> Dear R-list,
>
> I have a string that is partly comma-separated and partly
> underscore-separated, and I am trying to parse it into R.
>
> Furthermore, I have a bunch of them, and they are quite long. I have
> now spent most of my Sunday trying to figure this out and thought I
> would try the list to see if someone here could get me
> started.
>
> My data structure looks like this:
>
> (in an example.txt file)
> Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by
> 960 pixels, On Device M, M, 3.2.4,ZZ_373_462_488_TRT_at_9z.svg,
> 592,820,3.35,ZZ_032_288_436_CON_at_9z.svg,
> 332,878,3.66,ZZ_384_204_433_TRT_at_9z.svg,
> 334,824,3.28,ZZ_365_575_683_TRT_at_9z.svg,
> 598,878,3.50,ZZ_005_480_239_CON_at_9z.svg,
> 630,856,8.03,ZZ_030_423_394_CON_at_9z.svg,
> 98,846,4.09,ZZ_033_596_398_CON_at_9z.svg,
> 636,902,3.28,ZZ_263_064_320_TRT_at_9z.svg,570,894,1.26,BLOCK_at_9z.svg,
> 322,842,32.96,ZZ_004_088_403_CON_at_9z.svg,
> 606,908,3.32,ZZ_703_546_434_CON_at_9z.svg,
> 624,934,2.58,ZZ_712_348_543_CON_at_9z.svg,
> 20,828,5.36,ZZ_005_48_239_CON_at_9z.svg,
> 580,830,4.36,ZZ_310_444_623_TRT_at_9z.svg,
> 586,806,0.08,ZZ_030_423_394_CON_at_9z.svg,
> 350,854,3.84,ZZ_340_382_539_TRT_at_9z.svg,570,894,1.26,BLOCK_at_9z.svg,
> 542,840,4.44,ZZ_345_230_662_TRT_at_9z.svg,
> 632,844,2.47,ZZ_006_335_309_CON_at_9z.svg,
> 96,930,3.63,ZZ_782_346_746_TRT_at_9z.svg,
> 306,850,2.58,ZZ_334_200_333_TRT_at_9z.svg,
> 304,842,3.34,ZZ_383_506_726_TRT_at_9z.svg,
> 622,884,3.84,ZZ_294_360_448_TRT_at_9z.svg,
> 90,858,3.56,ZZ_334_335_473_TRT_at_9z.svg,570,894,1.26,BLOCK_at_9z.svg,
> 320,852,4.04,
> (end of example.txt file)
>
> The above is approximately 5% of the length of a full file, and I
> have about 100 of them. Please note that the strings end with a
> comma.
>
> I am trying to parse it into something like this:
>
> ID          ImgNam  BLOCK  RUN   Tx   Ty  Treatment    x    y      Y
> Subject ID     373      1    1  462  488  TRT        592  820   3.35
> Subject ID      32      1    2  288  436  CON        332  878   3.66
> Subject ID     384      1    3  204  433  TRT        334  824   3.28
> Subject ID     365      1    4  575  683  TRT        598  878    3.5
> Subject ID       5      1    5  480  239  CON        630  856   8.03
> Subject ID      30      1    6  423  394  CON         98  846   4.09
> Subject ID      33      1    7  596  398  CON        636  902   3.28
> Subject ID     263      1    8   64  320  TRT        570  894   1.26
> Subject ID       4      2    1   88  403  CON        606  908   3.32
> Subject ID     703      2    2  546  434  CON        624  934   2.58
> Subject ID     712      2    3  348  543  CON         20  828   5.36
> Subject ID       5      2    4   48  239  CON        580  830   4.36
> Subject ID     310      2    5  444  623  TRT        586  806   0.08
> Subject ID      30      2    6  423  394  CON        350  854   3.84
> Subject ID     340      2    7  382  539  TRT        570  894   1.26
> Subject ID     345      3    1  230  662  TRT        632  844   2.47
> Subject ID       6      3    2  335  309  CON         96  930   3.63
> Subject ID     782      3    3  346  746  TRT        306  850   2.58
> Subject ID     334      3    4  200  333  TRT        304  842   3.34
> Subject ID     383      3    5  506  726  TRT        622  884   3.84
> Subject ID     294      3    6  360  448  TRT         90  858   3.56
> Subject ID     334      3    7  335  473  TRT        570  894   1.26
>
> I could do it in Excel, but it would take me a week, and it would be
> stupid. If someone could please help me get started, I would very
> much appreciate it. It would not only benefit me; my colleagues
> would also see the benefit of R, and of the R-list in particular.
>
> Thanks in advance!
>
> Eric
>
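A minimal sketch of one way to read this layout, written in Python rather than R (the same steps map onto R's readLines(), strsplit() and regmatches()). It assumes, inferring from the example rather than from any stated spec, that each file is a header running up to the first image name, followed by repeating groups of four tokens: a name, then x, y and Y. Image names such as ZZ_373_462_488_TRT_at_9z.svg are split into ImgNam, Tx, Ty and Treatment. A BLOCK_at_9z.svg group starts a new block, and its own three values are dropped, which matches how the desired table treats them. The function name parse_session is hypothetical.

```python
import re

# Assumption (inferred from the example): image names start with
# ZZ_<ImgNam>_<Tx>_<Ty>_<TRT|CON>; whatever follows (e.g. "_at_9z.svg") is ignored.
IMAGE_RE = re.compile(r"^ZZ_(\d+)_(\d+)_(\d+)_(TRT|CON)")


def parse_session(text):
    """Parse one file's text into a list of row dicts, one per image."""
    # Lines wrap arbitrarily and end with a comma, so join them with a space,
    # split on commas, strip whitespace and drop empty tokens (such as the one
    # left by the trailing comma).
    tokens = [t.strip() for t in " ".join(text.splitlines()).split(",")]
    tokens = [t for t in tokens if t]

    # Assumption: everything before the first image/BLOCK name is header,
    # and its first field is the subject ID.
    start = next(i for i, t in enumerate(tokens)
                 if IMAGE_RE.match(t) or t.startswith("BLOCK"))
    subject_id = tokens[0]

    rows = []
    block, run = 1, 0
    # The rest comes in groups of four: name, x, y, Y.
    for i in range(start, len(tokens) - 3, 4):
        name, x, y, big_y = tokens[i:i + 4]
        if name.startswith("BLOCK"):
            # A BLOCK marker starts a new block; its x, y, Y are discarded.
            block += 1
            run = 0
            continue
        m = IMAGE_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected token {name!r}")
        run += 1
        rows.append({
            "ID": subject_id,
            "ImgNam": int(m.group(1)),  # int() drops leading zeros: "032" -> 32
            "BLOCK": block,
            "RUN": run,
            "Tx": int(m.group(2)),
            "Ty": int(m.group(3)),
            "Treatment": m.group(4),
            "x": int(x),
            "y": int(y),
            "Y": float(big_y),
        })
    return rows
```

Run on the excerpt quoted above, this should give the 22 rows of the desired table: 8 in block 1, 7 in block 2 and 7 in block 3. If the tokens left over at the end do not fill a complete group of four, they are skipped without any warning.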

In a good text editor it would be one command per file. So if you are on UNIX or Mac OS X, you could loop through the files with (probably) an awk
command. I don't remember the syntax (it's been too long), but it should be just a few lines of shell script. On Windows I'm not sure, but there should
be something similar.
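The looping step described above can also be sketched in Python, which runs on UNIX, Mac OS X and Windows alike. This swaps in a different technique for the awk approach; it is not that approach. convert_all is a hypothetical helper. It passes each file's text to whatever per-file parser you supply (any function returning a list of dicts that share the same keys), tags each row with its source file name, and writes everything to a single CSV that R can then load with read.csv().

```python
import csv
import glob
import os


def convert_all(pattern, parse, out_path):
    """Parse every file matching `pattern` and write all rows to one CSV.

    `parse` is any function that takes a file's text and returns a list of
    dicts with identical keys (one dict per output row). Returns the number
    of rows written; if no rows are produced, the output file stays empty.
    """
    writer = None
    count = 0
    with open(out_path, "w", newline="") as out:
        # sorted() gives a stable, alphabetical file order.
        for path in sorted(glob.glob(pattern)):
            with open(path) as f:
                rows = parse(f.read())
            for row in rows:
                # Record which file each row came from.
                row = {"file": os.path.basename(path), **row}
                if writer is None:
                    # Take the column names from the first row seen.
                    writer = csv.DictWriter(out, fieldnames=list(row))
                    writer.writeheader()
                writer.writerow(row)
                count += 1
    return count
```

A call might look like convert_all("data/*.txt", my_parser, "all_subjects.csv"), where my_parser is a hypothetical per-file parsing function. Keep the output file outside the glob pattern so that it is not read back in as input.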

Maybe that "gets you started". Probably one of the list jocks will have it nailed if you wait.
> --
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Why does the universe go to all the bother of existing? -- Stephen Hawking

#define QUESTION ((bb) || !(bb))
-- William Shakespeare

Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
dmck_at_uw.edu
donaldmckenzie_at_fs.fed.us



Received on Mon 07 Mar 2011 - 04:43:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 07 Mar 2011 - 05:20:19 GMT.

