Re: [R] Help with parsing a data file

From: Henrique Dallazuanna <wwwhsd_at_gmail.com>
Date: Thu, 06 Mar 2008 16:49:24 -0300

Try this:

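lines <- readLines('yourfile')
# drop the 3 header lines and the first 13-line block
newLines <- lines[-(1:(13+3))]
# the column names are on the third line
coln <- scan(textConnection(lines[3]), what="")
# each 4-character line is a year; read the 13 lines that follow it
lapply(which(nchar(newLines) == 4), function(x)
  read.table(textConnection(newLines[seq(x + 1, x + 13)]), col.names=coln))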
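
If you also want the blocks indexed by year, something along the following lines might work. It is only a sketch: parse_year_blocks, block_len and header_len are names made up for illustration, and it assumes that each record sits on a single physical line in the actual file (the listing quoted below has been wrapped by the mailer). It also assumes a reasonably recent R, since it relies on the text= argument of scan() and read.table() and on trimws(). The toy input reuses the first two rows and first three columns of the sample, so the block length is 2 rather than 13.

```r
# Hypothetical helper (not part of the original reply): split the file into
# one data frame per year and name each list element by its year.
parse_year_blocks <- function(lines, block_len = 13, header_len = 3) {
  # The column names sit on the last header line (line 3 in the real file).
  coln <- scan(text = lines[header_len], what = "", quiet = TRUE)
  # Drop the header lines and the leading block that carries no year.
  body <- lines[-seq_len(header_len + block_len)]
  # A year line is exactly four digits once surrounding whitespace is removed.
  year_idx <- grep("^[0-9]{4}$", trimws(body))
  blocks <- lapply(year_idx, function(i) {
    read.table(text = body[seq(i + 1, i + block_len)],
               col.names = coln, stringsAsFactors = FALSE)
  })
  names(blocks) <- trimws(body[year_idx])
  blocks
}

# Toy input: truncated to three columns and two rows per block.
toy <- c(
  "725280 BUFFALO NIAGARA INTL A NY",
  "1991-2005",
  "MO AVGLO FL",
  "1 1336 K5",
  "2 2261 K5",
  "1991",
  "1 1313 I5",
  "2 1875 I5",
  "1992",
  "1 1149 I5",
  "2 1580 I5"
)

by_year <- parse_year_blocks(toy, block_len = 2)
by_year[["1991"]]
```

With the real file you would call parse_year_blocks(readLines('yourfile')) and leave block_len at 13. Note that the header repeats the name FL three times, so read.table() will rename the repeats (FL, FL.1, FL.2) unless check.names = FALSE is passed.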

On 06/03/2008, sean <smachin1000_at_gmail.com> wrote:
> Hi All,
>
> I need to parse data from a file; an example is shown below. The first two
> lines can be skipped, and the third line contains the column names. The next
> 13 lines can be skipped. The line after that, "1991", is a year value, and
> the following 13 lines hold the data for that year. The file then repeats
> this format (a year line, then 13 lines of data for that year). Ideally I
> would like to end up with an array/list/vector of the 13-line blocks,
> indexed by year, with each block using the column names given on the third
> line.
>
> If anyone has any good ideas on how to do this in R, please let me know.
>
> Thanks,
> Sean
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 725280 BUFFALO NIAGARA INTL A NY -5 N42 56 W078 44 215 988
> 1991-2005
> MO AVGLO FL SDGLO AVDIR FL SDDIR AVDIF FL SDDIF AVETR AETRN TOT OPQ
> H2O TAU MAX_T MIN_T AVG_T AVGDT RH HTDD CLDD AVWS
> 1 1336 K5 222 1534 K7 676 837 K5 72 3806 13256 8.4 8.1 0.83
> 0.09 -0.52 -7.40 -3.86 -3.36 75 691 0 5.5
> 2 2261 K5 400 2691 K7 1026 1129 K5 74 5330 14714 7.6 7.1 0.79
> 0.10 0.97 -6.67 -2.76 -1.85 73 599 0 5.1
> 3 3249 K5 413 3207 K7 852 1578 K5 118 7428 16443 7.2 6.7 0.98
> 0.12 4.96 -3.21 0.93 2.04 71 541 0 4.9
> 4 4460 K5 570 4051 K6 1045 1951 K5 130 9509 18140 6.6 6.1 1.33
> 0.13 12.18 2.68 7.39 8.54 67 328 1 4.7
> 5 5484 K5 518 4529 K6 801 2408 K5 142 10999 19523 6.1 5.4 1.87
> 0.15 18.77 8.68 13.83 15.07 69 154 12 4.5
> 6 6046 K5 383 5011 K6 671 2567 K5 166 11616 20177 5.7 4.9 2.64
> 0.16 24.05 14.52 19.47 20.66 71 34 63 4.1
> 7 5793 K5 529 4734 K6 884 2537 K5 127 11250 19734 5.6 4.9 2.97
> 0.16 26.10 16.92 21.70 22.90 71 5 104 4.1
> 8 5057 K5 417 4390 K6 693 2245 K5 94 9974 18430 5.6 4.9 2.92
> 0.15 25.61 16.37 21.10 22.56 73 9 91 3.6
> 9 4001 K5 458 3864 K6 826 1797 K5 105 8078 16803 5.6 5.0 2.36
> 0.13 21.73 12.20 17.06 18.68 73 71 30 3.9
> 10 2502 K5 254 2564 K7 584 1306 K5 88 5948 15098 6.3 5.8 1.67
> 0.11 14.89 6.40 10.71 12.16 72 241 3 4.3
> 11 1395 K5 198 1394 K7 492 887 K5 47 4170 13545 7.9 7.5 1.30
> 0.10 8.37 1.42 4.91 5.82 73 403 0 5.0
> 12 1120 K5 173 1391 K7 475 701 K5 52 3351 12733 7.9 7.7 0.94
> 0.09 2.41 -3.92 -0.70 -0.03 75 592 0 5.0
> 13 3559 K5 201 3280 K7 383 1662 K5 50 7622 16550 6.7 6.2 1.72
> 0.12 13.29 4.83 9.15 10.27 72 3668 304 4.5
> 1991
> 1 1313 I5 637 1374 I6 1636 832 I5 169 3800 13249 8.2 7.8 0.75
> 0.07 -0.09 -6.67 -3.46 -2.94 73 673 0 5.9
> 2 1875 I5 887 1767 I6 2080 1137 I5 263 5310 14694 8.3 7.6 0.85
> 0.08 2.44 -3.84 -0.61 0.15 73 533 0 5.9
> 3 3205 I5 1520 3371 I6 3133 1458 I5 392 7395 16417 6.7 6.1 1.12
> 0.10 7.23 -1.17 2.75 3.75 70 474 0 5.3
> 4 3999 I5 1911 3451 I6 3501 1918 I5 521 9482 18116 6.9 5.9 1.60
> 0.12 14.46 5.65 9.91 11.04 68 250 2 5.4
> 5 5968 I5 1854 5369 I6 2936 2296 I5 437 10983 19506 6.1 4.5 2.46
> 0.14 23.15 12.56 17.85 19.12 68 81 66 4.8
> 6 6988 I5 1577 6761 I6 2983 2288 I5 604 11614 20176 4.8 3.0 2.42
> 0.15 26.09 14.95 20.80 22.28 64 14 80 4.3
> 7 6364 I5 1538 5779 I6 2799 2404 I5 568 11262 19749 5.0 3.7 2.89
> 0.16 27.17 16.96 22.43 23.77 66 1 116 4.4
> 8 5407 I5 1478 5114 I6 2693 2106 I5 527 9999 18451 4.8 4.0 2.91
> 0.18 26.64 16.49 21.44 23.08 73 2 102 4.2
> 9 4482 I5 1010 4126 I6 1953 2033 I5 415 8109 16830 5.8 4.6 2.24
> 0.19 22.05 10.98 16.67 18.53 66 97 42 4.3
> 10 2534 I5 864 2419 I6 1859 1396 I5 289 5978 15123 6.1 5.3 1.83
> 0.20 16.44 6.92 11.72 13.21 72 213 7 4.4
> 11 1264 I5 716 1059 I6 1733 851 I5 206 4190 13565 8.3 8.0 1.33
> 0.21 7.63 0.44 3.94 4.82 77 429 0 5.1
> 12 976 I5 423 826 I6 1172 714 I5 156 3354 12738 7.6 7.2 0.98
> 0.22 3.40 -4.20 -0.34 0.21 78 581 0 5.6
> 13 3698 I5 2146 3451 I6 2002 1619 I5 629 7623 16551 6.6 5.6 1.78
> 0.15 14.72 5.76 10.26 11.42 71 3347 415 5.0
> 1992
> 1 1149 I5 496 701 I6 919 896 I5 231 3791 13236 8.5 8.1 0.84
> 0.24 0.68 -6.20 -2.60 -1.69 79 654 0 5.4
> 2 1580 I5 708 898 I6 1469 1198 I5 255 5328 14708 8.2 7.7 0.86
> 0.27 1.26 -6.19 -2.40 -1.56 78 603 0 4.7
> 3 2968 I5 1429 2145 I6 2037 1760 I5 452 7449 16457 7.3 6.7 0.97
> 0.29 3.82 -4.35 -0.11 1.01 70 577 0 4.8
> 4 4050 I5 1812 2937 I6 2634 2146 I5 404 9527 18154 7.3 6.4 1.41
> 0.29 10.64 2.33 6.40 7.50 71 356 1 4.1
> 5 5654 I5 1935 4311 I6 2843 2695 I5 557 11009 19528 5.4 4.3 1.74
> 0.29 19.79 8.17 14.13 15.66 66 148 13 3.8
> 6 6170 I5 2120 4608 I6 3029 2877 I5 695 11617 20176 5.3 4.1 2.17
> 0.28 22.76 11.91 17.63 19.06 65 56 26 4.0
> 7 4879 I5 1816 2795 I6 1915 2835 I5 595 11242 19729 7.2 6.3 2.99
> 0.27 23.05 15.33 19.23 20.19 75 18 44 4.4
> 8 5168 I5 1720 4473 I6 2922 2256 I5 444 9959 18415 5.8 4.9 2.64
> 0.24 23.28 14.62 19.05 20.37 72 26 46 4.4
> 9 4094 I5 1361 3741 I6 2382 1893 I5 375 8058 16789 5.8 4.5 2.50
> 0.21 21.25 11.59 16.56 18.13 72 84 27 4.5
> 10 2499 I5 1177 2228 I6 1904 1393 I5 311 5928 15081 6.1 5.6 1.45
> 0.18 13.37 4.21 8.81 10.50 70 296 0 4.7
> 11 1134 I5 680 731 I6 1287 849 I5 249 4156 13533 8.5 8.1 1.39
> 0.15 7.58 1.22 4.38 5.30 77 418 0 4.8
> 12 1048 I5 508 1136 I6 1428 687 I5 146 3348 12729 7.8 7.3 0.96
> 0.14 3.30 -3.43 -0.06 0.83 69 570 0 5.1
> 13 3366 I5 1883 2559 I6 1489 1790 I5 790 7618 16545 6.9 6.1 1.66
> 0.24 12.56 4.10 8.42 9.61 72 3806 157 4.6
> ...
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Received on Thu 06 Mar 2008 - 19:55:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Mar 2008 - 20:30:19 GMT.