Re: [R] newbie questions - looping through hierarchical datafile

From: jim holtman <jholtman_at_gmail.com>
Date: Tue 04 Oct 2005 - 20:54:25 EST

Here is a brute-force way based on the format of your input data. Basically, it reads each line in, 'splits' it apart on blanks, and then processes it according to its 'tag'. Information is stored in some global data, and '.result' is converted into a data frame that you can work with.

```
> xIN <- scan('/treedata.txt', what='', sep='\n') # read in entire line
> xIN <- strsplit(xIN, ' ') # split out fields separated by blanks
> # initialize 'global' variables to collect the information
> Out <- list() # individual results
> .result <- list(); r.n <- 0
> # process the data into a list '.result'
> # make use of the '<<-' to assign to a 'global' value
> invisible(lapply(xIN, function(x){
+     if (x[1] == "A") Out$inv <<- x[2]
+     else if (x[1] == "X") {
+         Out$strat <<- x[2]
+         Out$total <<- x[3]
+         Out$year <<- x[4]
+     } else if (x[1] == "P"){
+         Out$plot <<- x[2]
+         Out$age <<- x[3]
+         Out$slope <<- x[4]
+         Out$species <<- x[5]
+     } else if (x[1] == "T"){
+         Out$tree <<- x[2]
+         Out$freq <<- x[3]
+     } else if (x[1] == "L"){
+         Out$leader <<- x[2]
+         Out$diam <<- x[3]
+         Out$height <<- x[4]
+     } else if (x[1] == "F") {
+         Out$start <<- x[2]
+         Out$finish <<- x[3]
+         Out$feature <<- x[4]
+         .result[[r.n <<- r.n + 1]] <<- Out # store the result
+     }
+ }))
```
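If the `<<-` assignments are a concern, the same tag-driven idea can probably be wrapped in a function that keeps its state local. This is only a rough sketch: `parseTagged` and `sampleLines` are hypothetical names introduced for illustration, and the column names simply mirror the ones used above.

```r
# Hypothetical helper (not from the original post): parse tagged lines
# while keeping the running state inside the function.
parseTagged <- function(lines) {
  fields <- strsplit(lines, " ", fixed = TRUE)
  state <- character(0)  # latest value seen for each column, by name
  rows <- list()         # one entry per 'F' (feature) line
  for (x in fields) {
    if (length(x) == 0) next  # skip blank lines
    switch(x[1],
      A = { state["inv"] <- x[2] },
      X = { state[c("strat", "total", "year")] <- x[2:4] },
      P = { state[c("plot", "age", "slope", "species")] <- x[2:5] },
      T = { state[c("tree", "freq")] <- x[2:3] },
      L = { state[c("leader", "diam", "height")] <- x[2:4] },
      F = {
        state[c("start", "finish", "feature")] <- x[2:4]
        rows[[length(rows) + 1]] <- state  # snapshot of the full hierarchy
      }
    )
  }
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

# A few lines copied from the posted data file.
sampleLines <- c(
  "A BENALLA_1", "X 1 10 YE=1985", "P 1 20.25 slope=14 SPP:P.RAD",
  "T 1 25", "L 0 28.5 21.3528",
  "F 0 21.3528 SFNSW_DIC:P", "F 21.3528 100 SFNSW_DIC:P",
  "T 2 25", "L 0 32 23.1", "F 0 6.5 SFNSW_DIC:A"
)
parsed <- parseTagged(sampleLines)
```

Every column comes back as character, so numeric fields still need `as.numeric()` before any arithmetic.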

```
> # convert the list to a dataframe for processing
> myData <- lapply(.result, function(x) do.call('cbind', x))
> myData <- as.data.frame(do.call('rbind', myData))
> myData[order(myData$inv, myData$strat, myData$plot, myData$tree,
         inv strat total    year plot   age    slope   species tree freq leader diam  height   start  finish     feature
 1 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    1   25      0 28.5 21.3528       0 21.3528 SFNSW_DIC:P
 2 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    1   25      0 28.5 21.3528 21.3528     100 SFNSW_DIC:P
 3 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    2   25      0   32    23.1       0     6.5 SFNSW_DIC:A
 4 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    2   25      0   32    23.1     6.5    23.1 SFNSW_DIC:C
 5 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    2   25      0   32    23.1    23.1     100 SFNSW_DIC:C
 6 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    3   25      0 39.5 22.2407       0     4.7 SFNSW_DIC:A
 7 BENALLA_1     1    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    3   25      0 39.5 22.2407     4.7     6.7 SFNSW_DIC:C
 8 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    1   25      0   38 22.1474       0       1 SFNSW_DIC:G
 9 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    1   25      0   38 22.1474       1     2.3 SFNSW_DIC:A
10 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD 1001   25      0   38 22.1474       0       1 SFNSW_DIC:G
11 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD 1001   25      0   38 22.1474       1     2.3 SFNSW_DIC:A
12 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    2   25      0 32.5 21.7386       0       2 SFNSW_DIC:A
13 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    2   25      0 32.5 21.7386       2     3.3 SFNSW_DIC:G
14 BENALLA_1     1    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    2   25      0 32.5 21.7386     3.3    10.4 SFNSW_DIC:C
15 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    1   25      0 28.5 21.3528       0 21.3528 SFNSW_DIC:P
16 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    1   25      0 28.5 21.3528 21.3528     100 SFNSW_DIC:P
17 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    2   25      0   32    23.1       0     6.5 SFNSW_DIC:A
18 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    2   25      0   32    23.1     6.5    23.1 SFNSW_DIC:C
19 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    2   25      0   32    23.1    23.1     100 SFNSW_DIC:C
20 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    3   25      0 39.5 22.2407       0     4.7 SFNSW_DIC:A
21 BENALLA_1     2    10 YE=1985    1 20.25 slope=14 SPP:P.RAD    3   25      0 39.5 22.2407     4.7     6.7 SFNSW_DIC:C
22 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    1   25      0   38 22.1474       0       1 SFNSW_DIC:G
23 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    1   25      0   38 22.1474       1     2.3 SFNSW_DIC:A
24 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD 1001   25      0   38 22.1474       0       1 SFNSW_DIC:G
25 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD 1001   25      0   38 22.1474       1     2.3 SFNSW_DIC:A
26 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    2   25      0 32.5 21.7386       0       2 SFNSW_DIC:A
27 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    2   25      0 32.5 21.7386       2     3.3 SFNSW_DIC:G
28 BENALLA_1     2    10 YE=1985    2 20.25 slope=13 SPP:P.RAD    2   25      0 32.5 21.7386     3.3    10.4 SFNSW_DIC:C
```
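
For the summary questions in the original post (how many plots in a stratum, how many trees in a plot, mean diameter per plot), something along these lines may work once the data frame exists. Treat it as a sketch: the small `myData` below is a hypothetical stand-in built from a few rows of the output above, and it assumes one leader per tree, as in the sample data.

```r
# Hypothetical stand-in for 'myData': a few rows copied from the output,
# held as character strings the way cbind/rbind would leave them.
myData <- data.frame(
  inv   = "BENALLA_1",
  strat = c("1", "1", "1", "1", "2", "2"),
  plot  = c("1", "1", "1", "2", "1", "1"),
  tree  = c("1", "1", "2", "1", "1", "2"),
  diam  = c("28.5", "28.5", "32", "38", "28.5", "32"),
  stringsAsFactors = FALSE
)

# Measurements arrive as text, so convert before doing arithmetic.
myData$diam <- as.numeric(myData$diam)

# Feature rows repeat the tree-level values; keep one row per tree.
trees <- unique(myData[, c("inv", "strat", "plot", "tree", "diam")])

# How many trees in each plot?
treesPerPlot <- aggregate(tree ~ inv + strat + plot, data = trees, FUN = length)

# How many plots in each stratum?
plots <- unique(trees[, c("inv", "strat", "plot")])
plotsPerStratum <- aggregate(plot ~ inv + strat, data = plots, FUN = length)

# Mean diameter per plot (roughly the 'proc summary' step in the SAS code).
meanDiam <- aggregate(diam ~ inv + strat + plot, data = trees, FUN = mean)
```

Deduplicating to one row per tree first matters here: averaging `diam` over the raw rows would weight each tree by its number of features.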

On 10/3/05, Andrew.Haywood@poyry.com.au <Andrew.Haywood@poyry.com.au> wrote:

>
> Dear List,
>
> I'm new to R - making a transition from SAS. I have a space delimited file
> with the following structure. Each line in the datafile is identified by
> the first letter.
>
> A = Inventory (Inventory)
> X = Stratum (Stratum_no Total Ye=year established)
> P = Plot (Plot_no age slope= species)
> T = Tree (tree_no frequency)
> F = Feature (start_height finish_height feature)
>
> On each of these lines there are some 'line specific' variables (in
> brackets). The data is hierarchical in nature - A feature belongs to a
> leader, a leader belongs to a tree, a tree belongs to a plot, a plot
> belongs to a stratum, a stratum belongs to inventory. There are many
> features in a tree. Many trees in a plot etc.
>
> In SAS I would read in the data in a procedural way using first. and last.
> variables to work out where inventories/stratums/plots/trees finished and
> started so I could create summary statistics for each of them. For
> example, how many plots in a stratum? How many trees in a plot? An example
> of the SAS code I would use (not checked for errors!!!). If anybody could give
> me some idea on what the right approach in R would be for a similar
> analysis it would be greatly appreciated.
>
> regards Andrew
>
>
> Data datafile;
> infile 'test.txt';
> input @1 tag $1. @@;
> retain inventory stratum plot tree leader;
> if tag = 'A' then input @3 inventory $.;
> if tag = 'X' then input @3 stratum_no $. total $. yearest $. ;
> if tag = 'P' then input @3 plot_no $. age $. slope $. species $;
> if tag = 'T' then input @3 tree_no $. frequency ;
> if tag = 'L' then input @3 leader_no $ diameter height ;
> if tag = 'F' then input @3 start $ finish $ feature $;
> if tag = 'F' then output;
> run;
> proc sort data = datafile;
> by inventory stratum_no plot_no tree_no leader_no;
>
> * calculate mean dbh in each plot
> data dbh
> set datafile;
> by inventory stratum_no plot_no tree_no leader_no
>
> proc summary data = diameter;
> by inventory stratum plot tree;
> var diameter;
> output out = mean mean=;
> run;
>
> A BENALLA_1
> X 1 10 YE=1985
> P 1 20.25 slope=14 SPP:P.RAD
> T 1 25
> L 0 28.5 21.3528
> F 0 21.3528 SFNSW_DIC:P
> F 21.3528 100 SFNSW_DIC:P
> T 2 25
> L 0 32 23.1
> F 0 6.5 SFNSW_DIC:A
> F 6.5 23.1 SFNSW_DIC:C
> F 23.1 100 SFNSW_DIC:C
> T 3 25
> L 0 39.5 22.2407
> F 0 4.7 SFNSW_DIC:A
> F 4.7 6.7 SFNSW_DIC:C
> P 2 20.25 slope=13 SPP:P.RAD
> T 1 25
> L 0 38 22.1474
> F 0 1 SFNSW_DIC:G
> F 1 2.3 SFNSW_DIC:A
> T 1001 25
> L 0 38 22.1474
> F 0 1 SFNSW_DIC:G
> F 1 2.3 SFNSW_DIC:A
> T 2 25
> L 0 32.5 21.7386
> F 0 2 SFNSW_DIC:A
> F 2 3.3 SFNSW_DIC:G
> F 3.3 10.4 SFNSW_DIC:C
> X 2 10 YE=1985
> P 1 20.25 slope=14 SPP:P.RAD
> T 1 25
> L 0 28.5 21.3528
> F 0 21.3528 SFNSW_DIC:P
> F 21.3528 100 SFNSW_DIC:P
> T 2 25
> L 0 32 23.1
> F 0 6.5 SFNSW_DIC:A
> F 6.5 23.1 SFNSW_DIC:C
> F 23.1 100 SFNSW_DIC:C
> T 3 25
> L 0 39.5 22.2407
> F 0 4.7 SFNSW_DIC:A
> F 4.7 6.7 SFNSW_DIC:C
> P 2 20.25 slope=13 SPP:P.RAD
> T 1 25
> L 0 38 22.1474
> F 0 1 SFNSW_DIC:G
> F 1 2.3 SFNSW_DIC:A
> T 1001 25
> L 0 38 22.1474
> F 0 1 SFNSW_DIC:G
> F 1 2.3 SFNSW_DIC:A
> T 2 25
> L 0 32.5 21.7386
> F 0 2 SFNSW_DIC:A
> F 2 3.3 SFNSW_DIC:G
> F 3.3 10.4 SFNSW_DIC:C
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
>

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?
