# Re: [R] newbie questions - looping through hierarchical datafile

From: Simon Blomberg <blomsp_at_ozemail.com.au>
Date: Thu 06 Oct 2005 - 12:09:37 EST

Well I haven't seen any replies to this, so I have had a stab at the problem of getting the data into a data frame.

The approach I took was to break up the data into a list and then fill in a matrix row by row, "filling down" spreadsheet-style where necessary and taking advantage of the ordering of the data. The matrix is then coerced to a data.frame. It may not be a very portable or general solution, but it appears to work.
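To make the "filling down" idea concrete, here is a minimal stand-alone sketch. The helper name `fill_down` and the sample vector are made up for illustration and are not part of the function below; this variant carries the last non-missing value forward over every gap, which is a little more general than the one-gap version used in the function.

```r
# Hypothetical helper (illustration only): carry the last non-missing
# value forward, the way spreadsheet fill-down does.
fill_down <- function(x) {
  filled <- !is.na(x)
  # position of the most recent non-NA entry at each index (0 if none yet)
  idx <- cummax(seq_along(x) * filled)
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# Made-up example: an inventory name that applies to the rows beneath it
x <- c("BENALLA_1", NA, NA, "OTHER", NA)
print(fill_down(x))
```

Leading NAs are left alone because there is nothing above them to copy.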

```
list.to.data.frame <- function () {
    filecon <- file(file.choose())   # choose a data file interactively
    # read all the data into a list, 1 line per element; each element is
    # a character vector of data (variable length)
    dat <- strsplit(readLines(filecon, n=-1), split=" ")
    close(filecon)                   # release the connection
    resultvec <- matrix(rep(NA, 16), nrow=1)   # results will be stored here

    filldown <- function (x) {
        # kludge to simulate fill-down of a vector, spreadsheet style
        if (all(is.na(x)) || all(!is.na(x))) x else {
            last <- min(which(is.na(x)))
            x[last:length(x)] <- x[last-1]
            x
        }
    }

    # loop through the data
    for (vec in dat) {
        # what kind of field are we dealing with? (the tag is the first element)
        f <- switch(vec[1],
                    "A" = c(vec[-1], rep(NA, 15)),
                    "X" = c(NA, vec[-1], rep(NA, 12)),
                    "P" = c(rep(NA, 4), vec[-1], rep(NA, 8)),
                    "T" = c(rep(NA, 8), vec[-1], rep(NA, 6)),
                    "L" = c(rep(NA, 10), vec[-1], rep(NA, 3)),
                    "F" = c(rep(NA, 13), vec[-1]))

        if (any(is.na(resultvec[nrow(resultvec), which(!is.na(f))])))
            # slot the data into the appropriate column
            resultvec[nrow(resultvec),] <-
                ifelse(is.na(resultvec[nrow(resultvec),]), f,
                       resultvec[nrow(resultvec),])
        else
            # if the row is full, start a new one
            resultvec <- rbind(resultvec, f)
        # if we are at the end of a row, fill down and start a new row
        if (vec[1] == "F")
            resultvec <- rbind(apply(resultvec, 2, filldown), rep(NA, 16))
    }

    # coerce to a data frame, and get rid of the last empty row
    res <- as.data.frame(resultvec[-nrow(resultvec),], row.names=NULL)
    # set column names
    names(res) <- c("Inventory", "Stratum_no", "Total", "Ye", "Plot_no", "age",
                    "slope", "species", "tree_no", "frequency", "leader",
                    "diameter", "height", "start_height", "finish_height",
                    "feature")

    # return the result
    res
}
```
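Once the data are in a data frame, the kind of summaries asked about in the original question can be sketched with base R. This is only an illustration under some assumptions: the small data frame below is built by hand as a stand-in for the function's output (a few values copied from the posted sample, only the columns needed), it groups by stratum because the posted sample contains no P lines, and the names `leaders`, `n_trees` and `mean_dbh` are made up here.

```r
# Hand-built stand-in for the data frame returned above (illustration only).
res <- data.frame(
  Inventory  = "BENALLA_1",
  Stratum_no = c("1", "1", "1", "2", "2"),
  tree_no    = c("1", "1", "2", "1", "1"),
  leader     = "0",
  diameter   = c("28.5", "28.5", "32", "28.5", "28.5"),
  stringsAsFactors = FALSE
)

# Everything came through a character matrix, so convert before doing maths.
res$diameter <- as.numeric(as.character(res$diameter))

# There is one row per feature; reduce to one row per leader first so that
# trees with many features are not counted more than once.
leaders <- unique(res[, c("Inventory", "Stratum_no", "tree_no",
                          "leader", "diameter")])

# How many trees in each stratum?
n_trees <- aggregate(tree_no ~ Inventory + Stratum_no, data = leaders,
                     FUN = function(x) length(unique(x)))

# Mean diameter in each stratum
mean_dbh <- aggregate(diameter ~ Inventory + Stratum_no, data = leaders,
                      FUN = mean)

print(n_trees)
print(mean_dbh)
```

With real data containing P lines, adding `Plot_no` to the right-hand side of both formulas would give the per-plot versions.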

Cheers,

Simon.

At 10:36 AM 4/10/2005, you wrote:

```>Dear List,
>
>Im new to R - making a transition from SAS. I have a space delimited file
>with the following structure. Each line in the datafile is identified by
>the first letter.
>
>A = Inventory (Inventory)
>X = Stratum (Stratum_no Total Ye=year established)
>P = Plot (Plot_no age slope= species)
>T = Tree (tree_no frequency)
>F = Feature (start_height finish_height feature)
>
>On each of these lines there are some 'line specific' variables (in
>brackets). The data is hierarchical in nature - A feature belongs to a
>leader, a leader belongs to a tree, a tree belongs to a plot, a plot
>belongs to a stratum, a stratum belongs to inventory. There are many
>features in a tree. Many trees in a plot etc.
>
>In SAS I would read in the data in a procedural way using first. and last.
>variables to work out where inventories/stratums/plots/trees  finished and
>started so I could create summary statistics for each of them. For
>example, how many plots in a stratum? How many trees in a plot? An example
>of the sas code I would (not checked for errors!!!). If anybody could give
>me some idea on what the right approach in R would be for a similar
>analysis it would be greatly appreciated.
>
>regards Andrew
>
>
>Data datafile;
>infile 'test.txt';
>input @1 tag $1. @@;
>retain inventory stratum plot tree leader;
>if tag = 'A' then input @3 inventory $.;
>if tag = 'X' then input @3 stratum_no $. total $. yearest $. ;
>if tag = 'P' then input @3 plot_no $. age $. slope $. species $;
>if tag = 'T' then input @3 tree_no $. frequency  ;
>if tag = 'L' then input @3 leader_no $ diameter  height  ;
>if tag = 'F' then input @3 start $ finish $ feature $;
>if tag = 'F' then output;
>run;
>proc sort data = datafile;
>by inventory stratum_no  plot_no  tree_no  leader_no;
>
>* calculate mean dbh in each plot
>data dbh
>set datafile;
>by inventory stratum_no  plot_no  tree_no leader_no
>
>proc summary data = diameter;
>by inventory stratum plot tree;
>var diameter;
>output out = mean mean=;
>run;
>
>A BENALLA_1
>X 1 10 YE=1985
>T 1 25
>L 0 28.5 21.3528
>F 0 21.3528 SFNSW_DIC:P
>F 21.3528 100 SFNSW_DIC:P
>T 2 25
>L 0 32 23.1
>F 0 6.5 SFNSW_DIC:A
>F 6.5 23.1 SFNSW_DIC:C
>F 23.1 100 SFNSW_DIC:C
>T 3 25
>L 0 39.5 22.2407
>F 0 4.7 SFNSW_DIC:A
>F 4.7 6.7 SFNSW_DIC:C
>T 1 25
>L 0 38 22.1474
>F 0 1 SFNSW_DIC:G
>F 1 2.3 SFNSW_DIC:A
>T 1001 25
>L 0 38 22.1474
>F 0 1 SFNSW_DIC:G
>F 1 2.3 SFNSW_DIC:A
>T 2 25
>L 0 32.5 21.7386
>F 0 2 SFNSW_DIC:A
>F 2 3.3 SFNSW_DIC:G
>F 3.3 10.4 SFNSW_DIC:C
>X 2 10 YE=1985
>T 1 25
>L 0 28.5 21.3528
>F 0 21.3528 SFNSW_DIC:P
>F 21.3528 100 SFNSW_DIC:P
>T 2 25
>L 0 32 23.1
>F 0 6.5 SFNSW_DIC:A
>F 6.5 23.1 SFNSW_DIC:C
>F 23.1 100 SFNSW_DIC:C
>T 3 25
>L 0 39.5 22.2407
>F 0 4.7 SFNSW_DIC:A
>F 4.7 6.7 SFNSW_DIC:C
>T 1 25
>L 0 38 22.1474
>F 0 1 SFNSW_DIC:G
>F 1 2.3 SFNSW_DIC:A
>T 1001 25
>L 0 38 22.1474
>F 0 1 SFNSW_DIC:G
>F 1 2.3 SFNSW_DIC:A
>T 2 25
>L 0 32.5 21.7386
>F 0 2 SFNSW_DIC:A
>F 2 3.3 SFNSW_DIC:G
>F 3.3 10.4 SFNSW_DIC:C
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help

```

Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Centre for Resource and Environmental Studies
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 7800
email: Simon.Blomberg_at_anu.edu.au
F: +61 2 6125 0757
CRICOS Provider # 00120C
