# Re: [R] Data frame manipulation - newbie question

From: jim holtman <jholtman_at_gmail.com>
Date: Sun, 6 Jan 2008 20:41:18 -0500

There are a number of different ways to manipulate your data to get what you want, and it is useful to learn some of these techniques. Here, I think, is the set of actions that you want to perform.

```
> x <- read.table(textConnection("row k.idx step.forwd pt.num model prev value abs.error
+ 1      200        0                  1             lm          09     10.5       1.5
+ 2      200        0                  2             lm          11     10.5       1.5
+ 3      201        1                  1             lm          10     12         2.0
+ 4      201        1                  2             lm          12     12         2.0
+ 5      202        2                  1             lm          12     12.1       0.1
+ 6      202        2                  2             lm          12     12.1       0.1
+ 7      200        0                  1             rlm         10.1   10.5       0.4
+ 8      200        0                  2             rlm         10.3   10.5       0.2
+ 9      201        1                  1             rlm         11.6   12         0.4
+ 10     201        1                  2             rlm         11.4   12         0.6
+ 11     202        2                  1             rlm         11.8   12.1       0.1
+ 12     202        2                  2             rlm         11.9   12.1       0.2"), header=TRUE)
```

> closeAllConnections()
>
> # split the data by the grouping factors
> x.split <- split(x, list(x$k.idx, x$step.forwd, x$model), drop=TRUE)
> x.split

```
$`200.0.lm`
  row k.idx step.forwd pt.num model prev value abs.error
1   1   200          0      1    lm    9  10.5       1.5
2   2   200          0      2    lm   11  10.5       1.5

$`201.1.lm`
  row k.idx step.forwd pt.num model prev value abs.error
3   3   201          1      1    lm   10    12         2
4   4   201          1      2    lm   12    12         2

$`202.2.lm`
  row k.idx step.forwd pt.num model prev value abs.error
5   5   202          2      1    lm   12  12.1       0.1
6   6   202          2      2    lm   12  12.1       0.1

$`200.0.rlm`
  row k.idx step.forwd pt.num model prev value abs.error
7   7   200          0      1   rlm 10.1  10.5       0.4
8   8   200          0      2   rlm 10.3  10.5       0.2

$`201.1.rlm`
   row k.idx step.forwd pt.num model prev value abs.error
9    9   201          1      1   rlm 11.6    12       0.4
10  10   201          1      2   rlm 11.4    12       0.6

$`202.2.rlm`
   row k.idx step.forwd pt.num model prev value abs.error
11  11   202          2      1   rlm 11.8  12.1       0.1
12  12   202          2      2   rlm 11.9  12.1       0.2
```

>
> # now take the means of given columns
> x.mean <- lapply(x.split, function(.grp) colMeans(.grp[, c('prev', 'value', 'abs.error')]))
>
> # put back into a matrix
> (x.mean <- do.call(rbind, x.mean))

```           prev value abs.error
200.0.lm  10.00  10.5      1.50
201.1.lm  11.00  12.0      2.00
202.2.lm  12.00  12.1      0.10
200.0.rlm 10.20  10.5      0.30
201.1.rlm 11.50  12.0      0.50
202.2.rlm 11.85  12.1      0.15
```
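
The split / lapply / do.call sequence above can also be written as a single call. What follows is only a sketch of one alternative, not part of the original reply: it assumes a version of R in which `read.table()` accepts a `text=` argument (R 2.14.0 or later), and it uses the formula interface of `aggregate()` with `cbind()` on the left-hand side to average several columns at once. The name `x.agg` is made up for illustration.

```r
# Rebuild the example data (read.table(text = ...) assumes R >= 2.14.0)
x <- read.table(text = "row k.idx step.forwd pt.num model prev value abs.error
1  200 0 1 lm  9    10.5 1.5
2  200 0 2 lm  11   10.5 1.5
3  201 1 1 lm  10   12   2.0
4  201 1 2 lm  12   12   2.0
5  202 2 1 lm  12   12.1 0.1
6  202 2 2 lm  12   12.1 0.1
7  200 0 1 rlm 10.1 10.5 0.4
8  200 0 2 rlm 10.3 10.5 0.2
9  201 1 1 rlm 11.6 12   0.4
10 201 1 2 rlm 11.4 12   0.6
11 202 2 1 rlm 11.8 12.1 0.1
12 202 2 2 rlm 11.9 12.1 0.2", header = TRUE)

# Mean of prev, value and abs.error for each k.idx / step.forwd / model group.
# x.agg is a hypothetical name; the result is a data frame with one row per group.
x.agg <- aggregate(cbind(prev, value, abs.error) ~ k.idx + step.forwd + model,
                   data = x, FUN = mean)
print(x.agg)
```

Unlike the matrix built with `do.call(rbind, ...)`, the result keeps the grouping variables as ordinary columns, which may be more convenient for later plotting or subsetting.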

>
> #boxplot
> boxplot(abs.error ~ k.idx, data=x)
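
The call above groups by `k.idx` only. If the aim is to compare the two models within each `k.idx`, as the original question asks, one possible variation is to put both grouping variables on the right-hand side of the formula; `boxplot()` documents `~ g1 + g2` as equivalent to grouping by the interaction of the two. This is a sketch under the assumption that `read.table(text = ...)` is available (R 2.14.0 or later); the name `bp` and the axis settings are illustrative choices.

```r
# Rebuild the example data (read.table(text = ...) assumes R >= 2.14.0)
x <- read.table(text = "row k.idx step.forwd pt.num model prev value abs.error
1  200 0 1 lm  9    10.5 1.5
2  200 0 2 lm  11   10.5 1.5
3  201 1 1 lm  10   12   2.0
4  201 1 2 lm  12   12   2.0
5  202 2 1 lm  12   12.1 0.1
6  202 2 2 lm  12   12.1 0.1
7  200 0 1 rlm 10.1 10.5 0.4
8  200 0 2 rlm 10.3 10.5 0.2
9  201 1 1 rlm 11.6 12   0.4
10 201 1 2 rlm 11.4 12   0.6
11 202 2 1 rlm 11.8 12.1 0.1
12 202 2 2 rlm 11.9 12.1 0.2", header = TRUE)

# One box per model / k.idx combination; las = 2 turns the axis labels
# so that the combined group names stay readable.
# bp (a hypothetical name) holds the statistics boxplot() returns invisibly.
bp <- boxplot(abs.error ~ model + k.idx, data = x,
              las = 2, ylab = "abs.error")
```

With only two points per group the boxes are degenerate, but the same call applies unchanged to a larger data frame.
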
>
> # create a table with average of the abs.error for each 'model'
> cbind(x, abs.error.mean=ave(x$abs.error, x$model))

```
   row k.idx step.forwd pt.num model prev value abs.error abs.error.mean
1    1   200          0      1    lm  9.0  10.5       1.5      1.2000000
2    2   200          0      2    lm 11.0  10.5       1.5      1.2000000
3    3   201          1      1    lm 10.0  12.0       2.0      1.2000000
4    4   201          1      2    lm 12.0  12.0       2.0      1.2000000
5    5   202          2      1    lm 12.0  12.1       0.1      1.2000000
6    6   202          2      2    lm 12.0  12.1       0.1      1.2000000
7    7   200          0      1   rlm 10.1  10.5       0.4      0.3166667
8    8   200          0      2   rlm 10.3  10.5       0.2      0.3166667
9    9   201          1      1   rlm 11.6  12.0       0.4      0.3166667
10  10   201          1      2   rlm 11.4  12.0       0.6      0.3166667
11  11   202          2      1   rlm 11.8  12.1       0.1      0.3166667
12  12   202          2      2   rlm 11.9  12.1       0.2      0.3166667
```
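
`ave()` repeats each group mean on every row. When a compact summary is wanted instead, `tapply()` (the function suggested in the quoted reply further down this thread) may be a closer fit. The sketch below is an illustration rather than part of the original answer; it assumes `read.table(text = ...)` is available (R 2.14.0 or later), and the names `per.model` and `per.step` are made up.

```r
# Rebuild the example data (read.table(text = ...) assumes R >= 2.14.0)
x <- read.table(text = "row k.idx step.forwd pt.num model prev value abs.error
1  200 0 1 lm  9    10.5 1.5
2  200 0 2 lm  11   10.5 1.5
3  201 1 1 lm  10   12   2.0
4  201 1 2 lm  12   12   2.0
5  202 2 1 lm  12   12.1 0.1
6  202 2 2 lm  12   12.1 0.1
7  200 0 1 rlm 10.1 10.5 0.4
8  200 0 2 rlm 10.3 10.5 0.2
9  201 1 1 rlm 11.6 12   0.4
10 201 1 2 rlm 11.4 12   0.6
11 202 2 1 rlm 11.8 12.1 0.1
12 202 2 2 rlm 11.9 12.1 0.2", header = TRUE)

# Overall mean abs.error per model: a named vector with one entry per model
per.model <- tapply(x$abs.error, x$model, mean)

# Mean abs.error per k.idx (rows) and model (columns), which puts the two
# models side by side for each step
per.step <- tapply(x$abs.error, list(k.idx = x$k.idx, model = x$model), mean)

print(per.model)
print(per.step)
```

The two-way table makes the per-step comparison between `lm` and `rlm` direct, since each row holds both models' mean error for one `k.idx`.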

>

On Jan 6, 2008 10:50 AM, Rense Nieuwenhuis <rense.nieuwenhuis_at_gmail.com> wrote:
> Hi,
>
> you may want to use the apply / tapply functions. Some find them a bit
> hard to grasp at first, but they will help you many times in many
> situations once you get the hang of them.
>
> Maybe you can get some information on my site:
> http://www.rensenieuwenhuis.nl/r-project/manual/basics/tables/
>
>
> Hope this helps,
>
> Rense Nieuwenhuis
>
>
>
> On Jan 3, 2008, at 11:53 , José Augusto M. de Andrade Junior wrote:
>
> > Hi all,
> >
> > Could someone please explain how I can efficiently query a data frame
> > with several factors, as shown below:
> >
> > ----------------------------------------------------------------------
> > Data frame: pt.knn
> > ----------------------------------------------------------------------
> > row | k.idx | step.forwd | pt.num | model | prev | value | abs.error
> > 1     200     0            1        lm      09     10.5    1.5
> > 2     200     0            2        lm      11     10.5    1.5
> > 3     201     1            1        lm      10     12      2.0
> > 4     201     1            2        lm      12     12      2.0
> > 5     202     2            1        lm      12     12.1    0.1
> > 6     202     2            2        lm      12     12.1    0.1
> > 7     200     0            1        rlm     10.1   10.5    0.4
> > 8     200     0            2        rlm     10.3   10.5    0.2
> > 9     201     1            1        rlm     11.6   12      0.4
> > 10    201     1            2        rlm     11.4   12      0.6
> > 11    202     2            1        rlm     11.8   12.1    0.1
> > 12    202     2            2        rlm     11.9   12.1    0.2
> > ----------------------------------------------------------------------
> >
> > k.idx, step.forwd, pt.num and model columns are FACTORS.
> > prev, value, abs.error are numeric
> >
> > I need to take the mean value of the numeric columns (prev, value and
> > abs.error) for each k.idx and step.forwd and model. So: rows 1 and 2,
> > 3 and 4, 5 and 6, 7 and 8, 9 and 10, 11 and 12 must be grouped
> > together.
> >
> > Next, i need to plot a boxplot of the mean(abs.error) of each model
> > for each k.idx.
> > I need to compare the abs.error of the two models for each step and
> > the mean overall abs.error of each model. And so on.
> >
> > I read the manuals, but the examples there are too simple. I know how
> > to do this manipulation in a "brute force" manner, but i wish to learn
> > how to work the right way with R.
> >
> > Could someone help me?
> > Thanks in advance.
> >
> > José Augusto
> > Undergraduate student
> > University of São Paulo
> > Business Administration Faculty
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

```
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

```
Received on Mon 07 Jan 2008 - 01:44:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.