Re: [R] Reformatting a table

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Thu, 20 Nov 2008 16:01:14 -0600

on 11/20/2008 02:28 PM Tul Gan wrote:
> Hi !
> ���������� I am new to R. Can somebody help me in reformatting�huge output files ,i.e, rearranging sets of columns in specific order.
> For example: I have data for three compunds 1, 2 and 3
> file1:
> ID CA1 CA3 CA2 MA2 MA1 MA3
> 1 14 15 13 7 12 3
> 2 19 7 12 10 14 5
> 3 21 12 19 6 8 9
> �
> to
> File 2:
> �
> ID CA1 CA2 CA3 MA1 MA2 MA3
> 1 14 13 15 12 7 3
> 2 19 12 7 14 10 5
> 3 21 19 12 8 6 9
> �
> or File3:
> ID CA1 MA1 CA2 MA2 CA3 MA3
> 1 14 12 13 7 15 3
> 2 19 14 12 10 7 5
> 3 21 8 19 6 12 9
> �
> Thanks for your help,
> Tul Gan

A general approach to the first case, where 'DF' is the initial data frame:

> DF[, c("ID", sort(names(DF)[-1]))]

  ID CA1 CA2 CA3 MA1 MA2 MA3
1 1 14 13 15 12 7 3
2 2 19 12 7 14 10 5
3 3 21 19 12 8 6 9

Essentially, you are sorting the column names less "ID", then reordering the columns of 'DF' using indexing.

> names(DF)[-1]
[1] "CA1" "CA3" "CA2" "MA2" "MA1" "MA3"

> sort(names(DF)[-1])
[1] "CA1" "CA2" "CA3" "MA1" "MA2" "MA3"

The second case is a little more complicated, since you are independently sorting on both alpha and numeric values, rather than just  alpha.

If there is a predictable sequence of names, you could do something like:

> paste(c("CA", "MA"), rep(1:3, each = 2), sep = "")
[1] "CA1" "MA1" "CA2" "MA2" "CA3" "MA3"

Thus:

> DF[, c("ID", paste(c("CA", "MA"), rep(1:3, each = 2), sep = ""))]
  ID CA1 MA1 CA2 MA2 CA3 MA3
1 1 14 12 13 7 15 3
2 2 19 14 12 10 7 5
3 3 21 8 19 6 12 9

If the sequence is not predictable, such that you would generate non-existing column names with the above, then we need to split the existing column names and sort them separately. Something along the lines of the following:

# use gsub() to split out the alpha and numeric values. Note that we # need to create a data frame to handle an alpha and a numeric column

names.df <- data.frame(Alpha = gsub("[[:digit:]]", "",

                                    names(DF)[-1]),
                       Digits = as.numeric(gsub("[[:alpha:]]", "",
                                                names(DF)[-1])))


> names.df

  Alpha Digits

1    CA      1
2    CA      3
3    CA      2
4    MA      2
5    MA      1
6    MA      3


Now, get the indices for the sorted rows, based first upon the digits value, subsorted by the Alpha value:

> order(names.df$Digits, names.df$Alpha)
[1] 1 5 3 4 2 6

Now use those values to index the columns in 'DF'. Remember that we need to add 1 to each index, to account for the "ID" column, which is not included in 'names.mat':

> c(1, order(names.df$Digits, names.df$Alpha) + 1)
[1] 1 2 6 4 5 3 7

Thus:

> DF[, c(1, order(names.df$Digits, names.df$Alpha) + 1)]
  ID CA1 MA1 CA2 MA2 CA3 MA3
1 1 14 12 13 7 15 3
2 2 19 14 12 10 7 5
3 3 21 8 19 6 12 9

Note, that we could also solve the first case, by reversing the sorting levels from names.df:

> DF[, c(1, order(names.df$Alpha, names.df$Digits) + 1)]
  ID CA1 CA2 CA3 MA1 MA2 MA3
1 1 14 13 15 12 7 3
2 2 19 12 7 14 10 5
3 3 21 19 12 8 6 9

See ?gsub, ?sort and ?order

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 20 Nov 2008 - 22:04:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 20 Nov 2008 - 22:30:26 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive