[R] help with rowsum/aggregate type functions

From: Charles Murtaugh <murtaugh_at_genetics.utah.edu>
Date: Mon, 24 Mar 2008 21:22:27 -0600


  This is a question with a trivial and obvious answer, I'm sure, but I can't seem to find it in the help files and books that I have handy. I have a dataframe consisting of two columns, "Gene_Name," a list of gene symbols, and "Number," a numeric measure of how frequently a tag representing that gene showed up in a SAGE library. Several of the genes are represented by multiple tags, and therefore are present more than once in the list, e.g.:

1167     Zcchc8      6
1168     Zcwpw1      5
1169     Zdhhc18     6
1170     Zdhhc20     5
1171     Zdhhc3      6
1172     Zdhhc3      5
1173     Zeb2        9
1174     Zeb2        6

  What I want is to collapse the list by gene name, such that duplicates are summed up and appear only once in the final version:

Zcchc8      6
Zcwpw1      5
Zdhhc18     6
Zdhhc20     5
Zdhhc3     11
Zeb2       15

  The only way I can figure out to do this is via rowsum:

> rowsum (Number,Gene_Name)

gives me exactly what I want, *except* that in the end, I am left with a matrix containing the Number values and with the Gene_Names used as row names (the output therefore looks exactly as printed above) -- what I want is a dataframe equivalent to the starting table, with numbered rows and separate, accessible columns containing the Gene_Name and Number values.
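To make the behaviour described above concrete, here is a minimal sketch. The data frame name `genes` is an assumption (the original table's name is not given), and it holds only the eight rows quoted earlier. The original call uses bare column names, which suggests the data frame was attached; the sketch uses `$` instead.

```r
# Hypothetical data frame reproducing the eight example rows
genes <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number = c(6, 5, 6, 5, 6, 5, 9, 6),
  stringsAsFactors = FALSE
)

# rowsum() adds up Number within each Gene_Name group
sums <- rowsum(genes$Number, genes$Gene_Name)

# The result is a one-column matrix, not a data frame;
# the gene names survive only as row names
is.matrix(sums)
rownames(sums)
sums["Zdhhc3", 1]
```

The sums themselves are correct; the gene names are simply stored as the matrix's row names rather than as a column.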

  I was able to put such a dataframe together manually, by cobbling together the row names of the above list with the values:

> genes.unique <- data.frame (rownames (rowsum(Number,Gene_Name)), rowsum(Number,Gene_Name))

but then I have to manually replace the row names of the dataframe with numbers, to get back to what I wanted in the first place.
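For comparison, here is a hedged sketch of two ways to arrive at a data frame with numbered rows, using only base R. The names `genes`, `collapsed`, `sums` and `genes.unique2` are assumptions for illustration. The first route swaps in `aggregate()` in place of `rowsum()`; the second keeps `rowsum()` and resets the row names with `rownames(x) <- NULL`.

```r
# Hypothetical data frame reproducing the eight example rows
genes <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number = c(6, 5, 6, 5, 6, 5, 9, 6),
  stringsAsFactors = FALSE
)

# Route 1: aggregate() with a formula returns a data frame directly,
# with Gene_Name and Number as ordinary columns
collapsed <- aggregate(Number ~ Gene_Name, data = genes, FUN = sum)

# Route 2: compute rowsum() once, then rebuild the data frame
sums <- rowsum(genes$Number, genes$Gene_Name)
genes.unique2 <- data.frame(Gene_Name = rownames(sums),
                            Number = sums[, 1],
                            stringsAsFactors = FALSE)
rownames(genes.unique2) <- NULL  # reset to the default 1, 2, 3, ...

collapsed
genes.unique2
```

Either route should give one row per gene with separate `Gene_Name` and `Number` columns. The second also avoids calling `rowsum()` twice.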

  I hope this makes some sort of sense. Is there an easier way to do this? Thanks in advance!

  Charlie Murtaugh

L. Charles Murtaugh
Assistant Professor

University of Utah
Dept. of Human Genetics
15 N. 2030 E. Rm. 2100
Salt Lake City, UT 84112

tel 801-581-5958
fax 801-581-6463
email murtaugh_at_genetics.utah.edu


Received on Tue 25 Mar 2008 - 04:12:56 GMT
