From: Charles Murtaugh <murtaugh_at_genetics.utah.edu>
Date: Mon, 24 Mar 2008 21:22:27 -0600


  This is a question with a trivial and obvious answer, I'm sure, but I can't seem to find it in the help files and books that I have handy. I have a dataframe consisting of two columns, "Gene_Name," a list of gene symbols, and "Number," a numeric measure of how frequently a tag representing that gene showed up in a SAGE library. Several of the genes are represented by multiple tags, and therefore are present more than once in the list, e.g.:

1167     Zcchc8      6
1168     Zcwpw1      5
1169     Zdhhc18     6
1170     Zdhhc20     5
1171     Zdhhc3      6
1172     Zdhhc3      5
1173     Zeb2        9
1174     Zeb2        6

  What I want is to collapse the list by gene name, such that duplicates are summed up and appear only once in the final version:

Zcchc8       6
Zcwpw1       5
Zdhhc18      6
Zdhhc20      5
Zdhhc3      11
Zeb2        15

  The only way I can figure out to do this is via rowsum:

> rowsum (Number,Gene_Name)

gives me exactly what I want, *except* that in the end, I am left with a matrix containing the Number values and with the Gene_Names used as row names (the output therefore looks exactly as printed above) -- what I want is a dataframe equivalent to the starting table, with numbered rows and separate, accessible columns containing the Gene_Name and Number values.

  I was able to put such a dataframe together manually, by cobbling together the row names of the above list with the values:

> genes.unique <- data.frame (rownames (rowsum(Number,Gene_Name)), rowsum(Number,Gene_Name))

but then I have to manually replace the row names of the dataframe with numbers, to get back to what I wanted in the first place.
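
For reference, a minimal sketch of the workaround described above, run on the eight example rows. The data frame name `sage` is hypothetical (the post does not name the starting table), and the `aggregate()` call at the end is an alternative that is not part of the original attempt. Setting the row names to NULL is one way to get back to default numbered rows without typing them in by hand.

```r
# Hypothetical data frame standing in for the table described above;
# the name `sage` is an assumption, the values are the example rows.
sage <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number = c(6, 5, 6, 5, 6, 5, 9, 6),
  stringsAsFactors = FALSE
)

# rowsum() returns a one-column matrix whose row names are the gene names.
sums <- rowsum(sage$Number, sage$Gene_Name)

# Rebuild a data frame with two explicit columns from that matrix.
genes.unique <- data.frame(
  Gene_Name = rownames(sums),
  Number = sums[, 1],
  stringsAsFactors = FALSE
)

# Drop the gene-name row names so the rows are numbered 1, 2, 3, ... again.
rownames(genes.unique) <- NULL

# Alternative (not in the original attempt): aggregate() sums Number within
# each Gene_Name and returns a data frame with numbered rows directly.
genes.agg <- aggregate(Number ~ Gene_Name, data = sage, FUN = sum)
```

Both `genes.unique` and `genes.agg` should then have one row per gene, with `Gene_Name` and `Number` available as ordinary columns.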

  I hope this makes some sort of sense. Is there an easier way to do this? Thanks in advance!

  Charlie Murtaugh

L. Charles Murtaugh
Assistant Professor

University of Utah
Dept. of Human Genetics
15 N. 2030 E. Rm. 2100
Salt Lake City, UT 84112

tel 801-581-5958
fax 801-581-6463
email murtaugh_at_genetics.utah.edu

