[R] data frames, na.omit, and sums

From: Jason Miller <millerj_at_truman.edu>
Date: Mon 05 Dec 2005 - 11:55:06 EST


Dear R-helpers,

New to R, I'm in the middle of a project that I'm using to force myself to learn R. I'm running into some behavior that I don't understand, and I need some advice. In the last week I've gotten some great advice from the list on visualizing my data, and I was hoping people could help me get past another barrier I've run into.

Before I describe what I'm trying to do and where I'm stuck with R, let me quickly outline what I need help with:
(1) summing over the non-NA entries in each row of a data frame, and
(2) using na.omit() and na.action() with rows of data from a frame.

I have a data frame that contains information about when my academic department offered courses and their enrollments. The data frame looks something like

sem     year  C1e  C1s  C2e  C2s
Fall    1991   10    2   NA   NA
Spring  1992    3    1    8    1
Summer  1992   NA   NA  100   10

where C?e represents a specific course's enrollment that semester and C?s represents the number of sections of that course offered. The frame is filled with integers and NAs. The data frame is of medium size, with about 180 columns and 45 rows.

I need to cull some basic information from this dataset such as:
(1) total number of sections offered each semester (and each year),
(2) total number of credit hours generated each semester (and each year), and
(3) the student-to-faculty ratio of the department each semester (and each year).

From a mathematical standpoint, how to do each of these is obvious to me. But having to negotiate working within data frames and with matrices that have NA entries has really gotten me confused and frustrated. (I have no programming background.)

To calculate (1) above for each semester (row), I know how to select the "sections" columns using grep(). What I'd like to do is sum the selected frame's non-NA entries row by row. For some reason, I was able to do this earlier today using the rowsum() function with na.rm=TRUE, but now it's not working. It complains of non-numeric entries. (In fact, I was able to use the rowsum() function to calculate (1) for each year.) When I try to convert the data frame (or a sub-frame) to a matrix, my integers turn into strings/characters, and I have no idea what to do with that!
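
To make the goal in (1) concrete, here is a minimal sketch on a toy copy of the example frame above. The name df and the column-name pattern are assumptions for illustration only. It uses base R's rowSums(), which adds across each row, as distinct from rowsum(), which adds rows within groups. It also shows that a matrix built from the whole frame is character (because of the sem column), while one built from only the numeric columns stays numeric.

```r
# Toy copy of the example frame (illustrative values only).
df <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# Select the "sections" columns by name (assumed pattern: C<number>s).
sec_cols <- grep("^C[0-9]+s$", names(df), value = TRUE)

# Row-wise sum over the numeric section columns, skipping NAs.
sections_per_sem <- rowSums(df[, sec_cols], na.rm = TRUE)
sections_per_sem                 # 2 2 10

# Per-year totals: add up the semester totals within each year.
sections_per_year <- tapply(sections_per_sem, df$year, sum)
sections_per_year                # 1991: 2, 1992: 12

# Why the conversion to a matrix produced strings:
is.character(as.matrix(df))              # TRUE, sem forces everything to character
is.numeric(as.matrix(df[, sec_cols]))    # TRUE, numeric columns only
```

Selecting only the numeric columns before converting or summing is the key step in this sketch; including sem in the selection is one way to end up with a "non-numeric" complaint.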

To calculate (2) above for a semester, I know how to select the enrollment columns using grep(). What I'd like to do is calculate the total credits generated by taking the dot product of each row with a vector whose components are the credit hour values of each course in my data frame. Of course, I'd have to account for the NA values in my data frame, but in the past I've had decent luck with using na.omit() and na.action() to select the non-NA components of a vector. Unfortunately, na.omit() is absolutely not working with my data frame; it just returns the names of all the columns!
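
The dot product described in (2) can be sketched as follows, again on a toy copy of the example frame. The credit-hour values in credits are hypothetical, and treating NA as 0 assumes that a course not offered in a semester generates no credit hours. The last lines illustrate that na.omit() on a data frame drops every row containing any NA, so a frame in which each row has at least one NA comes back with zero rows and prints only its column names. That may be what is happening here, though it is a guess.

```r
# Toy copy of the example frame (illustrative values only).
df <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# Select the enrollment columns by name (assumed pattern: C<number>e).
enr_cols <- grep("^C[0-9]+e$", names(df), value = TRUE)

# Hypothetical credit-hour value for each course, named by its column.
credits <- c(C1e = 3, C2e = 4)

# Numeric matrix of enrollments; NA (course not offered) counts as 0.
enr <- as.matrix(df[, enr_cols])
enr[is.na(enr)] <- 0

# Matrix-vector product: one dot product per row (semester).
credit_hours <- as.vector(enr %*% credits[enr_cols])
credit_hours                     # 30 41 400

# na.omit() on a data frame removes whole rows that contain any NA.
complete_rows <- na.omit(df[, enr_cols])
nrow(complete_rows)              # 1, only the Spring 1992 row survives

# On a single row turned into a vector, na.omit() drops the NA entries.
row1 <- na.omit(unlist(df[1, enr_cols]))
as.vector(row1)                  # 10
```

Replacing NA with 0 before multiplying avoids handling each row separately, at the cost of no longer distinguishing "not offered" from "offered with zero enrollment".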

Until I get (1) and (2) figured out, I have no hope of figuring out (3).

Thank you for reading this far into this post. If you have any suggestions for how I can get na.omit() and summing to work for me, I'd appreciate hearing from you.

Jason Miller



Jason E. Miller, Ph.D.
Associate Professor of Mathematics
Truman State University
Kirksville, MO
http://pyrite.truman.edu/~millerj/
660.785.7430

Received on Mon Dec 05 12:37:11 2005