[R] How to reference or sort rownames in a data frame

From: Robert A. LaBudde <ral_at_lcfltd.com>
Date: Sun, 27 May 2007 16:55:41 -0400


While working through elementary examples, I was using the dataset
"plasma" from package "HSAUR".

When fitting a logistic regression to these data and making the diagnostic plots (R-2.5.0):

data(plasma,package='HSAUR')
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
layout(matrix(1:4,nrow=2))
plot(plasma_1)

I find that the data points with rownames 17 and 23 are outliers with high leverage.

I would then like to perform a fit without these two rows.

In principle this should be easy, using an update() with subset=-c(17,23).

The problem is that the rownames in this dataset are not ordered, and, in fact, the relevant rows are 30 and 31, not 17 and 23.
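
To illustrate the mismatch, here is a minimal sketch on a made-up data frame (the name d and its values are hypothetical, not the plasma data). It only shows that negative integers drop rows by position while character indices select by rowname:

```r
# Toy data frame (hypothetical) whose rownames are out of order
d <- data.frame(x = c(10, 20, 30, 40), row.names = c("3", "1", "4", "2"))

# Negative integers drop by POSITION: this removes the first row,
# whose rowname happens to be "3"
by_position <- d[-1, , drop = FALSE]
rownames(by_position)

# Character indices select by ROWNAME: this picks the row named "1",
# which sits at position 2
by_name <- d["1", , drop = FALSE]
by_name$x
```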

This brings up the following (elementary?) questions:

  1. How do you reference rows in "subset=" for which you know the rownames, but not the row numbers?
  2. How do you discover the rows corresponding to particular rownames? (Using plasma[rownames(plasma)==17,] shows the data, but NOT the row number!) (Probably the same answer as in Q. 1 above.)
  3. How do you sort (order) the rows of an existing data frame so that the rownames are in order?
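
One possible approach to all three, sketched on a made-up data frame rather than on plasma (the names d, pos, keep, fit and the data values are hypothetical, and lm() stands in for the glm() fit to keep the toy example small). It assumes the rownames look like numbers, so that as.numeric() is meaningful:

```r
# Toy data (hypothetical) with shuffled, numeric-looking rownames
d <- data.frame(y = c(1.0, 2.3, 2.9, 4.2, 4.8),
                x = c(1, 2, 3, 4, 5),
                row.names = c("4", "17", "2", "23", "9"))

# Q2: row positions for known rownames; match() returns them in the order asked
pos <- match(c("17", "23"), rownames(d))

# Q1: refit without those rows, either by negated positions ...
fit <- lm(y ~ x, data = d)
fit_by_pos <- update(fit, subset = -pos)

# ... or with a logical vector built directly from the rownames
keep <- !(rownames(d) %in% c("17", "23"))
fit_by_name <- update(fit, subset = keep)

# Q3: reorder the rows so that the rownames are in numeric order
d_sorted <- d[order(as.numeric(rownames(d))), ]
rownames(d_sorted)
```

Both refits drop the same two rows here; the logical form avoids looking up positions at all.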

I don't seem to know the magic words to find the answers to these questions in the help systems.

Obviously this can be done by writing new, brute force, functions scanning the subscripts, but there must be an (obvious?) direct way of doing this more elegantly.

Thanks for any pointers.



Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral_at_lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sun 27 May 2007 - 21:00:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 28 May 2007 - 05:31:34 GMT.
