[R] Function sort.data.frame

From: Kevin Wright <kwright_at_eskimo.com>
Date: Sat 25 Sep 2004 - 00:24:10 EST

I can never remember how to use "order" to sort the rows of a data frame, so like any good, lazy programmer, I decided to write my own function.

The idea is to specify a data.frame and a one-sided formula in which + and - indicate ascending and descending order. For example:

  sort.data.frame(~ +nitro -Variety, Oats)
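For comparison, here is roughly what the same sort looks like when written directly with "order". This is only a sketch: the data frame `toy` is a made-up stand-in for the Oats data (its values are invented for illustration), and `rank` is used to turn the factor into numbers so that it can be negated.

```r
# Hypothetical toy data standing in for the Oats data set (values invented)
toy <- data.frame(
  nitro   = c(0.2, 0.0, 0.2, 0.0),
  Variety = factor(c("Victory", "Marvellous", "Golden Rain", "Victory")),
  yield   = c(111, 105, 130, 117)
)

# Ascending nitro, then descending Variety.
# A factor cannot be negated directly, so rank() converts it to numbers first.
idx <- order(toy$nitro, -rank(toy$Variety))
sorted <- toy[idx, ]
print(sorted)
```

The formula interface is meant to hide exactly this bookkeeping: which columns need `rank`, and where the minus signs go.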

Since sorting of a data.frame is an oft-asked question on this list, I am posting my function in hopes that others may find it useful.

Computing 'on the language' (formulas) is not my strongest point, so the function can probably be improved. A similar idea could be used for matrix objects. Feedback is welcome.
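As one possible direction for both remarks, the sketch below walks the parse tree of the formula instead of splitting its character form, and then applies the result to a matrix. The names `parse_sort_formula` and `sort_matrix` are hypothetical, not part of the posted function. The sketch assumes plain variable names joined by + and - (no parentheses or doubled signs) and a numeric matrix with column names.

```r
# Hypothetical helper: walk a one-sided formula and collect each variable
# together with the sign that applies to it.
parse_sort_formula <- function(form) {
  stopifnot(inherits(form, "formula"), length(form) == 2)
  vars <- character(0)
  signs <- character(0)
  walk <- function(e, sign) {
    is_pm <- is.call(e) && is.name(e[[1]]) &&
      as.character(e[[1]]) %in% c("+", "-")
    if (is_pm) {
      op <- as.character(e[[1]])
      if (length(e) == 2) {
        # Unary + or -: the operator applies to its single operand
        walk(e[[2]], op)
      } else {
        # Binary + or -: the left side keeps the inherited sign,
        # the right side takes the operator as its sign
        walk(e[[2]], sign)
        walk(e[[3]], op)
      }
    } else {
      vars <<- c(vars, deparse(e))
      signs <<- c(signs, sign)
    }
  }
  walk(form[[2]], "+")
  data.frame(var = vars, sign = signs, stringsAsFactors = FALSE)
}

# Hypothetical matrix variant: sort the rows of a numeric matrix by named columns
sort_matrix <- function(form, m) {
  spec <- parse_sort_formula(form)
  keys <- lapply(seq_len(nrow(spec)), function(i) {
    col <- m[, spec$var[i]]
    if (spec$sign[i] == "-") -col else col
  })
  m[do.call(order, keys), , drop = FALSE]
}

m <- cbind(a = c(1, 2, 1), b = c(5, 6, 7))
spec <- parse_sort_formula(~ x - y + z)
sorted_m <- sort_matrix(~ a - b, m)
print(sorted_m)
```

Walking the tree avoids having to track character positions of the signs, at the cost of a small recursive helper.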

Kevin Wright

sort.data.frame <- function(form,dat){
# Author: Kevin Wright
# Some ideas from Andy Liaw
# http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html

# Use + for ascending, - for descending.
# Sorting is left to right in the formula

# Usage is either of the following:
# sort.data.frame(~Block-Variety,Oats)
# sort.data.frame(Oats,~-Variety+Block)

# If dat is the formula, then switch form and dat
  if(inherits(dat,"formula")){
    f=dat
    dat=form
    form=f
  }
# A one-sided formula has length 2 (the ~ and its right-hand side)
  if(!inherits(form,"formula") || length(form) != 2)
    stop("Formula must be one-sided.")

# Make the formula into character and remove spaces
  formc <- as.character(form[2])
  formc <- gsub(" ","",formc)
# If the first character is not + or -, add +
  if(!is.element(substring(formc,1,1),c("+","-")))
    formc <- paste("+",formc,sep="")
# Extract the variables from the formula
  vars <- unlist(strsplit(formc, "[\\+\\-]"))
  vars <- vars[vars!=""] # Remove spurious "" terms

# Build a list of arguments to pass to "order" function
  calllist <- list()
  pos=1 # Position of + or -
  for(i in seq_along(vars)){
    varsign <- substring(formc,pos,pos)
    pos <- pos+1+nchar(vars[i])
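    # Factors cannot be negated, so rank them. The same holds for character
    # columns, which data.frame() no longer converts to factors by default
    # (since R 4.0.0).
    if(is.factor(dat[,vars[i]]) || is.character(dat[,vars[i]])){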
      if(varsign=="-")
        calllist[[i]] <- -rank(dat[,vars[i]])
      else
        calllist[[i]] <- rank(dat[,vars[i]])
    }
    else {
      if(varsign=="-")
        calllist[[i]] <- -dat[,vars[i]]
      else
        calllist[[i]] <- dat[,vars[i]]

    }
  }
  dat[do.call("order",calllist), , drop=FALSE] # drop=FALSE keeps a one-column result a data.frame

}

d = data.frame(b=factor(c("Hi","Med","Hi","Low"),levels=c("Low","Med","Hi"),
                        ordered=TRUE),
               x=c("A","D","A","C"),y=c(8,3,9,9),z=c(1,1,1,2))
sort.data.frame(~-z-b,d)
sort.data.frame(~x+y+z,d)

sort.data.frame(~-x+y+z,d)
sort.data.frame(d,~x-y+z)

Received on Sat Sep 25 00:36:53 2004