Re: [R] Saving misclassified records into dataframe within a loop

From: William Dunlap <wdunlap_at_tibco.com>
Date: Thu, 12 May 2011 16:47:10 -0700

Your question concerned how to return data from a function. It looks like you are using the following idiom to save the data a function generates:
  f <- function() {
     result <- ... some calculations ...
     save(result, file="result.Rdata")
  }
  load("result.Rdata")
  ... now you will find a dataset called "result" ...
The save call stores f's local dataset called 'result' in a file, and the load call loads the data from the file into a dataset also called 'result', but in a different frame (the frame of the caller of f, not f's frame).
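The same frame rule explains the symptom in the question: after the function runs, typing results still shows the original 1. Assigning into `results` inside a function creates a new local variable in the function's frame. It does not modify the global object. The following is a minimal sketch of that behaviour. The function name `fill_results` and the value "a" are hypothetical.

```r
# A global 'results', created the same way as in the question
results <- as.data.frame(1)

fill_results <- function() {
  # This does not touch the global 'results': the assignment creates a new
  # local variable 'results' (a modified copy) in fill_results()'s own frame.
  results[1, ] <- "a"
  results  # return the local copy so that the caller can keep it
}

local_copy <- fill_results()
print(results[1, 1])     # the global object is unchanged
print(local_copy[1, 1])  # the returned copy holds the new value
```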

Don't use save() and load() for this sort of thing. It will mystify people reading your code and make the code difficult to reuse.

Instead, return the value of f's result from f, and use the assignment operator when calling f to store that return value in the caller's frame:
  f <- function() {
     fResult <- ... some calculations ...
     fResult # the return value of f
  }
  result <- f()
When f finishes, all of its local variables disappear and its return value is passed back to its caller, which can name it or use it directly in another function call.
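Here is a sketch of that idiom applied to the question's task. The data and the model are passed in as arguments, and the function returns the misclassified row names instead of saving them to a file. Everything here is hypothetical: `toy` and `prediction` are small stand-ins for kyphosis and the rpart prediction matrix, so the example runs without fitting a model, and `misclassified_ids` is an illustrative name.

```r
# Hypothetical stand-ins for kyphosis and predict(fit, kyphosis)
toy <- data.frame(Kyphosis = factor(c("absent", "present", "absent", "absent", "present")),
                  row.names = c("1", "2", "3", "4", "5"))
prediction <- cbind(absent  = c(1, 1, 0.4, 1, 0.2),
                    present = c(0, 0, 0.6, 0, 0.8))

# Hypothetical helper: returns the row names of misclassified records to its
# caller instead of writing them to a file with save()
misclassified_ids <- function(data, prediction) {
  isMisclassified <- (data$Kyphosis == "absent") != (prediction[, 1] == 1)
  rownames(data)[isMisclassified]
}

ids <- misclassified_ids(toy, prediction)           # the caller names the value...
print(ids)
print(length(misclassified_ids(toy, prediction)))   # ...or uses it directly
```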

You didn't ask about the following, but the code
  results <- as.data.frame(1)
  j <- 0
  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
      j <- j+1
      results[j,] <- row.names(kyphosis[c(i),])
    }
  }
may be written without the for loop as
  isMisclassified <- ((kyphosis$Kyphosis=="absent") == (prediction[,1]==1)) == 0
  results <- data.frame("1" = rownames(kyphosis)[isMisclassified], check.names=FALSE, stringsAsFactors=FALSE)
Note that the isMisclassified <- line is your line with the subscripts 'i' taken out, since we want to evaluate the condition for all i. I find its intent easier to understand than that of the code in the for loop.
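The following is a quick check that the loop and the vectorized version select the same row names. It uses small hypothetical stand-ins for kyphosis and the prediction matrix. `toy` has a second column so that `toy[i, ]` stays a data frame, as `kyphosis[i, ]` does. The loop is essentially the one from the question, with `seq_len()` in place of `1:length()`.

```r
# Hypothetical stand-ins for kyphosis and predict(fit, kyphosis)
toy <- data.frame(Kyphosis = factor(c("absent", "present", "absent", "absent", "present")),
                  Age = c(10, 20, 30, 40, 50),
                  row.names = c("1", "2", "3", "4", "5"))
prediction <- cbind(absent  = c(1, 1, 0.4, 1, 0.2),
                    present = c(0, 0, 0.6, 0, 0.8))

# Loop version: grows a one-column data frame row by row
loopResults <- as.data.frame(1)
j <- 0
for (i in seq_len(nrow(toy))) {
  if (((toy$Kyphosis[i] == "absent") == (prediction[i, 1] == 1)) == 0) {
    j <- j + 1
    loopResults[j, ] <- row.names(toy[i, ])
  }
}

# Vectorized version: the same condition evaluated for every row at once
isMisclassified <- ((toy$Kyphosis == "absent") == (prediction[, 1] == 1)) == 0
results <- data.frame("1" = rownames(toy)[isMisclassified],
                      check.names = FALSE, stringsAsFactors = FALSE)

print(loopResults[[1]])
print(results[[1]])
```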

I don't know why you want 'results' to be a data.frame instead of a simple character vector; the expression
  rownames(kyphosis)[isMisclassified]
would give you that.
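The sketch below keeps the IDs as a plain character vector. It wraps them in a data frame only at the point where one is needed, for example for the dbWriteTable() call mentioned in the question, which writes a data frame. `toy`, `prediction`, `ids_df` and its column name `id` are hypothetical.

```r
# Hypothetical stand-ins for kyphosis and predict(fit, kyphosis)
toy <- data.frame(Kyphosis = factor(c("absent", "present", "absent")),
                  row.names = c("1", "2", "3"))
prediction <- cbind(absent = c(1, 1, 0.4), present = c(0, 0, 0.6))
isMisclassified <- (toy$Kyphosis == "absent") != (prediction[, 1] == 1)

ids <- rownames(toy)[isMisclassified]                      # plain character vector
ids_df <- data.frame(id = ids, stringsAsFactors = FALSE)   # only if a data frame is required
print(ids)
print(ids_df)
```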

Also, since 'i' is an integer,
  c(i)
is just a long-winded way of saying
  i
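A two-line check of that claim. `i` and the data frame `df` are illustrative.

```r
i <- 7L
print(identical(c(i), i))         # c() of a single value returns that value unchanged

# So inside the loop, both forms select the same row of a data frame
df <- data.frame(a = 1:10, b = letters[1:10])
print(identical(df[c(i), ], df[i, ]))
```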

The test
  logicalValue == 0
really ought to have the same type of data on both sides of the ==, as in
  logicalValue == FALSE
or, even better in this case,
  !logicalValue # bang means not
or, since logicalValue is x==y, you could replace !(x==y) with
  x != y
so the following is equivalent to what you wrote:
  isMisclassified <- (kyphosis$Kyphosis=="absent") != (prediction[,1]==1)
(and, in my opinion, the latter is easier to understand).
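All four forms give the same answer on every combination of logical inputs. In the `== 0` form, R coerces the logical values to 0/1 before comparing them. The vectors `x` and `y` are illustrative. For logical vectors, `x != y` is also the same as base R's `xor(x, y)`.

```r
# All four combinations of two logical values
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

v0 <- (x == y) == 0       # original: logicals coerced to 0/1 for the comparison
v1 <- (x == y) == FALSE   # same type on both sides
v2 <- !(x == y)           # negation
v3 <- x != y              # simplest form
print(rbind(v0, v1, v2, v3))
print(identical(v3, xor(x, y)))
```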

Finally, you defined a function of one argument, x, and didn't use the argument. Functions don't need arguments;
   f <- function() {
      ....
   }
would do just as well.
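R evaluates arguments lazily, so an argument that is never used is never looked up. That is presumably why a call like predict.function(x) in the question ran without complaint even if no x existed. A small sketch follows. The names `predict_unused`, `predict_noarg` and `no_such_object` are hypothetical.

```r
# A one-argument function that never uses its argument, as in the question
predict_unused <- function(x) {
  "done"
}
# The argument is never evaluated, so this call does not fail
# even though 'no_such_object' does not exist
print(predict_unused(no_such_object))

# The zero-argument form does the same job without the unused parameter
predict_noarg <- function() {
  "done"
}
print(predict_noarg())
```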

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----

> From: r-help-bounces_at_r-project.org 
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of John Dennison
> Sent: Thursday, May 12, 2011 2:41 PM
> To: r-help_at_r-project.org
> Subject: Re: [R] Saving misclassified records into dataframe within a loop
> 
> Having poked at the problem a couple more times, it appears my issue is
> that the object I save within the loop is not available after the
> function ends. I have no idea why it is acting in this manner.
> 
> 
> library(rpart)
> 
> # grow tree
> fit <- rpart(Kyphosis ~ Age + Number + Start,
>  method="class", data=kyphosis)
> #predict
> prediction<-predict(fit, kyphosis)
> 
> #misclassification index function
> 
> results<-as.data.frame(1)
> 
> predict.function <- function(x){
>   j<-0
> for (i in 1:length(kyphosis$Kyphosis)) {
> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> 
>  j<-j+1
> results[j,]<-row.names(testing[c(i),])
> print( row.names(kyphosis[c(i),]))
> } }
> {
> print(results)
> save(results, file="results") } }
> 
> 
> I can load results from the file and my output is there. However, if I
> just type results, I get the original 1. What in the Lord's name is
> occurring?
> 
> Thanks
> 
> John
> 
> 
> 
> On Thu, May 12, 2011 at 1:50 PM, Phil Spector <spector_at_stat.berkeley.edu> wrote:
> 
> > John -
> >   In your example, the misclassified observations (as defined by
> > your predict.function) will be
> >
> >  kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1,]
> >
> > so you could start from there.
> >                                        - Phil Spector
> >                                         Statistical Computing Facility
> >                                         Department of Statistics
> >                                         UC Berkeley
> >                                         spector_at_stat.berkeley.edu
> >
> >
> >
> > On Thu, 12 May 2011, John Dennison wrote:
> >
> >  Greetings R world,
> >>
> >> I know some version of this question has been asked before, but I need
> >> to save the output of a loop into a data frame to eventually be written
> >> to a postgres database with dbWriteTable. Some background: I have
> >> developed classification models to help identify problem accounts. The
> >> logic is this: if the model classifies the record as including variable
> >> X and it turns out that the record does not have X, then it should be
> >> reviewed (i.e., I need the row number/ID saved to a database). Generally,
> >> I want to look at the misclassified records. This is a little hack, I
> >> know; if anyone has a better idea, please let me know. Here is an example:
> >>
> >> library(rpart)
> >>
> >> # grow tree
> >> fit <- rpart(Kyphosis ~ Age + Number + Start,
> >>  method="class", data=kyphosis)
> >> #predict
> >> prediction<-predict(fit, kyphosis)
> >>
> >> #misclassification index function
> >>
> >> predict.function <- function(x){
> >> for (i in 1:length(kyphosis$Kyphosis)) {
> >> #the idea is that if the record is "absent" but the prediction is otherwise
> >> #then show me that record
> >> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> >>  #THIS WORKS
> >> print( row.names(kyphosis[c(i),]))
> >> }
> >> } }
> >>
> >> predict.function(x)
> >>
> >> Now my issue is that I want to save these IDs to a data.frame so I can
> >> later save them to a database. Is this an incorrect approach? Can I save
> >> each ID to the postgres instance as it is found? I have an ignorant fear
> >> of lapply, but it seems it may hold the key.
> >>
> >>
> >> I've tried
> >>
> >> predict.function <- function(x){
> >> results<-as.data.frame(1)
> >> for (i in 1:length(kyphosis$Kyphosis)) {
> >> #the idea is that if the record is "absent" but the prediction is otherwise
> >> #then show me that record
> >> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> >>  #THIS WORKS
> >> results[i,]<- as.data.frame(row.names(kyphosis[c(i),]))
> >> }
> >> } }
> >>
> >> This does not work; the results object does not get saved. Any help
> >> would be greatly appreciated.
> >>
> >>
> >> Thanks
> >>
> >> John Dennison
> >>
> >>
> >> ______________________________________________
> >> R-help_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> 
> 

Received on Fri 13 May 2011 - 00:01:47 GMT

