Re: [R] Saving misclassified records into dataframe within a loop

From: John Dennison <dennison.john_at_gmail.com>
Date: Thu, 12 May 2011 18:49:45 -0400

It is a little ugly, I agree, but it is acting as it should. I am trying to capture the cases where the model produced a false positive, but only for one of the variables, i.e. where the model predicts "present" but the case is "absent". I know this is only half of the misclassifications, but the inverse is not interesting to me. I just carried the logic over from my own application to a general case, my apologies. Take that part as correct. How would we save the rows it does return?

Thanks,

John
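
A minimal sketch of one way the flagged rows might be kept, assuming the condition from Phil Spector's reply is the intended one. The helper name flagged_ids and the column name id are made up for illustration and appear nowhere in the thread. The function returns the data frame instead of assigning to results inside the function body: an assignment inside a function only changes a local copy, which would explain why the saved file and the global results differ.

```r
library(rpart)

# grow tree and predict class probabilities, as in the thread
fit <- rpart(Kyphosis ~ Age + Number + Start,
             method = "class", data = kyphosis)
prediction <- predict(fit, kyphosis)

# Hypothetical helper (not from the thread): builds the result and
# returns it, so nothing depends on modifying a global object.
flagged_ids <- function(data, prediction) {
  # condition taken from Phil Spector's reply: case is "absent" but the
  # predicted probability of "absent" (column 1) is not exactly 1
  keep <- data$Kyphosis == "absent" & prediction[, 1] != 1
  data.frame(id = row.names(data)[keep], stringsAsFactors = FALSE)
}

# the caller assigns the returned value in its own environment
results <- flagged_ids(kyphosis, prediction)
print(results)
```

The returned data frame could then be handed to dbWriteTable in one call, rather than writing each ID as it is found; the connection setup is not shown here.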

On Thu, May 12, 2011 at 6:37 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:

>
> On May 12, 2011, at 6:26 PM, John Dennison wrote:
>
>> My apologies. I have transgressed the first law of posting: test your
>> code. Here is an updated set that should run:
>>
>> library(rpart)
>>
>> # grow tree
>> fit <- rpart(Kyphosis ~ Age + Number + Start,
>> method="class", data=kyphosis)
>> #predict
>> prediction<-predict(fit, kyphosis)
>>
>> #create output data.frame
>> results<-as.data.frame(1)
>>
>>
>> #misclassification index function
>>
>> predict.function <- function(x){
>> j<-0
>>
>> for (i in 1:length(kyphosis$Kyphosis)) {
>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>>
>
> I think your next task is figuring out if this expression ... which you
> have not explained at all ... is really doing what you intend:
>
>
> ((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0
>
> I would have guessed that you might be intending:
>
>
> kyphosis$Kyphosis[i]=="absent" & prediction[i,1]==1
>
> Since it will hold about half the time:
>
> > sum(kyphosis$Kyphosis[1:81]=="absent" & prediction[1:81,1]==1)
> [1] 41
>
>
>
>
>> j<-j+1
>> results[j,]<-row.names(kyphosis[c(i),])
>>
>> print( row.names(kyphosis[c(i),]))
>> } }
>> {
>> print(results)
>> save(results, file="results") } }
>>
>>
>> predict.function(x)
>>
>>
>> results
>>
>> output: results
>> 1
>> 1 1
>>
>>
>> load("results")
>>
>> results
>> > results
>> 1
>> 1 1
>> 2 2
>> 3 4
>> 4 13
>> 5 18
>> 6 24
>> 7 27
>> 8 28
>> 9 32
>> 10 33
>> 11 35
>> 12 43
>> 13 44
>> 14 48
>> 15 50
>> 16 51
>> 17 60
>> 18 63
>> 19 68
>> 20 71
>> 21 72
>> 22 74
>> 23 79
>>
>> why the two different 'results'??
>>
>> Thanks
>>
>> John Dennison
>>
>> On Thu, May 12, 2011 at 6:06 PM, David Winsemius <dwinsemius_at_comcast.net>
>> wrote:
>>
>> On May 12, 2011, at 5:41 PM, John Dennison wrote:
>>
>> Having poked the problem a couple more times, it appears my issue is that
>> the object I save within the loop is not available after the function ends.
>> I have no idea why it is acting in this manner.
>>
>>
>> library(rpart)
>>
>> # grow tree
>> fit <- rpart(Kyphosis ~ Age + Number + Start,
>> method="class", data=kyphosis)
>> #predict
>> prediction<-predict(fit, kyphosis)
>>
>> #misclassification index function
>>
>> results<-as.data.frame(1)
>>
>> predict.function <- function(x){
>> j<-0
>> for (i in 1:length(kyphosis$Kyphosis)) {
>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>>
>> j<-j+1
>> results[j,]<-row.names(testing[c(i),])
>>
>> Are we supposed to know where to find 'testing' (and if we cannot find
>> it, how is the R interpreter going to find it)?
>>
>>
>>
>> print( row.names(kyphosis[c(i),]))
>> } }
>> {
>> print(results)
>> save(results, file="results") } }
>>
>>
>> I can load results from file and my output is there. However, if I just
>> type results I get the original 1. What in the lord's name is occurring?
>>
>> Thanks
>>
>> John
>>
>>
>>
>> On Thu, May 12, 2011 at 1:50 PM, Phil Spector <spector_at_stat.berkeley.edu>
>> wrote:
>>
>> John -
>> In your example, the misclassified observations (as defined by
>> your predict.function) will be
>>
>> kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1,]
>>
>> so you could start from there.
>> - Phil Spector
>> Statistical Computing Facility
>> Department of Statistics
>> UC Berkeley
>> spector_at_stat.berkeley.edu
>>
>>
>>
>> On Thu, 12 May 2011, John Dennison wrote:
>>
>> Greetings R world,
>>
>> I know some version of this question has been asked before, but I need
>> to save the output of a loop into a data frame, to eventually be written to
>> a Postgres database with dbWriteTable. Some background: I have developed
>> classification models to help identify problem accounts. The logic is this:
>> if the model classifies the record as including variable X and it turns out
>> that the record does not have X, then it should be reviewed (i.e. I need the
>> row number/ID saved to a database). Generally I want to look at the
>> misclassified records. This is a little hacky, I know; if anyone has a better
>> idea, please let me know. Here is an example:
>>
>> library(rpart)
>>
>> # grow tree
>> fit <- rpart(Kyphosis ~ Age + Number + Start,
>> method="class", data=kyphosis)
>> #predict
>> prediction<-predict(fit, kyphosis)
>>
>> #misclassification index function
>>
>> predict.function <- function(x){
>> for (i in 1:length(kyphosis$Kyphosis)) {
>> #the idea is that if the record is "absent" but the prediction is otherwise
>> #then show me that record
>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>> #THIS WORKS
>> print( row.names(kyphosis[c(i),]))
>> }
>> } }
>>
>> predict.function(x)
>>
>> Now my issue is that I want to save these IDs to a data.frame so I can later
>> save them to a database. Is this an incorrect approach? Can I save each ID
>> to the Postgres instance as it is found? I have an ignorant fear of lapply,
>> but it seems it may hold the key.
>>
>>
>> I've tried
>>
>> predict.function <- function(x){
>> results<-as.data.frame(1)
>> for (i in 1:length(kyphosis$Kyphosis)) {
>> #the idea is that if the record is "absent" but the prediction is otherwise
>> #then show me that record
>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>> #THIS WORKS
>> results[i,]<- as.data.frame(row.names(kyphosis[c(i),]))
>> }
>> } }
>>
>> This does not work. The results object does not get saved. Any help would be
>> greatly appreciated.
>>
>>
>> Thanks
>>
>> John Dennison
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>>
> David Winsemius, MD
> West Hartford, CT
>
>




Received on Thu 12 May 2011 - 22:53:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 13 May 2011 - 01:40:06 GMT.