Re: [R] Saving misclassified records into dataframe within a loop

From: John Dennison <dennison.john_at_gmail.com>
Date: Thu, 12 May 2011 18:26:44 -0400

My apologies. I have transgressed the first law of posting: test your code. Here is an updated set that should run:

library(rpart)

# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start,  method="class", data=kyphosis)
#predict
prediction<-predict(fit, kyphosis)

#create output data.frame
results<-as.data.frame(1)

#misclassification index function

predict.function <- function(x){
  j <- 0

  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
      j <- j+1
      results[j,] <- row.names(kyphosis[c(i),])

      print( row.names(kyphosis[c(i),]))
    }
  }

  print(results)
  save(results, file="results")
}

predict.function(x)

results

output: results

      1
    1 1

load("results")

> results

    1
1 1
2 2
3 4
4 13
5 18
6 24
7 27
8 28
9 32
10 33
11 35
12 43
13 44
14 48
15 50
16 51
17 60
18 63
19 68
20 71
21 72
22 74
23 79

Why are there two different 'results'?
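
For what it's worth, a likely explanation is R's scoping rules: assigning to 'results' inside a function changes a copy that is local to that function, so the 'results' in the workspace stays as it was. The save() call sits inside the function, so it writes the local, filled-in copy to disk, and load() then replaces the workspace object with it. A minimal sketch of the same behaviour, assuming the made-up names 'collect_ids' and 'ids', which are not in the code above:

```r
# Global object, analogous to results <- as.data.frame(1) in the post.
# The column name 'id' is a hypothetical choice for this sketch.
results <- data.frame(id = NA_character_, stringsAsFactors = FALSE)

# Hypothetical helper standing in for predict.function().
collect_ids <- function(ids) {
  for (j in seq_along(ids)) {
    # This assignment creates and then updates a LOCAL 'results';
    # the global 'results' is never modified.
    results[j, ] <- ids[j]
  }
  results  # return the local copy so the caller can keep it
}

inside <- collect_ids(c("1", "2", "4"))

nrow(results)  # the global object still has its single original row
nrow(inside)   # the returned copy has one row per id
```

Returning the data frame and assigning it at the call site, for example results <- predict.function(x), avoids the round trip through save() and load(). The superassignment operator <<- would also reach the workspace object, but it tends to make the code harder to follow.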

Thanks

John Dennison

On Thu, May 12, 2011 at 6:06 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:

>
> On May 12, 2011, at 5:41 PM, John Dennison wrote:
>
>> Having poked the problem a couple more times, it appears my issue is that the
>> object I save within the loop is not available after the function ends. I
>> have no idea why it is acting in this manner.
>>
>>
>> library(rpart)
>>
>> # grow tree
>> fit <- rpart(Kyphosis ~ Age + Number + Start,
>> method="class", data=kyphosis)
>> #predict
>> prediction<-predict(fit, kyphosis)
>>
>> #misclassification index function
>>
>> results<-as.data.frame(1)
>>
>> predict.function <- function(x){
>> j<-0
>> for (i in 1:length(kyphosis$Kyphosis)) {
>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>>
>> j<-j+1
>> results[j,]<-row.names(testing[c(i),])
>>
>
> Are we supposed to know where to find 'testing' (and if we cannot find it,
> how is the R interpreter going to find it)?
>
>
>
>> print( row.names(kyphosis[c(i),]))
>> } }
>> {
>> print(results)
>> save(results, file="results") } }
>>
>>
>> I can load results from the file and my output is there. However, if I just
>> type results I get the original 1. What in the Lord's name is occurring?
>>
>> Thanks
>>
>> John
>>
>>
>>
>> On Thu, May 12, 2011 at 1:50 PM, Phil Spector <spector_at_stat.berkeley.edu> wrote:
>>
>>> John -
>>> In your example, the misclassified observations (as defined by
>>> your predict.function) will be
>>>
>>> kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1,]
>>>
>>> so you could start from there.
>>> - Phil Spector
>>> Statistical Computing Facility
>>> Department of Statistics
>>> UC Berkeley
>>> spector_at_stat.berkeley.edu
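
Building on the subsetting idea quoted above, here is a hedged sketch of a loop-free version. It assumes that predict() returns the class-probability matrix with "absent" as its first column, as in the original code, and it uses xor() because the loop's test flags a row whenever "is absent" and "P(absent) equals 1" disagree in either direction. The names 'flagged', 'id', 'pred_class' and 'misclassified' are made up for illustration:

```r
library(rpart)

# Same model and predictions as in the original post
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
prediction <- predict(fit, kyphosis)  # matrix of class probabilities

# Vectorised form of the loop's condition: TRUE where the two tests disagree
flagged <- xor(kyphosis$Kyphosis == "absent", prediction[, 1] == 1)

# Collect the row names of the flagged records in a data frame
results <- data.frame(id = row.names(kyphosis)[flagged],
                      stringsAsFactors = FALSE)

# Alternative, if "misclassified" is meant in the usual sense:
# compare the predicted class with the observed class.
pred_class <- predict(fit, kyphosis, type = "class")
misclassified <- row.names(kyphosis)[pred_class != kyphosis$Kyphosis]
```

Because 'results' is created at the top level here rather than inside a function, it is available afterwards for writing to the database.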
>>>
>>>
>>>
>>> On Thu, 12 May 2011, John Dennison wrote:
>>>
>>>> Greetings R world,
>>>>
>>>> I know some version of this question has been asked before, but I need
>>>> to save the output of a loop into a data frame, to eventually be written
>>>> to a postgres database with dbWriteTable. Some background: I have
>>>> developed classification models to help identify problem accounts. The
>>>> logic is this: if the model classifies the record as including variable X
>>>> and it turns out that record does not have X, then it should be reviewed
>>>> (i.e. I need the row number/ID saved to a database). Generally I want to
>>>> look at the misclassified records. This is a little hack, I know; if
>>>> anyone has a better idea, please let me know. Here is an example:
>>>>
>>>> library(rpart)
>>>>
>>>> # grow tree
>>>> fit <- rpart(Kyphosis ~ Age + Number + Start,
>>>> method="class", data=kyphosis)
>>>> #predict
>>>> prediction<-predict(fit, kyphosis)
>>>>
>>>> #misclassification index function
>>>>
>>>> predict.function <- function(x){
>>>> for (i in 1:length(kyphosis$Kyphosis)) {
>>>> # the idea is that if the record is "absent" but the prediction is otherwise,
>>>> # then show me that record
>>>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>>>> #THIS WORKS
>>>> print( row.names(kyphosis[c(i),]))
>>>> }
>>>> } }
>>>>
>>>> predict.function(x)
>>>>
>>>> Now my issue is that I want to save these IDs to a data.frame so I can
>>>> later save them to a database. Is this an incorrect approach? Can I save
>>>> each ID to the postgres instance as it is found? I have an ignorant fear
>>>> of lapply, but it seems it may hold the key.
>>>>
>>>>
>>>> I've tried
>>>>
>>>> predict.function <- function(x){
>>>> results<-as.data.frame(1)
>>>> for (i in 1:length(kyphosis$Kyphosis)) {
>>>> # the idea is that if the record is "absent" but the prediction is otherwise,
>>>> # then show me that record
>>>> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>>>> #THIS WORKS
>>>> results[i,]<- as.data.frame(row.names(kyphosis[c(i),]))
>>>> }
>>>> } }
>>>>
>>>> This does not work: the results object does not get saved. Any help would
>>>> be greatly appreciated.
>>>>
>>>>
>>>> Thanks
>>>>
>>>> John Dennison
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>


Received on Thu 12 May 2011 - 22:34:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 12 May 2011 - 23:00:06 GMT.