From: Simon Blomberg <Simon.Blomberg_at_anu.edu.au>

Date: Fri 27 May 2005 - 14:37:08 EST

predict.glm by default produces predictions on the scale of the linear predictor, which for a logistic regression is the log-odds scale. If you want the predictions on the response scale [0, 1], use

x <- predict(logistic.model, medians, type="response")

for example. See ?predict.glm for details.
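A minimal sketch of the point above, assuming simulated data (the data frame sim and the variables x1, x2 and y are hypothetical stand-ins, not Stephen's corpus). The default type = "link" returns log-odds, which are unbounded. Applying the inverse logit, plogis(), to them gives what type = "response" returns, and qlogis() maps back.

```r
set.seed(1)
# Hypothetical simulated data: two predictors and a 0/1 outcome
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * sim$x1 - 0.8 * sim$x2))

fit <- glm(y ~ x1 + x2, family = binomial, data = sim)

eta <- predict(fit, sim)                     # default type = "link": log-odds, unbounded
p   <- predict(fit, sim, type = "response")  # probabilities in [0, 1]

# The inverse logit of the linear predictor is the response-scale prediction
all.equal(unname(plogis(eta)), unname(p))
# ...and the logit of the probabilities recovers the linear predictor
all.equal(unname(qlogis(p)), unname(eta))
range(eta)
range(p)
```

In other words, the two scales carry the same information; only the type argument decides which one predict() reports.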

Cheers,

Simon.

> Hi
>
> I am working on corpora of automatically recognized utterances, looking
> for features that predict error in the hypothesis the recognizer is
> proposing.
>
> I am using the glm functions to do logistic regression. I do this type
> of thing:
>
> > logistic.model = glm(formula = similarity ~ ., family = binomial,
> +     data = data)
>
> and end up with a model:
>
> > summary(logistic.model)
>
> Call:
> glm(formula = similarity ~ ., family = binomial, data = data)
>
> Deviance Residuals:
>      Min       1Q   Median       3Q      Max
>  -3.1599   0.2334   0.3307   0.4486   1.2471
>
> Coefficients:
>                         Estimate Std. Error z value Pr(>|z|)
> (Intercept)           11.1923783  4.6536898   2.405  0.01617 *
> length                -0.3529775  0.2416538  -1.461  0.14410
> meanPitch             -0.0203590  0.0064752  -3.144  0.00167 **
> minimumPitch           0.0257213  0.0053092   4.845 1.27e-06 ***
> maximumPitch          -0.0003454  0.0030008  -0.115  0.90838
> meanF1                 0.0137880  0.0047035   2.931  0.00337 **
> meanF2                 0.0040238  0.0041684   0.965  0.33439
> meanF3                -0.0075497  0.0026751  -2.822  0.00477 **
> meanF4                -0.0005362  0.0007443  -0.720  0.47123
> meanF5                -0.0001560  0.0003936  -0.396  0.69187
> ratioF2ToF1            0.2668678  2.8926149   0.092  0.92649
> ratioF3ToF1            1.7339087  1.7655757   0.982  0.32607
> jitter                -5.2571384 10.8043359  -0.487  0.62656
> shimmer               -2.3040826  3.0581950  -0.753  0.45120
> percentUnvoicedFrames  0.1959342  1.3041689   0.150  0.88058
> numberOfVoiceBreaks   -0.1022074  0.0823266  -1.241  0.21443
> percentOfVoiceBreaks  -0.0590097  1.2580202  -0.047  0.96259
> meanIntensity         -0.0765124  0.0612008  -1.250  0.21123
> minimumIntensity       0.1037980  0.0331899   3.127  0.00176 **
> maximumIntensity      -0.0389995  0.0430368  -0.906  0.36484
> ratioIntensity        -2.0329346  1.2420286  -1.637  0.10168
> noSyllsIntensity       0.1157678  0.0947699   1.222  0.22187
> startSpeech            0.0155578  0.1343117   0.116  0.90778
> speakingRate          -0.2583315  0.1648337  -1.567  0.11706
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 2462.3  on 4310  degrees of freedom
> Residual deviance: 2209.5  on 4287  degrees of freedom
> AIC: 2257.5
>
> Number of Fisher Scoring iterations: 6
>
> I have seen models where almost all the features show one-in-a-thousand
> significance, but I accept that I could improve my model by
> normalizing some of the features (some are left skewed and I understand
> that I will get a better fit by taking their logs, for example).
>
> What really worries me is that the logistic function produces
> predictions that appear to fall well outside 0 to 1.
>
> If I make a dataset of the medians of the above features and use my
> logistic.model on it, it produces a figure of:
>
> > x = predict(logistic.model, medians)
> > x
> [1] 2.82959
>
> which is well outside the range of 0 to 1.
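Read as log-odds, that figure is not out of range at all. A quick check in base R, using only the value quoted in the post and the inverse logit, written out by hand and via plogis():

```r
# Link-scale (log-odds) prediction for the medians data set, as quoted in the post
eta_median <- 2.82959

# Inverse logit p = 1 / (1 + exp(-eta)), written out and via plogis()
p_manual <- 1 / (1 + exp(-eta_median))
p_median <- plogis(eta_median)

round(p_median, 3)  # 0.944
```

So the median case has a predicted probability of roughly 0.94 for the outcome coded 1 in similarity.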
>
> The actual distribution of all the predictions is:
>
> > summary(pred)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  -1.516   2.121   2.720   2.731   3.341   6.387
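The same transformation can be applied to this summary. Because plogis() is strictly increasing, it maps each quantile of the log-odds to the corresponding quantile of the predicted probabilities. The mean is left out, since the inverse logit of a mean is not the mean of the inverse logits. A sketch using the figures from the post:

```r
# Quantiles of the link-scale predictions reported by summary(pred) in the post
link_q <- c(Min = -1.516, Q1 = 2.121, Median = 2.720, Q3 = 3.341, Max = 6.387)

# Monotone transform: quantiles on the log-odds scale become quantiles on [0, 1]
prob_q <- plogis(link_q)
round(prob_q, 3)
```

On the probability scale the predictions run from about 0.18 to about 0.998, all inside [0, 1].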
>
> I can get the model to give some sort of prediction by doing this:
>
> > pred = predict(logistic.model, data)
> > pred[pred <= 1.5] = 0
> > pred[pred > 1.5] = 1
> > t = table(pred, data[,24])
> > t
>
> pred    0    1
>    0  102  253
>    1  255 3701
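The cut-off of 1.5 used here is on the log-odds scale, so it corresponds to a probability cut-off of plogis(1.5), about 0.818. A minimal sketch, again on hypothetical simulated data (sim, x and y are illustrative names only), showing that thresholding the link-scale predictions at 1.5 and the response-scale predictions at plogis(1.5) yields the same classes:

```r
set.seed(2)
# Hypothetical simulated data standing in for the real corpus
n <- 1000
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(2 + sim$x))
fit <- glm(y ~ x, family = binomial, data = sim)

cut_link <- 1.5
cut_prob <- plogis(cut_link)  # the same cut-off expressed as a probability

# as.integer() drops the names, so the two vectors can be compared directly
class_link <- as.integer(predict(fit, sim) > cut_link)
class_prob <- as.integer(predict(fit, sim, type = "response") > cut_prob)

identical(class_link, class_prob)
table(pred = class_prob, observed = sim$y)
```

Because the inverse logit is strictly increasing, any cut-off chosen on one scale has an exact counterpart on the other; using type = "response" simply makes the cut-off readable as a probability.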
> > classAgreement(t)
> $diag
> [1] 0.8821619
>
> $kappa
> [1] 0.2222949
>
> $rand
> [1] 0.7920472
>
> $crand
> [1] 0.1913888
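The first two classAgreement() components (that function comes from the e1071 package) can be reproduced by hand from the table above, which may help in judging the kappa. A sketch in base R using only the counts from the post, assuming rows are predicted and columns observed classes as in table(pred, data[,24]):

```r
# Confusion table from the post: rows = predicted class, columns = observed class
tab <- matrix(c(102, 255, 253, 3701), nrow = 2,
              dimnames = list(pred = c("0", "1"), observed = c("0", "1")))

n   <- sum(tab)
p_o <- sum(diag(tab)) / n                      # observed agreement ($diag)
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
kap <- (p_o - p_e) / (1 - p_e)                 # Cohen's kappa ($kappa)

round(c(diag = p_o, kappa = kap), 4)  # diag 0.8822, kappa 0.2223
```

With 3956 of the 4311 utterances predicted as class 1, chance agreement alone is already about 0.85, so a raw agreement of 0.88 translates into a modest kappa.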
>
> but as you can see I am using a break point well outside the range 0 to
> 1 and the kappa is rather low (I think).
>
> I am a bit of a novice in this, and the results worry me.
>
> Can anyone comment if the results look strange, or if they know I am
> doing something wrong?
>
> Stephen
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 8057 email: Simon.Blomberg@anu.edu.au
F: +61 2 6125 5573
CRICOS Provider # 00120C

Received on Fri May 27 14:43:18 2005

This archive was generated by hypermail 2.1.8: Fri 03 Mar 2006 - 03:32:08 EST