Re: [R] logistic regression

From: Simon Blomberg <Simon.Blomberg_at_anu.edu.au>
Date: Fri 27 May 2005 - 14:37:08 EST

predict.glm by default produces predictions on the scale of the linear predictor, which for a logistic regression is the log-odds and so is not confined to [0,1]. If you want the predictions on the response scale, i.e. as probabilities in [0,1], use

x <- predict(logistic.model, medians, type="response")

for example. See ?predict.glm for details.
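
As a rough illustration of the difference, here is a small sketch on simulated data. The data frame d, the variables x1, x2 and y, and the object fit are all made up for the example and are not your model. It only shows that the default predictions are log-odds and that plogis() (the inverse logit) maps them onto the probabilities returned by type="response".

```r
## Simulated stand-in for the real data; every name here is hypothetical
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, 1, plogis(1 + 0.8 * d$x1 - 0.5 * d$x2))

fit <- glm(y ~ ., family = binomial, data = d)

eta <- predict(fit, newdata = d)                     # default type = "link": log-odds
p   <- predict(fit, newdata = d, type = "response")  # fitted probabilities

## The two scales are related by the inverse logit
same_scale <- isTRUE(all.equal(unname(plogis(eta)), unname(p)))
in_unit    <- all(p > 0 & p < 1)

## The median-case prediction quoted in the original post, moved to the
## response scale; this comes out at roughly 0.94
p_median <- plogis(2.82959)
```

So a linear predictor of 2.83 is not an out-of-range probability; it corresponds to a fitted probability of roughly 0.94.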

Cheers,

Simon.

>Hi
>
>I am working on corpora of automatically recognized utterances, looking
>for features that predict error in the hypothesis the recognizer is
>proposing.
>
>I am using the glm functions to do logistic regression. I do this type
>of thing:
>
> logistic.model = glm(formula = similarity ~ ., family = binomial,
>                      data = data)
>
>and end up with a model:
>
>> summary(logistic.model)
>
>Call:
>glm(formula = similarity ~ ., family = binomial, data = data)
>
>Deviance Residuals:
>    Min       1Q   Median       3Q      Max
>-3.1599   0.2334   0.3307   0.4486   1.2471
>
>Coefficients:
>                        Estimate Std. Error z value Pr(>|z|)
>(Intercept)           11.1923783  4.6536898   2.405  0.01617 *
>length                -0.3529775  0.2416538  -1.461  0.14410
>meanPitch             -0.0203590  0.0064752  -3.144  0.00167 **
>minimumPitch           0.0257213  0.0053092   4.845 1.27e-06 ***
>maximumPitch          -0.0003454  0.0030008  -0.115  0.90838
>meanF1                 0.0137880  0.0047035   2.931  0.00337 **
>meanF2                 0.0040238  0.0041684   0.965  0.33439
>meanF3                -0.0075497  0.0026751  -2.822  0.00477 **
>meanF4                -0.0005362  0.0007443  -0.720  0.47123
>meanF5                -0.0001560  0.0003936  -0.396  0.69187
>ratioF2ToF1            0.2668678  2.8926149   0.092  0.92649
>ratioF3ToF1            1.7339087  1.7655757   0.982  0.32607
>jitter                -5.2571384 10.8043359  -0.487  0.62656
>shimmer               -2.3040826  3.0581950  -0.753  0.45120
>percentUnvoicedFrames  0.1959342  1.3041689   0.150  0.88058
>numberOfVoiceBreaks   -0.1022074  0.0823266  -1.241  0.21443
>percentOfVoiceBreaks  -0.0590097  1.2580202  -0.047  0.96259
>meanIntensity         -0.0765124  0.0612008  -1.250  0.21123
>minimumIntensity       0.1037980  0.0331899   3.127  0.00176 **
>maximumIntensity      -0.0389995  0.0430368  -0.906  0.36484
>ratioIntensity        -2.0329346  1.2420286  -1.637  0.10168
>noSyllsIntensity       0.1157678  0.0947699   1.222  0.22187
>startSpeech            0.0155578  0.1343117   0.116  0.90778
>speakingRate          -0.2583315  0.1648337  -1.567  0.11706
>---
>Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 2462.3 on 4310 degrees of freedom
>Residual deviance: 2209.5 on 4287 degrees of freedom
>AIC: 2257.5
>
>Number of Fisher Scoring iterations: 6
>
>
>I have seen models where almost all the features are significant at the
>one-in-a-thousand level, but I accept that I could improve my model by
>normalizing some of the features (some are left skewed and I understand
>that I will get a better fit by taking their logs, for example).
>
>What really worries me is that the logistic function produces
>predictions that appear to fall well outside 0 to 1.
>
>If I make a dataset of the medians of the above features and use my
>logistic.model on it, it produces a
>figure of:
>
>> x = predict(logistic.model, medians)
>> x
>[1] 2.82959
>>
>
>which is well outside the range of 0 to 1.
>
>The actual distribution of all the predictions is:
>
>> summary(pred)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> -1.516   2.121   2.720   2.731   3.341   6.387
>>
>
>I can get the model to give some sort of prediction by doing this:
>
>> pred = predict(logistic.model, data)
>> pred[pred <= 1.5] = 0
>> pred[pred > 1.5] = 1
>> t = table(pred, data[,24])
>> t
>
>pred    0    1
>   0  102  253
>   1  255 3701
>>
>> classAgreement(t)
>$diag
>[1] 0.8821619
>
>$kappa
>[1] 0.2222949
>
>$rand
>[1] 0.7920472
>
>$crand
>[1] 0.1913888
>
>>
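
The $diag and $kappa figures above can be checked by hand from the table. The sketch below retypes the counts and computes Cohen's kappa with base R only. It assumes that the rows are the predicted classes and the columns the observed ones (that is how table(pred, data[,24]) lays them out), and the object names are made up for the example. It also computes the share of the larger observed class, which looks to be about 0.917, i.e. higher than the 0.882 agreement. That would go some way towards explaining the low kappa.

```r
## Confusion table copied from the post: rows = predicted, columns = observed
tab <- matrix(c(102,  253,
                255, 3701),
              nrow = 2, byrow = TRUE,
              dimnames = list(pred = c("0", "1"), obs = c("0", "1")))

n  <- sum(tab)
po <- sum(diag(tab)) / n                       # observed agreement ($diag)
pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
kappa_hat <- (po - pe) / (1 - pe)              # Cohen's kappa ($kappa)

## Accuracy of the trivial rule "always predict the larger observed class"
base_rate <- max(colSums(tab)) / n
```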
>
>but as you can see I am using a break point well outside the range 0 to
>1 and the kappa is rather low (I think).
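
The break point of 1.5 is on the log-odds scale, so it is not really outside 0 to 1: it corresponds to a probability cut-off of plogis(1.5), roughly 0.82. The sketch below uses simulated data (d, x, y and fit are hypothetical names, not your objects) to show that cutting the linear predictor at 1.5 selects the same cases as cutting the fitted probability at plogis(1.5), and then builds a table with a cut-off chosen directly on the probability scale. The value 0.5 is only a conventional choice for illustration, not a recommendation.

```r
cut_link <- 1.5
cut_prob <- plogis(cut_link)   # the same cut-off on the probability scale

## Simulated data and model, for illustration only
set.seed(2)
d <- data.frame(x = rnorm(500))
d$y <- rbinom(500, 1, plogis(2 + d$x))
fit <- glm(y ~ x, family = binomial, data = d)

eta <- predict(fit)                      # link scale (log-odds)
p   <- predict(fit, type = "response")   # response scale (probabilities)

## Both rules classify every case the same way
same_rule <- all((eta > cut_link) == (p > cut_prob))

## Classification with a cut-off set directly on the probability scale
class_half <- as.integer(p > 0.5)
tab <- table(pred = class_half, obs = d$y)
```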
>
>I am a bit of a novice in this, and the results worry me.
>
>Can anyone comment if the results look strange, or if they know I am
>doing something wrong?
>
>Stephen
>
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia

T: +61 2 6125 8057  email: Simon.Blomberg@anu.edu.au
F: +61 2 6125 5573

CRICOS Provider # 00120C

Received on Fri May 27 14:43:18 2005
