From: Stephen Choularton <mail_at_bymouth.com>

Date: Fri 27 May 2005 - 14:22:33 EST

Hi

I am using the glm functions to do logistic regression. I do this type of thing:

and end up with a model:

> summary(logistic.model)

Call:

glm(formula = similarity ~ ., family = binomial, data = data)

Deviance Residuals:

    Min       1Q   Median       3Q      Max
-3.1599   0.2334   0.3307   0.4486   1.2471

Coefficients:

                        Estimate Std. Error z value Pr(>|z|)
(Intercept)           11.1923783  4.6536898   2.405  0.01617 *
length                -0.3529775  0.2416538  -1.461  0.14410
meanPitch             -0.0203590  0.0064752  -3.144  0.00167 **
minimumPitch           0.0257213  0.0053092   4.845 1.27e-06 ***
maximumPitch          -0.0003454  0.0030008  -0.115  0.90838
meanF1                 0.0137880  0.0047035   2.931  0.00337 **
meanF2                 0.0040238  0.0041684   0.965  0.33439
meanF3                -0.0075497  0.0026751  -2.822  0.00477 **
meanF4                -0.0005362  0.0007443  -0.720  0.47123
meanF5                -0.0001560  0.0003936  -0.396  0.69187
ratioF2ToF1            0.2668678  2.8926149   0.092  0.92649
ratioF3ToF1            1.7339087  1.7655757   0.982  0.32607
jitter                -5.2571384 10.8043359  -0.487  0.62656
shimmer               -2.3040826  3.0581950  -0.753  0.45120
percentUnvoicedFrames  0.1959342  1.3041689   0.150  0.88058
numberOfVoiceBreaks   -0.1022074  0.0823266  -1.241  0.21443
percentOfVoiceBreaks  -0.0590097  1.2580202  -0.047  0.96259
meanIntensity         -0.0765124  0.0612008  -1.250  0.21123
minimumIntensity       0.1037980  0.0331899   3.127  0.00176 **
maximumIntensity      -0.0389995  0.0430368  -0.906  0.36484
ratioIntensity        -2.0329346  1.2420286  -1.637  0.10168
noSyllsIntensity       0.1157678  0.0947699   1.222  0.22187
startSpeech            0.0155578  0.1343117   0.116  0.90778
speakingRate          -0.2583315  0.1648337  -1.567  0.11706

---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2462.3  on 4310  degrees of freedom
Residual deviance: 2209.5  on 4287  degrees of freedom
AIC: 2257.5

Number of Fisher Scoring iterations: 6

I have seen models where almost all the features are significant at the 0.001 level, but I accept that I could improve my model by normalizing some of the features (some are left skewed and I understand that I will get a better fit by taking their logs, for example).

What really worries me is that the logistic function produces predictions that appear to fall well outside 0 to 1. If I make a dataset of the medians of the above features and use my logistic.model on it, it produces a figure of:

> x = predict(logistic.model, medians)
> x

[1] 2.82959

>

which is well outside the range of 0 to 1. The actual distribution of all the predictions is:

> summary(pred)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.516   2.121   2.720   2.731   3.341   6.387

>
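A note on scale may help in reading these numbers. It is offered as a sketch on simulated data, not as a diagnosis of this particular model. For a glm object, predict() returns values on the scale of the linear predictor (the log-odds for a binomial logit model) unless type = "response" is requested, so values outside 0 to 1 are expected from the default call. The data frame `toy` and its columns `x1` and `y` below are hypothetical stand-ins, not the data from this post.

```r
# Simulated stand-in data; `toy`, `x1` and `y` are hypothetical names.
set.seed(1)
toy <- data.frame(x1 = rnorm(200))
toy$y <- rbinom(200, size = 1, prob = plogis(2 + 0.8 * toy$x1))

fit <- glm(y ~ x1, family = binomial, data = toy)

# Default type = "link": predictions are log-odds and are unbounded.
link <- predict(fit, toy)

# type = "response": predictions are probabilities between 0 and 1.
prob <- predict(fit, toy, type = "response")

range(link)
range(prob)

# The response scale is the inverse logit of the link scale.
all.equal(prob, plogis(link))

# Figures quoted in the post, converted to the probability scale.
plogis(2.82959)  # prediction at the medians
plogis(1.5)      # the 1.5 break point
```

If the same reading applies here, the prediction of 2.82959 at the medians corresponds to a probability of about 0.94, and a break point of 1.5 on the log-odds scale corresponds to a probability cut-off of about 0.82.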

I can get the model to give some sort of prediction by doing this:

> pred = predict(logistic.model, data)
> pred[pred <= 1.5] = 0
> pred[pred > 1.5] = 1

> t = table(pred, data[,24])

> t

pred    0    1
   0  102  253
   1  255 3701

> classAgreement(t)

$diag
[1] 0.8821619

$kappa
[1] 0.2222949

$rand
[1] 0.7920472

$crand
[1] 0.1913888

>
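As a cross-check on the agreement figures, the diag and kappa values can be recomputed by hand from the table in this post. This is a minimal sketch that assumes kappa here is Cohen's kappa (observed agreement corrected for chance agreement). The names `tab`, `p_obs`, `p_exp` and `kappa_hat` are introduced only for illustration.

```r
# Confusion table copied from the post (rows: predicted, columns: observed).
tab <- matrix(c(102, 255, 253, 3701), nrow = 2,
              dimnames = list(pred = c("0", "1"), obs = c("0", "1")))

n <- sum(tab)

# Observed agreement: the share of cases on the diagonal.
p_obs <- sum(diag(tab)) / n

# Agreement expected by chance, from the row and column totals.
p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2

# Cohen's kappa: agreement beyond chance, rescaled to at most 1.
kappa_hat <- (p_obs - p_exp) / (1 - p_exp)

# Share of observed 1s, i.e. the accuracy of always predicting 1.
base_rate <- unname(colSums(tab)["1"]) / n

c(diag = p_obs, kappa = kappa_hat, base_rate = base_rate)
```

The observed classes are heavily unbalanced (about 92% are 1), so chance agreement is already high. That is one reason kappa can be low even when the diagonal share is 0.88.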

but as you can see I am using a break point well outside the range 0 to 1, and the kappa is rather low (I think).

I am a bit of a novice at this, and the results worry me. Can anyone say whether the results look strange, or whether I am doing something wrong?

Stephen
