Re: [R] KS Test Warning Message

From: Christoph Buser <buser_at_stat.math.ethz.ch>
Date: Mon 10 Jul 2006 - 17:35:24 EST

Dear Justin

Ties mean that you have identical values in "Year5.lm$residuals". Please note that you can have a large R^2 even though your residuals are not normally distributed. A large R^2 shows a strong linear relationship, but it says nothing about the error distribution (see example below).
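To check for ties yourself, something along these lines might help. It is only a sketch: "res" is a made-up vector standing in for Year5.lm$residuals, which I do not have.

```r
## Hypothetical residuals with one repeated value
## (assumption: stands in for Year5.lm$residuals)
res <- c(-1.2, 0.4, 0.4, 0.9, -0.1)

## Any identical values? anyDuplicated() returns 0 if there are none
has_ties <- anyDuplicated(res) > 0

## Number of observations that repeat an earlier value
n_ties <- sum(duplicated(res))

## The values that occur more than once
tab <- table(res)
tied_values <- as.numeric(names(tab)[tab > 1])
```

With your own fit you would replace "res" by resid(Year5.lm). Tied residuals often come from rounded or repeated observations, so it may be worth looking at the data as well.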

So, to answer your question: yes, non-normal residuals can take away from the validity of your model, especially because tests and confidence intervals for your parameters are based on the normality assumption.
I'd recommend verifying the model assumptions with graphical tools such as the normal Q-Q plot and the Tukey-Anscombe plot (residuals against fitted values). Try:

plot(Year5.lm)
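If you only want the two plots mentioned above, the "which" argument of plot() for lm objects selects them. A minimal sketch, assuming Year5.lm is an ordinary lm fit; the data and the name "fit" below are simulated stand-ins, not your model:

```r
## Simulated stand-in data (assumption: replaces the real data behind Year5.lm)
set.seed(1)
d <- data.frame(x = 1:50)
d$y <- 2 + 0.5 * d$x + rnorm(50)
fit <- lm(y ~ x, data = d)

## Two of the default diagnostic plots side by side
op <- par(mfrow = c(1, 2))
plot(fit, which = 1)   # residuals vs. fitted values (Tukey-Anscombe plot)
plot(fit, which = 2)   # normal Q-Q plot of the standardized residuals
par(op)                # restore the previous graphics settings
```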

The power of the KS test is quite small, and graphical tools give you a hint about your true error distribution instead of only a p-value that "tells you" that the errors are not normal.

set.seed(3)
x <- 1:100
## t-distributed errors
y <- x + rt(100,2)
## Strong linear relationship
plot(x,y)

## High R^2 due to strong linear relationship
summary(reg <- lm(y~x))
## The residuals are not normally distributed
qqnorm(resid(reg))
## Small power of KS-Test. Violation of model assumption is not detected
ks.test(resid(reg), "pnorm")
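The example above is a single draw. To get a rough idea of how often the KS test detects this kind of violation, one could repeat it in a small simulation. This is only a sketch under the same setup (t-distributed errors with 2 degrees of freedom, n = 100); the number of repetitions and the 5% level are my own choices for illustration.

```r
## Repeat the example many times and record the KS p-value each time
set.seed(42)
n_sim <- 200          # number of simulated data sets (arbitrary choice)
x <- 1:100

p_values <- replicate(n_sim, {
  y <- x + rt(100, 2)                       # t-distributed errors, as above
  ks.test(resid(lm(y ~ x)), "pnorm")$p.value
})

## Share of simulations in which normality is rejected at the 5% level
rejection_rate <- mean(p_values < 0.05)
rejection_rate
```

Here rejection_rate estimates the power of the test in this particular setting only. Other error distributions or sample sizes will give different values.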

Best regards,

Christoph Buser



Christoph Buser <buser@stat.math.ethz.ch> Seminar fuer Statistik, LEO C13
ETH Zurich	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228

http://stat.ethz.ch/~buser/

justin rapp writes:
> All,
>
> Happy World Cup and Wimbledon. This morning finds me with the first
> of my many daily questions.
>
> I am running a ks.test on residuals obtained from a regression model.
>
> I use this code:
> > ks.test(Year5.lm$residuals,pnorm)
>
> and obtain this output
> One-sample Kolmogorov-Smirnov test
>
> data: Year5.lm$residuals
> D = 0.7196, p-value < 2.2e-16
> alternative hypothesis: two.sided
>
> Warning message:
> cannot compute correct p-values with ties in: ks.test(Year5.lm$residuals, pnorm)
>
> I am wondering if anybody can tell me what this error message means.
>
> Also, could anybody clarify how I could have a regression model with a
> high Rsquared, roughly .67, but with nonnormal residuals? Does this
> take away from the validity of my model?
>
> jdr
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Received on Mon Jul 10 17:42:37 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 10 Jul 2006 - 18:15:26 EST.
