Re: [R] Formula for whether hat value is influential?

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Sun, 09 Mar 2008 09:53:46 +0000

On Sat, 2008-03-08 at 19:38 -0800, Paul Lynch wrote:
> I was wondering if someone might be able to tell me what formula R's
> influence.measures function uses for determining whether the hat value
> it computes is influential (i.e., the true/false value in the "hat"
> column of the returned is.inf data frame). The reason I'm asking is
> that its results disagree with what I've just learned in my statistics
> class, namely that a point should be considered influential if h_ii >
> 2(k+1)/n, where k+1 is the number of parameters in the model and n is
> the number of data points. My 2(k+1)/n value would mark at least one
> more point influential than influence.measures does for the data set
> I'm looking at.

Because R is open source, you have access to all of the source code. Type influence.measures (without the parentheses) at the prompt to see the function's definition, although the printed version has no comments.
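A minimal sketch of inspecting the source from the prompt. The variable name src is only illustrative, and the exact text printed depends on the R version installed:

```r
# Typing the name without parentheses prints the definition instead of calling it
stats::influence.measures

# Capture the definition as lines of text so that it can be searched
src <- deparse(stats::influence.measures)

# Show the lines that mention the nested helper is.influential()
grep("is.influential", src, fixed = TRUE, value = TRUE)
```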

In the function is.influential(), which is defined inside influence.measures, you'll find the critical levels used. The hat values are in infmat[, k + 4], which is the last column (where k is the number of terms in the model, including the intercept if present). The relevant part of is.influential() is:

infmat[, k + 4] > (3 * k)/n

So R is using 3(k+1)/n in your notation rather than 2(k+1)/n, which is why it flags fewer points than your rule does. (In the R code, k is the number of terms in the model, *including* the intercept if present.)
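To see the two cut-offs side by side, here is a rough sketch on the built-in mtcars data. The model mpg ~ wt + hp and the names fit, h, p, n, flag2 and flag3 are chosen only for illustration, and the sketch assumes a full-rank fit in which every case has a positive hat value:

```r
# Illustrative model only; any lm fit would do
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)      # the h_ii values
p <- length(coef(fit))   # number of coefficients, intercept included (k in the R code)
n <- length(h)           # number of observations

flag2 <- h > 2 * p / n   # the textbook rule, 2(k+1)/n in the question's notation
flag3 <- h > 3 * p / n   # the rule in is.influential()

# Compare against what influence.measures() itself reports
im <- influence.measures(fit)
cbind(rule2 = flag2, rule3 = flag3, R = im$is.inf[, "hat"])
```

Every point flagged by the 3x rule is also flagged by the 2x rule, so the textbook rule can only mark the same points or more.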

The function was originally in John Fox's car package, which is the support software for his book Companion to Applied Regression. In that book, IIRC, Fox uses two cut-offs for hat values, 2 or 3 times the average hat value, as indicating influential observations. The average hat value is (k+1)/n in your notation, so these correspond to your 2(k+1)/n and R's 3(k+1)/n. R is using the upper level here. I would check out some of the references cited in the References section of ?influence.measures to see why this has been chosen.
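The link between "times the average hat value" and the (k+1)/n form can be checked numerically: the hat values of a full-rank linear model sum to the number of coefficients, so their mean is that number divided by n. A small sketch, again using an illustrative mtcars model with hypothetical variable names:

```r
# Illustrative model only
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)
p <- length(coef(fit))   # coefficients including the intercept
n <- length(h)

mean(h)                  # average hat value
p / n                    # same quantity, up to floating-point error

# The two cut-offs expressed as multiples of the average hat value
cutoffs <- c(twice = 2, thrice = 3) * mean(h)
cutoffs
```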

HTH,

G

>
> I am using R 2.4.1 under Windows. (Upgrading is difficult due to
> rather severe security policies.)
>
> Thanks,
>
> --Paul
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Received on Sun 09 Mar 2008 - 09:56:11 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 09 Mar 2008 - 14:00:20 GMT.
