[R] Difference between the S-plus influence and R empinf functions

From: The r newbie Fred <thernewbiefred_at_rocketmail.com>
Date: Mon, 07 Mar 2011 01:03:10 -0800 (PST)


Hello everyone !

I am currently trying to convert a program from S-plus to R, and I am having some trouble with the S-plus function called "influence(data, statistic,...)".

This function aims to "calculate empirical influence values and related quantities",
and is part of the Resample library, which I cannot find for R. However, two similar functions are available in R: lm.influence() and empinf().

I didn't manage to use the lm.influence() function correctly, because it needs a linear model
(lm, glm) as input, and what I have as input is a function. (I don't know the R/S-plus languages well,
so I may be mistaken, but I believe lm.influence() is not what I should use for my problem ...?)

I have tried to use the R empinf() instead, but I am stuck on a problem concerning the
input argument "group", which I cannot translate into R.

Here is a copy of the S-plus "influence()" help concerning this argument:

group : vector of length equal to the number of observations in data, for stratified sampling or
multiple-sample problems. Sampling is done separately for each group (determined by unique values
of this vector). If data is a data frame, this may be a variable in the data frame, or expression
involving such variables.

empinf() accepts an argument called "strata" but it doesn't seem to correspond to "group".

Below is a sample test showing my problem:

testinflu <- function(data, weights) { sum(data[,1]*weights) }
mydata <- cbind(c(1,2,3,4,5), c(1,1,1,1,0))

# In S-plus :
> testinflu(data=mydata, weights=rep(1,length(mydata[,1])))
15

# In R:
> testinflu(data=mydata, weights=rep(1,length(mydata[,1])))
15

# In S-plus :
> influence(data = mydata, statistic=testinflu)$L

          testinflu
[1,] -2.000000e+000
[2,] -1.000000e+000
[3,] -1.776357e-013
[4,] 1.000000e+000
[5,] 2.000000e+000

# In R :
> empinf(data = mydata, statistic=testinflu)
[1] -2.000000e+00 -1.000000e+00 2.220446e-12 1.000000e+00 2.000000e+00
# ==> OK

# In S-plus :
> influence(data = mydata, statistic=testinflu, group = mydata[, 2])$L

     testinflu
[1,] -1.2
[2,] -0.4
[3,] 0.4
[4,] 1.2
[5,] 0.0

# In R:
> empinf(data = mydata, statistic=testinflu, strata = mydata[, 2])
[1] -1.5 -0.5 0.5 1.5 0.0

# ==> NOT OK

So I have a few questions:
- has anyone already experienced the same kind of problem with the influence function ?
- is it possible to mimic the use of the "group" argument in empinf() ?
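
A note on the numbers above: each S-plus value obtained with "group" equals the corresponding R value obtained with "strata", multiplied by that group's share of the observations (4/5 for the first four rows, 1/5 for the last). That pattern would arise if empinf() normalised the weights to sum to 1 within each stratum while influence() kept them summing to 1 over the whole data set. This is an inference from the two outputs, not something confirmed by either help page. The base-R sketch below is a hand-written finite-difference loop (it is neither empinf() from the boot package nor the S-plus influence()), and the names stat, infl_by_group and scale_by_share are made up for illustration.

```r
# Data from the post: x is the first column of mydata, grp the second.
x   <- c(1, 2, 3, 4, 5)
grp <- c(1, 1, 1, 1, 0)
n   <- length(x)

# Same statistic as testinflu(), written directly in terms of the weights.
stat <- function(w) sum(x * w)

# Hypothetical helper: perturb one observation's weight inside its own group
# and take a forward difference. With scale_by_share = TRUE the weights are
# first multiplied by n_g / n (the ASSUMED S-plus-like normalisation).
infl_by_group <- function(scale_by_share, eps = 1e-3) {
  n_g   <- ave(rep(1, n), grp, FUN = sum)  # size of each observation's group
  share <- if (scale_by_share) n_g / n else rep(1, n)
  w0    <- 1 / n_g                         # weights sum to 1 within each group
  sapply(seq_len(n), function(i) {
    same    <- grp == grp[i]
    w       <- w0
    w[same] <- (1 - eps) * w0[same]        # shrink the group's weights ...
    w[i]    <- w[i] + eps                  # ... and move mass eps onto obs i
    (stat(w * share) - stat(w0 * share)) / eps
  })
}

r_like     <- infl_by_group(scale_by_share = FALSE)
splus_like <- infl_by_group(scale_by_share = TRUE)
print(round(r_like, 6))      # compare with the empinf(..., strata = ) output
print(round(splus_like, 6))  # compare with the influence(..., group = ) output
```

If this reading is right, wrapping the statistic so that it multiplies the weights by n_g / n before calling the original function, and passing that wrapper to empinf() together with strata, might reproduce the S-plus values. That has not been tested here.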

I have looked for answers on the web but couldn't find anything really helpful, so if someone has an idea I would really appreciate it !! :)

Thanks,
Fred

ps : sorry for my broken English ...

      

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 07 Mar 2011 - 09:41:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 07 Mar 2011 - 12:40:19 GMT.
