From: Petr Savicky <savicky_at_praha1.ff.cuni.cz>

Date: Thu, 14 Apr 2011 08:58:44 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 14 Apr 2011 - 07:06:40 GMT

On Wed, Apr 13, 2011 at 04:12:39PM -0700, helin_susam wrote:

> Hi dear list,
>
> I want to compare the amount of computation of two functions. For example,
> by using this algorithm;
>
> data <- rnorm(n=100, mean=10, sd=3)
>
> output1 <- list ()
> for(i in 1:100) {
>   data1 <- sample(100, 100, replace = TRUE)
>   statistic1 <- mean(data1)
>   output1 <- c(output1, list(statistic1))
> }
> output1
>
> output2 <- list()
> for(i in 1:100) {
>   data2 <- unique(sample(100, 100, replace=TRUE))
>   statistic2 <- mean(data2)
>   output2 <- c(output2, list(statistic2))
> }
> output2
>
> data1 consists of exactly 100 elements, but data2 consists of roughly 55 or
> 60 elements. So, to get statistic1, for each sample, 100 data points are
> used. But, to get statistic2 roughly half of them are used.
> I want to proof this difference. Is there any way to do this ?

Hi.

Each number in 1:100 appears in sample(100, 100, replace=TRUE) with probability 1 - (1 - 1/100)^100 = 0.6339677. Summing this probability over all 100 numbers, the expected length of data2 is 100 * 0.6339677 = 63.39677. If you want to estimate the distribution of the lengths of data2 using a simulation, then record length(data2). For example:

n <- 10000
s <- rep(NA, times=n)
for (i in 1:n) {
  s[i] <- length(unique(sample(100, 100, replace=TRUE)))
}
cbind(table(s))

I obtained

   [,1]
53    5
54   16
55   27
56   82
57  165
58  294
59  465
60  672
61  970
62 1168
63 1283
64 1303
65 1111
66  882
67  626
68  435
69  250
70  143
71   57
72   27
73   14
74    5
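
As a cross-check, the expected length 63.39677 quoted earlier can be computed directly and compared with the mean of a simulated sample of lengths. This is only a minimal sketch: the variable names (n_values, p_appear, expected_len, sim_len), the use of replicate() in place of the explicit loop, and the seed are illustrative choices, not part of the original computation.

```r
# Exact expectation: each of the 100 values appears with probability
# 1 - (1 - 1/100)^100, so the expected number of distinct values is
# 100 times that probability.
n_values <- 100
p_appear <- 1 - (1 - 1/n_values)^n_values
expected_len <- n_values * p_appear

# Simulated lengths of unique(sample(...)); the seed is arbitrary and
# only makes the run repeatable.
set.seed(1)
sim_len <- replicate(10000,
                     length(unique(sample(n_values, n_values, replace = TRUE))))

print(c(exact = expected_len, simulated = mean(sim_len)))
```

The two numbers should agree to roughly one decimal place, since the simulated mean is an average over 10000 lengths whose spread is only a few units.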

In this case, mean(sample(100, 100, replace=TRUE)) and mean(unique(sample(100, 100, replace=TRUE))) have the same expected value 50.5. However, eliminating repeated values may, in general, change the expected value of the sample mean.
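
A small hypothetical case shows how removing repeated values can shift the expected mean when the underlying data contain ties. The vector x below is made up for illustration and does not come from the question. All 27 equally likely index triples are enumerated, so the two expected values are exact rather than simulated.

```r
# Hypothetical data with a repeated value (not from the original question).
x <- c(1, 1, 10)

# All 3^3 = 27 equally likely ways to draw 3 indices with replacement.
idx <- expand.grid(1:3, 1:3, 1:3)

# Mean of each resample, with and without removing repeated values.
mean_all    <- apply(idx, 1, function(i) mean(x[i]))
mean_unique <- apply(idx, 1, function(i) mean(unique(x[i])))

# Exact expected values: average over the 27 equally likely triples.
print(c(with_repeats = mean(mean_all), unique_only = mean(mean_unique)))
```

Here the plain resample mean averages to 4, which is mean(x), while the mean after unique() averages to 117/27, about 4.33. With sample(100, 100, replace=TRUE) the values 1:100 are all distinct and the sampling is symmetric about 50.5, which is why both expected values coincide in that case.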

Hope this helps.

Petr Savicky.

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 14 Apr 2011 - 08:50:30 GMT.
