Thanks! The results were similar to what the t.test p-values show (I
have four samples).

Thank you also for using the replicate function, which I didn't know.
Until now I have just used for-loops, which are not so elegant... I
don't know about the speed. I will have to test that.
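
A minimal sketch of how that speed comparison might be run, reusing
the iris example from the quoted message below. The repetition count
is kept small so it runs quickly, and the variable names are only
illustrative. With the same seed, both versions should draw the same
random samples, so the results can be compared directly.

```r
n_rep <- 2000  # assumption: small count so the sketch runs quickly

# for-loop version with a preallocated result vector
set.seed(42)
out_loop <- numeric(n_rep)
time_loop <- system.time(
  for (i in seq_len(n_rep)) {
    out_loop[i] <- mean(sample(iris$Sepal.Width, 50))
  }
)

# replicate() version of the same computation
set.seed(42)
time_rep <- system.time(
  out_rep <- replicate(n_rep, mean(sample(iris$Sepal.Width, 50)))
)

# Do both approaches give the same simulated means?
same_result <- identical(out_loop, out_rep)
print(same_result)

# Elapsed seconds for each approach (values depend on the machine)
print(c(loop = time_loop[["elapsed"]], replicate = time_rep[["elapsed"]]))
```

Timings will vary between machines, so this is only a way to check
the difference locally, not a general claim about which is faster.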

Atte

Greg Snow wrote on 26.6.2010 at 23:30:

**> No, I mean something like this, assuming that the iris dataset
**> contains the full population and we want to see if setosa has a
**> different mean than the population (the null would be that there is
**> no difference in sepal width between species, or that species tells
**> nothing about sepal width):
**>
**>
**> out1 <- replicate( 100000, mean(sample(iris$Sepal.Width, 50)) )
**> obs1 <- mean( iris$Sepal.Width[1:50] )
**>
**> hist(out1, xlim=range(out1,obs1))
**> abline(v=obs1)
**>
**> mean( out1 > obs1 )
**>
**>
**> I don't have a reference (other than a textbook that defines
**> sampling distributions).
**>
**> --
**> Gregory (Greg) L. Snow Ph.D.
**> Statistical Data Center
**> Intermountain Healthcare
**> greg.snow_at_imail.org
**> 801.408.8111
**>
**> From: Atte Tenkanen [mailto:attenka_at_utu.fi]
**> Sent: Friday, June 25, 2010 10:08 PM
**> To: Atte Tenkanen
**> Cc: Greg Snow; David Winsemius; R mailing list
**> Subject: Re: [R] Wilcoxon signed rank test and its requirements
**>
**>
**> Atte Tenkanen wrote on 26.6.2010 at 5:15:
**>
**>
**>
**> Greg Snow wrote on 25.6.2010 at 21:55:
**>
**>
**> Let me see if I understand. You actually have the data for the
**> whole population (the entire piece) but you have some pre-defined
**> sections that you want to see if they differ from the population,
**> or more meaningfully they are different from a randomly selected
**> set of measures. Is that correct?
**>
**> If so, since you have the entire population of interest you can
**> create the actual sampling distribution (or a good approximation of
**> it). Just take random samples from the population of the given
**> size (matching the subset you are interested in) and calculate the
**> means (or other value of interest), probably 10,000 to 1,000,000
**> samples. Now compare the value from your predefined subset to the
**> set of random values you generated to see if it is in the tail or not.
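
The procedure described above can be sketched as a small function.
The name sampling_tail_prob and its arguments are hypothetical, and
the built-in iris data stand in for the real population here (the
first 50 rows play the role of the predefined subset). It samples
without replacement, as in the description, and reports how often a
random subset mean is at least as large, or at least as small, as
the observed one.

```r
# Hypothetical helper: names and defaults are illustrative only.
sampling_tail_prob <- function(popul, subset_values, n_rep = 10000) {
  n <- length(subset_values)
  obs <- mean(subset_values)
  # Means of random subsets of the same size, drawn without replacement
  sims <- replicate(n_rep, mean(sample(popul, n)))
  c(observed = obs,
    upper = mean(sims >= obs),  # share of random means at least as large
    lower = mean(sims <= obs))  # share of random means at least as small
}

set.seed(1)
res <- sampling_tail_prob(iris$Sepal.Width, iris$Sepal.Width[1:50],
                          n_rep = 2000)
print(res)
```

A small upper (or lower) proportion would suggest that the predefined
subset lies in the tail of the sampling distribution. Using >= rather
than > is a design choice of this sketch; it matters only when many
simulated means tie exactly with the observed one.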
**>
**> Let me check: do you mean doing it this way?
**>
**> t.test(sample(POPUL, length(SAMPLE), replace = FALSE),
**>        mu = mean(SAMPLE), alt = "less")
**>
**> NO, this way:
**>
**> t.test(POPUL[sample(1:length(POPUL), length(SAMPLE),
**>        replace = FALSE)], mu = mean(SAMPLE), alt = "less")
**>
**> Atte
**>
**>
**>
**> Atte
**>
**>
**>
**> --
**> Gregory (Greg) L. Snow Ph.D.
**> Statistical Data Center
**> Intermountain Healthcare
**> greg.snow_at_imail.org
**> 801.408.8111
**>
**>
**> -----Original Message-----
**> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-
**> project.org] On Behalf Of Atte Tenkanen
**> Sent: Thursday, June 24, 2010 11:04 PM
**> To: David Winsemius
**> Cc: R mailing list
**> Subject: Re: [R] Wilcoxon signed rank test and its requirements
**>
**> The values come from this kind of process:
**> The musical composition is segmented into so-called 'pitch-class
**> segments', and these segments are compared with one reference set
**> using a distance function. Only some distance values are possible.
**> These distance values can be averaged over music bars, which
**> produces a smoother distribution, and the resulting 'comparison
**> curve', which shows the distances to the reference set through the
**> musical piece, is more readable (see e.g.
**> http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use
**> the original values.
**>
**> Then I want to pick only some regions from the piece and test
**> whether the values in those regions are higher than the mean of all
**> values.
**>
**> Atte
**>
**> On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
**>
**> Is there anything for me?
**>
**> There is a lot of data, n=2418, but there are also a lot of ties.
**> My sample n ≈ 250-300
**>
**>
**> I do not understand why there should be so many ties. You have not
**> described the measurement process or units. ( ... although you offer
**> a glimpse without much background later.)
**>
**> I would like to test whether the mean of the sample differs
**> significantly from the population mean.
**>
**> Why? What is the purpose of this investigation? Why should the mean
**> of a sample be that important?
**>
**>
**> The population looks like the attached histogram. What test should
**> I use? Are there no choices?
**>
**> This distribution comes from a musical piece and the values are
**> 'tonal distances'.
**>
**> http://users.utu.fi/attenka/Hist.png
**>
**> That picture does not offer much insight into the features of that
**> measurement. It appears to have much more structure than I would
**> expect for a sample from a smooth unimodal underlying population.
**>
**> --
**> David.
**>
**>
**> Atte
**>
**> On 06/24/2010 12:40 PM, David Winsemius wrote:
**>
**> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
**>
**> Thanks. What I meant to ask is:
**>
**> How do you test that the data is symmetric enough?
**> If it is not, is it ok to use some data transformation?
**>
**> when it is said:
**>
**> "The Wilcoxon signed rank test does not assume that the data are
**> sampled from a Gaussian distribution. However it does assume that
**> the data are distributed symmetrically around the median. If the
**> distribution is asymmetrical, the P value will not tell you much
**> about whether the median is different than the hypothetical value."
**>
**> You are being misled. Simply finding a statement on a statistics
**> software website, even one as reputable as Graphpad (???), does not
**> mean that it is necessarily true. My understanding (confirmed by
**> reviewing "Nonparametric statistical methods for complete and
**> censored data" by M. M. Desu and Damaraju Raghavarao) is that the
**> Wilcoxon signed-rank test does not require that the underlying
**> distributions be symmetric. The above quotation is highly
**> inaccurate.
**>
**>
**> To add to what David and others have said, look at the kernel that
**> the U-statistic associated with the WSR test uses: the indicator
**> (0/1) of xi + xj > 0. So WSR tests H0: p = 0.5, where p = the
**> probability that the average of a randomly chosen pair of values is
**> positive. [If there are ties this probably needs to be worded as
**> P[xi + xj > 0] = P[xi + xj < 0], i neq j.]
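
One way to see this kernel at work is the following sketch on
simulated data. It assumes continuous values with no ties and no
zeros, which does not hold for the heavily tied data discussed in
this thread, and the variable names are illustrative. Under that
assumption, the number of positive pairwise sums xi + xj with i <= j
should equal the V statistic reported by wilcox.test.

```r
set.seed(1)
x <- rnorm(20, mean = 0.3)  # simulated data: no ties, no zeros

# All pairwise sums xi + xj
s <- outer(x, x, "+")

# Count of positive sums over pairs with i <= j (diagonal included)
walsh_pos <- sum(s[upper.tri(s, diag = TRUE)] > 0)

# Signed-rank statistic V as computed by R
v <- unname(wilcox.test(x)$statistic)
print(c(walsh_pos = walsh_pos, V = v))

# Sample estimate of p = P[xi + xj > 0] over distinct pairs (i < j)
p_hat <- mean(s[upper.tri(s)] > 0)
print(p_hat)
```

The estimate p_hat uses only distinct pairs, matching the "i neq j"
wording, while the count that matches V also includes the diagonal
terms xi + xi.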
**>
**> Frank
**>
**> --
**> Frank E Harrell Jr Professor and Chairman School of
**> Medicine
**> Department of Biostatistics Vanderbilt
**> University
**>
**>
**> ______________________________________________
**> R-help_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide
**> http://www.R-project.org/posting-guide.html
**> and provide commented, minimal, self-contained, reproducible code.
**>
**>

