From: Atte Tenkanen <attenka_at_utu.fi>

Date: Sat, 26 Jun 2010 07:08:22 +0300

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 26 Jun 2010 - 09:48:44 GMT

Atte Tenkanen wrote on 26 Jun 2010 at 5:15:

> Greg Snow wrote on 25 Jun 2010 at 21:55:

>
>> Let me see if I understand. You actually have the data for the
>> whole population (the entire piece) but you have some pre-defined
>> sections that you want to see if they differ from the population,
>> or more meaningfully they are different from a randomly selected
>> set of measures. Is that correct?
>>
>> If so, since you have the entire population of interest you can
>> create the actual sampling distribution (or a good approximation
>> of it). Just take random samples from the population of the given
>> size (matching the subset you are interested in) and calculate the
>> means (or other value of interest), probably 10,000 to 1,000,000
>> samples. Now compare the value from your predefined subset to the
>> set of random values you generated to see if it is in the tail or
>> not.
>
> Just to check, you mean doing it this way:
>
> t.test(sample(POPUL, length(SAMPLE), replace = FALSE),
>        mu = mean(SAMPLE), alt = "less")

NO, this way:

t.test(POPUL[sample(seq_along(POPUL), length(SAMPLE), replace = FALSE)], mu = mean(SAMPLE), alternative = "less")

Atte
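
For comparison, Greg's suggestion, as I read it, is to repeat the random draw many times and compare the observed subset mean with the resulting set of means, rather than to run one t-test on a single draw. A minimal sketch of that idea follows. The real POPUL and SAMPLE are not shown in this thread, so the data below are made-up stand-ins, and the names n_rep, null_means, obs and p_upper are only illustrative.

```r
# Hypothetical stand-ins for POPUL and SAMPLE (the real data are not in the thread)
set.seed(1)
POPUL  <- sample(0:10, 2418, replace = TRUE)  # discrete values with many ties
SAMPLE <- POPUL[501:780]                      # a predefined region of 280 values

n_rep <- 10000  # number of random subsets; Greg suggests 10,000 to 1,000,000

# Means of random subsets of the same size as SAMPLE, drawn without replacement
null_means <- replicate(n_rep,
                        mean(sample(POPUL, length(SAMPLE), replace = FALSE)))

obs <- mean(SAMPLE)

# One-sided Monte Carlo p-value: the share of random subsets whose mean is
# at least as large as the observed one (the +1 avoids a p-value of exactly 0)
p_upper <- (sum(null_means >= obs) + 1) / (n_rep + 1)
p_upper

# Optional visual check of where the observed mean falls
# hist(null_means); abline(v = obs, col = "red")
```

A small p_upper would put the observed region in the upper tail. For the "less" direction, count `null_means <= obs` instead. Because only means of resampled subsets are compared, the ties in the data need no special handling here.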

>
> Atte
>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow_at_imail.org
>> 801.408.8111
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces_at_r-project.org
>>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Atte Tenkanen
>>> Sent: Thursday, June 24, 2010 11:04 PM
>>> To: David Winsemius
>>> Cc: R mailing list
>>> Subject: Re: [R] Wilcoxon signed rank test and its requirements
>>>
>>> The values come from this kind of process:
>>> The musical composition is segmented into so-called 'pitch-class
>>> segments' and these segments are compared with one reference set
>>> using a distance function. Only some distance values are possible.
>>> These distance values can be averaged over musical bars, which
>>> produces a smoother distribution, and the 'comparison curve' that
>>> shows the distances to the reference set across the piece becomes
>>> more readable (see e.g. http://users.utu.fi/attenka/with6.jpg ),
>>> but I would prefer to use the original values.
>>>
>>> Then I want to pick only some regions of the piece and test whether
>>> the values in those regions are higher than the mean of all values.
>>>
>>> Atte
>>>
>>>> On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
>>>>
>>>>> Is there anything for me?
>>>>>
>>>>> There is a lot of data, n=2418, but there are also a lot of ties.
>>>>> My sample n≈250-300
>>>>>
>>>>
>>>> I do not understand why there should be so many ties. You have not
>>>> described the measurement process or units. ( ... although you
>>>> offer a glimpse without much background later.)
>>>>
>>>>> I would like to test whether the mean of the sample differs
>>>>> significantly from the population mean.
>>>>
>>>> Why? What is the purpose of this investigation? Why should the mean
>>>> of a sample be that important?
>>>>
>>>>>
>>>>> The histogram of the population looks like the attached histogram.
>>>>> What test should I use? No choices?
>>>>>
>>>>> This distribution comes from a musical piece and the values are
>>>>> 'tonal distances'.
>>>>>
>>>>> http://users.utu.fi/attenka/Hist.png
>>>>
>>>> That picture does not offer much insight into the features of that
>>>> measurement. It appears to have much more structure than I would
>>>> expect for a sample from a smooth unimodal underlying population.
>>>>
>>>> --
>>>> David.
>>>>
>>>>>
>>>>> Atte
>>>>>
>>>>>> On 06/24/2010 12:40 PM, David Winsemius wrote:
>>>>>>>
>>>>>>> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
>>>>>>>
>>>>>>>> Thanks. What I meant to ask is:
>>>>>>>>
>>>>>>>> how do you test that the data are symmetric enough?
>>>>>>>> If they are not, is it OK to use some data transformation?
>>>>>>>>
>>>>>>>> when it is said:
>>>>>>>>
>>>>>>>> "The Wilcoxon signed rank test does not assume that the data are
>>>>>>>> sampled from a Gaussian distribution. However it does assume that
>>>>>>>> the data are distributed symmetrically around the median. If the
>>>>>>>> distribution is asymmetrical, the P value will not tell you much
>>>>>>>> about whether the median is different than the hypothetical value."
>>>>>>>
>>>>>>> You are being misled. Simply finding a statement on a statistics
>>>>>>> software website, even one as reputable as Graphpad (???), does not
>>>>>>> mean that it is necessarily true. My understanding (confirmed
>>>>>>> reviewing "Nonparametric statistical methods for complete and
>>>>>>> censored data" by M. M. Desu, Damaraju Raghavarao) is that the
>>>>>>> Wilcoxon signed-rank test does not require that the underlying
>>>>>>> distributions be symmetric. The above quotation is highly
>>>>>>> inaccurate.
>>>>>>>
>>>>>>
>>>>>> To add to what David and others have said, look at the kernel that
>>>>>> the U-statistic associated with the WSR test uses: the indicator
>>>>>> (0/1) of xi + xj > 0. So WSR tests H0:p=0.5 where p = the
>>>>>> probability that the average of a randomly chosen pair of values is
>>>>>> positive. [If there are ties this probably needs to be worded as
>>>>>> P[xi + xj > 0] = P[xi + xj < 0], i neq j.]
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>> --
>>>>>> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>>>>>>                      Department of Biostatistics   Vanderbilt University

[[alternative HTML version deleted]]
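
Frank Harrell's remark quoted above, that the WSR test is built on the indicator of xi + xj > 0, can be illustrated with a short sketch. The data are simulated for illustration only (they are not the tonal distances from this thread), and the names s, pair_sums, p_hat and V_from_pairs are made up here.

```r
# Simulated continuous data (no ties, no zeros); purely illustrative
set.seed(2)
x <- rnorm(30, mean = 0.3)

# Matrix of all pairwise sums x[i] + x[j]
s <- outer(x, x, "+")

# Sums for distinct pairs i < j
pair_sums <- s[upper.tri(s)]

# Estimate of p = P[xi + xj > 0], i neq j; H0 in Frank's wording is p = 0.5
p_hat <- mean(pair_sums > 0)
p_hat

# For data without ties or zeros, the signed-rank statistic V reported by
# wilcox.test() equals the number of pairs i <= j with xi + xj > 0
V <- unname(wilcox.test(x)$statistic)
V_from_pairs <- sum(s[upper.tri(s, diag = TRUE)] > 0)
c(V = V, V_from_pairs = V_from_pairs)
```

With heavily tied data such as the values discussed here, wilcox.test() falls back to a normal approximation and warns about ties, and the simple counting identity above no longer applies as stated.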

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Sat 26 Jun 2010 - 21:20:35 GMT.
