From: WeiWei Shi <helprhelp_at_gmail.com>

Date: Fri 29 Apr 2005 - 06:18:14 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Apr 29 06:22:09 2005

Here is a summary of:

l<-qqnorm(kk) # kk is my sample

l$y (which is my sample)

l$x (the theoretical quantiles)

diff<-l$y-l$x

and

> summary(l$y)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.9007  0.9942  0.9998  0.9999  1.0060  1.1070

> summary(l$x)

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-4.145e+00 -6.745e-01  0.000e+00  2.383e-17  6.745e-01  4.145e+00

> summary(diff)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-3.0380  0.3311  0.9998  0.9999  1.6690  5.0460

Comparing diff with l$x: although the 1st and 3rd quartiles differ, diff and l$x look similar to each other, which is confirmed by qqnorm(l$x) and qqnorm(diff).
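
A minimal sketch of why diff and l$x look so alike, using simulated data as a stand-in for kk (the real data cannot be posted). The location, scale and df below are rounded from the fitdistr output quoted further down; kk_sim, d and the seed are names and choices made up for this illustration. Because the sample only varies in the second decimal place while l$x spans roughly -4 to 4, l$y - l$x is essentially 1 - l$x:

```r
set.seed(1)                                # arbitrary seed, for reproducibility
n <- 10000
kk_sim <- 1 + 0.0077 * rt(n, df = 3.76)    # simulated stand-in for kk (assumption)

l <- qqnorm(kk_sim, plot.it = FALSE)       # l$x: theoretical quantiles, l$y: the data
d <- l$y - l$x                             # same quantity as diff in the message

sd(l$y)       # spread of the sample: tiny compared to 1
sd(l$x)       # spread of the theoretical quantiles: close to 1
cor(d, l$x)   # very close to -1, i.e. d is almost exactly a shifted, flipped l$x
```

If the same holds for the real kk, qqnorm(diff) is mostly a Q-Q plot of the theoretical quantiles against themselves, which would explain the near-perfect line.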

Running the following code:

r<-rnorm(1000)+1 # since my sample is shifted from zero to 1

qq(r[r>0.9 & r<1.2]) # select the central part

This gives me a straight line now.
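
For anyone who wants to check this without looking at the plots, here is a sketch under some assumptions: qq() is the function defined earlier in the thread, except that the plot.it argument and the invisible return value are additions for this illustration, and the seed and the names central, d and qd are arbitrary. The correlation between the two axes of the Q-Q plot serves as a rough measure of how straight the line is:

```r
# qq() as defined earlier in the thread; plot.it and the returned value
# are additions so that the differences can be inspected numerically
qq <- function(dataset, plot.it = TRUE) {
  l <- qqnorm(dataset, plot.it = FALSE)
  d <- l$y - l$x            # sample minus theoretical normal quantile
  if (plot.it) qqnorm(d)
  invisible(d)
}

set.seed(42)                          # arbitrary seed
r <- rnorm(1000) + 1                  # normal sample shifted from 0 to 1
central <- r[r > 0.9 & r < 1.2]       # keep only the narrow central part

d  <- qq(central, plot.it = FALSE)
qd <- qqnorm(d, plot.it = FALSE)      # coordinates of the Q-Q plot of d
cor(qd$x, qd$y)                       # near 1 means an almost straight line
```

Calling qq(central) with the default plot.it = TRUE draws the same plot as the original function.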

Thanks for the good explanation of the phenomenon.

Thanks again,

Weiwei

On 4/28/05, Huntsinger, Reid <reid_huntsinger@merck.com> wrote:

> Stock returns and other financial data have often been found to be heavy-tailed.
> Even Cauchy distributions (without even a first absolute moment) have been
> entertained as models.
>
> Your qq function subtracts numbers on the scale of a normal(0,1)
> distribution from the input data. When the input data are scaled so that
> their variation is insignificant compared to 1, say, then you get essentially
> the "theoretical quantiles", i.e. the "x" component of the list, back from
> l$y - l$x (up to sign and a constant shift). l$x is basically a sample from
> a normal(0,1) distribution, so the points do line up perfectly in the second
> qqnorm(). Is that what's happening?
>
> Reid Huntsinger
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of WeiWei Shi
> Sent: Thursday, April 28, 2005 1:38 PM
> To: Vincent ZOONEKYND
> Cc: R-help@stat.math.ethz.ch
> Subject: [R] have to point it out again: a distribution question
>
> Dear R-helpers:
> I raised this question last time, but it was only partially solved,
> so I would like to raise it again since I think it is very
> interesting, at least to me.
> It is not a question about how to use R; rather, it is a
> theoretical and practical question, illustrated with R.
>
> I came across this question when I built a model for some stock returns.
> That's the reason I cannot post the complete data here. But I would
> like to attach some plots here (I zipped them since the original ones
> are too big).
>
> The first plot, qq1, is the qqnorm plot of my sample, which shows an
> "S" shape. Since I am not very experienced, I am not sure what kind of
> distribution my sample follows.
>
> The second plot, qq2, is obtained via
> qqnorm(rt(10000, 4)), since I ran
> fitdistr(kk, 't') and got
>         m               s               df
>    9.998789e-01    7.663799e-03    3.759726e+00
>   (5.332631e-05)  (5.411400e-05)  (8.684956e-02)
>
> The second plot seems to say that my sample follows a t distribution (I am
> not sure of this).
>
> BTW, what are the commands for simulating other distributions, like
> log-normal, exponential, and so on?
>
> The third one was obtained by running the following R code.
>
> Suppose my data is read into dataset k from file "f392.txt":
> k<-read.table("f392.txt", header=F) # read into k
> kk<-k[[1]]
> qq(kk)
>
> The qq function is defined as below:
> qq<-function(dataset){
>   l<-qqnorm(dataset, plot.it=F)
>   diff<-l$y-l$x # difference b/w the sample and its theoretical quantile
>   qqnorm(diff)
> }
>
> The most interesting thing is that (if there is no silly mistake here,
> and if my sample follows some kind of distribution, whether or not such a
> distribution has been described before) my qq function seems like a way to
> evaluate it. But what worries me is that the line is too "perfect",
> which indicates there is something goofy here that could probably be
> explained by some mathematical argument. However, I tried
> qq(rnorm(10000))
> qq(rt(10000, 3.7))
> qq(rf(....))
>
> None of them gave me this perfect line!
>
> Sorry for the long question, but I wanted to make my question clear to
> everybody. I tried my best :)
>
> Thanks for reading,
>
> Weiwei (Ed) Shi, Ph.D
>
> On 4/23/05, Vincent ZOONEKYND <zoonek@gmail.com> wrote:
> > If I understand your problem, you are computing the difference between
> > your data and the quantiles of a standard gaussian variable -- in
> > other words, the difference between the data and the red line, in the
> > following picture.
> >
> > N <- 100 # Sample size
> > m <- 1 # Mean
> > s <- 2 # Dispersion
> > x <- m + s * rt(N, df=2) # Non-gaussian data
> >
> > qqnorm(x)
> > abline(0,1, col="red")
> >
> > And you get
> >
> > y <- sort(x) - qnorm(ppoints(N))
> > hist(y)
> >
> > This is probably not the right line (not only because your mean is 1,
> > the slope is wrong as well -- if the data were gaussian, you could
> > estimate it with the standard deviation).
> >
> > You can use the "qqline" function to get the line passing through the
> > first and third quartiles, which is probably closer to what you have
> > in mind.
> >
> > qqnorm(x)
> > abline(0,1, col="red")
> > qqline(x, col="blue")
> >
> > The differences are
> >
> > x1 <- quantile(x, .25)
> > x2 <- quantile(x, .75)
> > b <- (x2-x1) / (qnorm(.75)-qnorm(.25))
> > a <- x1 - b * qnorm(.25)
> > y <- sort(x) - (a + b * qnorm(ppoints(N)))
> > hist(y)
> >
> > And you want to know when the differences cease to be "significantly"
> > different from zero.
> >
> > plot(y)
> > abline(h=0, lty=3)
> >
> > You can use the plot to fix a threshold, but unless you have a model
> > describing how non-gaussian your data are, this will be empirical.
> >
> > You will note that, in those simulations, the differences (either
> > yours or those from the line through the first and third quartiles)
> > are not gaussian at all.
> >
> > -- Vincent
> >
> >
> > On 4/22/05, WeiWei Shi <helprhelp@gmail.com> wrote:
> > > I hope it is not because of some central limit theory; otherwise my
> > > initial plan will fail :)
> > >
> > > On 4/22/05, WeiWei Shi <helprhelp@gmail.com> wrote:
> > > > Hi, r-gurus:
> > > >
> > > > I happened to have a question in my work:
> > > >
> > > > I have a dataset, which has only one dimension, like
> > > > 0.99037297527605
> > > > 0.991179836732708
> > > > 0.995635340631367
> > > > 0.997186769599305
> > > > 0.991632565640424
> > > > 0.984047197106486
> > > > 0.99225943762649
> > > > 1.00555642128421
> > > > 0.993725402926564
> > > > ....
> > > >
> > > > The data is saved in a file called f392.txt.
> > > >
> > > > I used the following code to play around :)
> > > >
> > > > k<-read.table("f392.txt", header=F) # read into k
> > > > kk<-k[[1]]
> > > > l<-qqnorm(kk)
> > > > diff=c()
> > > > lenk<-length(kk)
> > > > i=1
> > > > while (i<=lenk){
> > > >   # save the difference between the sample quantile and the
> > > >   # theoretical quantile; remember, my sample mean is around 1,
> > > >   # while the theoretical one is 0
> > > >   diff[i]=l$y[i]-l$x[i]
> > > >   i<-i+1
> > > > }
> > > > hist(diff, breaks=300) # analyze the distribution of this difference
> > > > qqnorm(diff)
> > > >
> > > > My question is:
> > > > from l<-qqnorm(kk), I wanted to know from which point (or cut) the
> > > > sample points start to move away from the theoretical ones. That's the
> > > > reason I played around with the "diff" list, which gives me the difference.
> > > > To my surprise, the diff is perfectly normal. I tried to use some
> > > > kk<-c(1, 2, 5, -1 , ...) to test, and I concluded that it must be the
> > > > distribution my sample follows that gives this finding.
> > > >
> > > > So, any suggestion on the distribution of my sample? I think there
> > > > might be some mathematical argument that leads to this observation,
> > > > but I am not quite sure.
> > > >
> > > > btw,
> > > > > fitdistr(kk, 't')
> > > >         m               s               df
> > > >    9.999965e-01    7.630770e-03    3.742244e+00
> > > >   (5.317674e-05)  (5.373884e-05)  (8.584725e-02)
> > > >
> > > > btw2, can anyone suggest a way to find the "cut" or "threshold" to
> > > > discretize my sample into 3 groups: two tail groups and one
> > > > main group? This is my focus.
> > > >
> > > > Thanks,
> > > >
> > > > Ed
> > > >