RE: [R] have to point it out again: a distribution question

From: bogdan romocea <br44114_at_gmail.com>
Date: Sat 30 Apr 2005 - 04:30:04 EST


> Then, Reid, or other r-gurus, is there a good way to discretize
> the sample into 3 categories: 2 tails and the body?

Out of curiosity, how do you plan to use that information? What would you do if you knew that the 'body' starts here and ends there?

-----Original Message-----
From: WeiWei Shi [mailto:helprhelp@gmail.com]
Sent: Thursday, April 28, 2005 4:18 PM
To: Huntsinger, Reid
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] have to point it out again: a distribution question

Here is a summary of
l<-qqnorm(kk) # kk is my sample
l$y (which is my sample)
l$x (which is the theoretical quantile)
diff<-l$y-l$x

and
> summary(l$y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.9007  0.9942  0.9998  0.9999  1.0060  1.1070

> summary(l$x)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-4.145e+00 -6.745e-01  0.000e+00  2.383e-17  6.745e-01  4.145e+00

> summary(diff)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-3.0380  0.3311  0.9998  0.9999  1.6690  5.0460

Comparing diff with l$x: although the 1st and 3rd quartiles differ, diff and l$x look similar to each other, as qqnorm(l$x) and qqnorm(diff) confirm.

Running the following code:

r<-rnorm(1000)+1 # since my sample is shifted from zero to 1
qq(r[r>0.9 & r<1.2]) # select the central part

this gives me a straight line now.
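
A minimal sketch of why the line comes out straight, under assumed values. The real sample is not posted, so the simulated kk below (a scaled t distribution centred at 1, loosely echoing the fitdistr output quoted further down) is only a stand-in, and the seed is arbitrary. The sample varies by roughly 0.01 while the normal quantiles span about -4 to 4, so l$y - l$x is almost exactly 1 - l$x, and its normal QQ plot is close to a straight line whatever the shape of the sample.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n  <- 10000
kk <- 1 + 0.0077 * rt(n, df = 3.76)  # assumed stand-in for the real sample

l <- qqnorm(kk, plot.it = FALSE)     # l$x: theoretical quantiles, l$y: sample
d <- l$y - l$x                       # the difference that qq() plots

# The spread of the sample is tiny next to the spread of the normal
# quantiles, so d is dominated by -l$x.
ratio <- sd(l$y) / sd(l$x)
rho   <- cor(d, -l$x)

# Straightness of the second QQ plot: correlation between the sorted
# differences and the normal quantiles at the same plotting positions.
straightness <- cor(sort(d), qnorm(ppoints(n)))

print(c(ratio = ratio, rho = rho, straightness = straightness))
```

Rescaling the sample first (for example with scale(kk)) puts it on the same scale as the normal quantiles, and the difference then no longer collapses onto -l$x.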

Thanks for the good explanation of the phenomenon.

Then, Reid, or other r-gurus, is there a good way to discretize the sample into 3 categories: 2 tails and the body?
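
A minimal sketch of one empirical way to do this, following the qqline() residual idea quoted further down in this thread. The helper name split_tails, the tolerance multiplier tol_mult = 0.5 and the simulated stand-in sample are all assumptions made for illustration; nobody in the thread proposed them. As noted in that quoted message, without a model for the non-normality any such threshold is empirical.

```r
# Hypothetical helper: split a sample into lower tail / body / upper tail
# according to where it leaves the qqline() reference line.
split_tails <- function(x, tol_mult = 0.5) {
  n  <- length(x)
  xs <- sort(x)
  z  <- qnorm(ppoints(n))                  # theoretical normal quantiles

  # Line through the first and third quartiles, as qqline() draws it
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  b <- (q[2] - q[1]) / (qnorm(0.75) - qnorm(0.25))
  a <- q[1] - b * qnorm(0.25)

  res <- xs - (a + b * z)                  # deviation from the line
  far <- abs(res) > tol_mult * b           # tol_mult is an arbitrary choice

  lower_half <- z < 0
  # Innermost sorted values, on each side, that still deviate from the line
  lo_cut <- if (any(far & lower_half)) max(xs[far & lower_half]) else -Inf
  hi_cut <- if (any(far & !lower_half)) min(xs[far & !lower_half]) else Inf

  group <- ifelse(x <= lo_cut, "lower tail",
           ifelse(x >= hi_cut, "upper tail", "body"))
  list(lo_cut = lo_cut, hi_cut = hi_cut,
       group = factor(group, levels = c("lower tail", "body", "upper tail")))
}

set.seed(1)                                # arbitrary seed
kk  <- 1 + 0.0077 * rt(10000, df = 3.76)   # assumed stand-in sample
out <- split_tails(kk)
print(table(out$group))
```

The cut points depend directly on tol_mult, so it is worth plotting the residuals and trying a few values before settling on one.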

Thanks again,

Weiwei

On 4/28/05, Huntsinger, Reid <reid_huntsinger@merck.com> wrote:
> Stock returns and other financial data have often been found to be heavy-tailed.
> Even Cauchy distributions (without even a first absolute moment) have been
> entertained as models.
>
> Your qq function subtracts numbers on the scale of a normal (0,1)
> distribution from the input data. When the input data are scaled so that
> they are insignificant compared to 1, say, then you get essentially the
> "theoretical quantiles", i.e. the "x" component of the list (up to sign and
> a shift), back from l$y - l$x. l$x is basically a sample from a normal(0,1)
> distribution so they do
> line up perfectly in the second qqnorm(). Is that what's happening?
>
> Reid Huntsinger
>
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of WeiWei Shi
> Sent: Thursday, April 28, 2005 1:38 PM
> To: Vincent ZOONEKYND
> Cc: R-help@stat.math.ethz.ch
> Subject: [R] have to point it out again: a distribution question
>
> Dear R-helpers:
> I pointed out my question last time but it is only partially solved.
> So I would like to point it out again since I think it is very
> interesting, at least to me.
> It is not a question about how to use R; rather, it is a theoretical
> and practical question, illustrated with R.
>
> I came up with this question when I built a model for some stock returns.
> That's the reason I cannot post the complete data here. But I would
> like to attach some plots here (I zipped them since the original ones
> are too big).
>
> The first plot qq1, is qqnorm plot of my sample, giving me some
> "S"-shape. Since I am not very experienced, I am not sure what kind of
> distribution my sample follows.
>
> The second plot, qq2, is obtained via
> qqnorm(rt(10000, 4)) since I run
> fitdistr(kk, 't') and got
>        m              s              df
>   9.998789e-01   7.663799e-03   3.759726e+00
>  (5.332631e-05) (5.411400e-05) (8.684956e-02)
>
> The second plot seems to say my sample follows a t distribution (not sure
> of this).
>
> BTW, what are the commands for simulating other distributions, such as
> log-normal, exponential, and so on?
>
> The third one was obtained by running the following R code:
>
> Suppose my data is read into dataset k from file "f392.txt":
> k<-read.table("f392.txt", header=F) # read into k
> kk<-k[[1]]
> qq(kk)
>
> qq function is defined as below:
> qq<-function(dataset){
>   l<-qqnorm(dataset, plot.it=FALSE) # compute the quantiles without plotting
>   diff<-l$y-l$x # difference b/w sample and its theoretical quantile
>   qqnorm(diff) # normal QQ plot of the differences
> }
>
> The most interesting thing is that (if there is no silly mistake here,
> and if my sample follows some kind of distribution, whether or not that
> distribution has a name) my qq function seems like a way to
> evaluate it. But what worries me is that the line is too "perfect",
> which indicates there is something goofy here that could probably be
> explained by some mathematical argument. However, I tried
> qq(rnorm(10000))
> qq(rt(10000, 3.7))
> qq(rf(....))
>
> None of them gave me this perfect line!
>
> Sorry for the long question but I want to make it clear to everybody
> about my question. I tried my best :)
>
> Thanks for your reading,
>
> Weiwei (Ed) Shi, Ph.D
>
> On 4/23/05, Vincent ZOONEKYND <zoonek@gmail.com> wrote:
> > If I understand your problem, you are computing the difference between
> > your data and the quantiles of a standard gaussian variable -- in
> > other words, the difference between the data and the red line, in the
> > following picture.
> >
> > N <- 100 # Sample size
> > m <- 1 # Mean
> > s <- 2 # dispersion
> > x <- m + s * rt(N, df=2) # Non-gaussian data
> >
> > qqnorm(x)
> > abline(0,1, col="red")
> >
> > And you get
> >
> > y <- sort(x) - qnorm(ppoints(N))
> > hist(y)
> >
> > This is probably not the right line (not only because your mean is 1,
> > the slope is wrong as well -- if the data were gaussian, you could
> > estimate it with the standard deviation).
> >
> > You can use the "qqline" function to get the line passing through the
> > first and third quartiles, which is probably closer to what you have
> > in mind.
> >
> > qqnorm(x)
> > abline(0,1, col="red")
> > qqline(x, col="blue")
> >
> > The differences are
> >
> > x1 <- quantile(x, .25)
> > x2 <- quantile(x, .75)
> > b <- (x2-x1) / (qnorm(.75)-qnorm(.25))
> > a <- x1 - b * qnorm(.25)
> > y <- sort(x) - (a + b * qnorm(ppoints(N)))
> > hist(y)
> >
> > And you want to know when the differences cease to be "significantly"
> > different from zero.
> >
> > plot(y)
> > abline(h=0, lty=3)
> >
> > You can use the plot to fix a threshold, but unless you have a model
> > describing how non-gaussian your data are, this will be empirical.
> >
> > You will note that, in those simulations, the differences (either
> > yours or those from the lines through the first and third quartiles)
> > are not gaussian at all.
> >
> > -- Vincent
> >
> >
> > On 4/22/05, WeiWei Shi <helprhelp@gmail.com> wrote:
> > > I hope it is not because of some central limit theorem, otherwise my
> > > initial plan will fail :)
> > >
> > > On 4/22/05, WeiWei Shi <helprhelp@gmail.com> wrote:
> > > > Hi, r-gurus:
> > > >
> > > > I happened to have a question in my work:
> > > >
> > > > I have a dataset, which has only one dimension, like
> > > > 0.99037297527605
> > > > 0.991179836732708
> > > > 0.995635340631367
> > > > 0.997186769599305
> > > > 0.991632565640424
> > > > 0.984047197106486
> > > > 0.99225943762649
> > > > 1.00555642128421
> > > > 0.993725402926564
> > > > ....
> > > >
> > > > the data is saved in a file called f392.txt.
> > > >
> > > > I used the following code to play around :)
> > > >
> > > > k<-read.table("f392.txt", header=F) # read into k
> > > > kk<-k[[1]]
> > > > l<-qqnorm(kk)
> > > > diff=c()
> > > > lenk<-length(kk)
> > > > i=1
> > > > while (i<=lenk){
> > > > diff[i]=l$y[i]-l$x[i] # save the difference of theoretical quantile and sample quantile
> > > > # remember, my sample mean is around 1 while the theoretical one, 0
> > > > i<-i+1
> > > > }
> > > > hist(diff, breaks=300) # analyze the distr of such diff
> > > > qqnorm(diff)
> > > >
> > > > my question is:
> > > > from l<-qqnorm(kk), I wanted to know from which point (or cut) the
> > > > sample points start to deviate from the theoretical ones. That's the
> > > > reason I played around with the "diff" list, which gives me the difference.
> > > > To my surprise, the diff is perfectly normal. I tried using some
> > > > kk<-c(1, 2, 5, -1 , ...) to test, and concluded that it must be the
> > > > distribution my sample follows that gives this finding.
> > > >
> > > > So, any suggestion on the distribution of my sample? I think there
> > > > might be some mathematical argument which leads to this observation,
> > > > but I am not quite sure.
> > > >
> > > > btw,
> > > > > fitdistr(kk, 't')
> > > > m s df
> > > > 9.999965e-01 7.630770e-03 3.742244e+00
> > > > (5.317674e-05) (5.373884e-05) (8.584725e-02)
> > > >
> > > > btw2, can anyone suggest a way to find the "cut" or "threshold" in
> > > > my sample to discretize it into 3 groups: two tail groups and one
> > > > main group? This is my main focus.
> > > >
> > > > Thanks,
> > > >
> > > > Ed
> > > >
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> > >
> >
>
> ------------------------------------------------------------------------------
> Notice: This e-mail message, together with any attachment...{{dropped}}



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Apr 30 04:35:34 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:31 EST