Re: [R] Which distribution best fits the data?

From: Matthieu Stigler <Matthieu.Stigler_at_gmail.com>
Date: Tue, 01 Jul 2008 13:25:03 +0200


Hello

Regression with time series is more complicated than ordinary regression, and I recommend reading more about it in any time series textbook. The biggest problem is so-called spurious regression: when the series are non-stationary, the regression can look significant and lead to erroneous conclusions even if the variables are unrelated (if you understand French, see the Wikipedia page I wrote, http://fr.wikipedia.org/wiki/R%e9gression_fallacieuse, with R simulation examples).
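To illustrate the point, here is a minimal simulation sketch in base R (not the code from the Wikipedia page). It regresses one simulated random walk on another, independent one, and counts how often the slope looks significant at the 5% level; the series length, number of simulations, seed and helper names are arbitrary choices of mine.

```r
# Spurious regression sketch: regress one series on another, independent one.
set.seed(1)          # arbitrary seed, for reproducibility only
n_obs  <- 100        # length of each simulated series (assumed value)
n_sims <- 500        # number of simulated regressions (assumed value)

# Hypothetical helper: p-value of the slope in lm(y ~ x) for two
# independently generated series.
slope_pvalue <- function(make_series) {
  x <- make_series(n_obs)
  y <- make_series(n_obs)
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}

random_walk <- function(n) cumsum(rnorm(n))  # non-stationary series
white_noise <- function(n) rnorm(n)          # stationary series

p_rw <- replicate(n_sims, slope_pvalue(random_walk))
p_wn <- replicate(n_sims, slope_pvalue(white_noise))

# Share of regressions that look "significant" at the 5% level,
# although x and y are unrelated by construction.
reject_rw <- mean(p_rw < 0.05)
reject_wn <- mean(p_wn < 0.05)
print(c(random_walks = reject_rw, white_noise = reject_wn))
```

With stationary series the rejection rate should stay close to the nominal 5%, while with random walks it is typically far higher, which is the spurious regression problem.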

In your case, you should test all variables for stationarity (not only the residuals) to ensure that your results are valid. See the packages urca and vars for this.
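As a rough sketch of such a check, the example below uses PP.test() (Phillips-Perron) from the base stats package so that it runs without installing anything; this is my substitution, and urca offers comparable tests such as ur.df() and ur.pp(). The data are simulated, so the variable names are placeholders for your own series.

```r
# Unit-root check sketch using PP.test() from the base 'stats' package.
set.seed(42)                     # arbitrary seed
x_level <- cumsum(rnorm(200))    # simulated random walk (has a unit root)
x_diff  <- diff(x_level)         # its first difference (stationary)

# Null hypothesis of the Phillips-Perron test: the series has a unit root.
pp_level <- PP.test(x_level)
pp_diff  <- PP.test(x_diff)

print(pp_level)
print(pp_diff)

# A small p-value rejects the unit root, i.e. suggests stationarity.
# Note that the reported p-values are interpolated from a table and
# are therefore confined to the range 0.01 to 0.1.
# Comparable tests in the urca package: ur.df(), ur.pp().
```

You would run the test on each variable entering the regression (rainfall, sea-surface temperature, winds), not only on the residuals.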

Hope this helps

Matthieu

> Jenny,
>
> You may try here: http://en.wikipedia.org/wiki/Normality_test which
> mentions the R package nortest
>
> and here:
>
> The Probability Plot Correlation Coefficient Test for Normality, James
> J. Filliben:
>
> http://www.jstor.org/sici?sici=0040-1706(197502)17%3A1%3C111%3ATPPCCT%3E2.0.CO%3B2-6&cookieSet=1
> http://www.minitab.com/resources/articles/normprob.pdf
> http://engineering.tufts.edu/cee/people/vogel/publications/probability1986.pdf
>
> Regards,
> Tom
>
> Jenny Barnes wrote:
> > Hi Ben and R-help community,
> >
> > More specifics:
> >
> > I am using sea-surface temperature (averaged over an area) and also
> > winds (averaged over an area) to use in a linear regression model as
> > predictors for rainfall over a small region of Africa. So I have one
> > time series of sea-temp and one time series of rainfall (over 36 years
> > - seasonal average) and I have performed the linear regression between
> > the 2. I now want to check if the residuals are normally distributed.
> > If they are not I want an R function that will tell me what
> > distribution they are most similar to - so that I can apply a suitable
> > transformation to make the data normal.....
> >
> > Any more tips now that you have a few more details perhaps? :o)
> >
> > Thanks for your time,
> >
> > Jenny
> >
> > On Mon, 30 Jun 2008, Ben Bolker wrote:
> >
> >> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
> >>
> >>>
> >>> Dear R-help community,
> >>>
> >>> Does anybody know of a stats function in R that tells you which
> >>> distribution best fits your data? I have tried looking through the
> >>> archives
> >>> but have only found functions that tell you if it's normal or log etc.
> >>> specifically - I am looking for a function that tells you (given a
> >>> timeseries) what the distribution is.
> >>>
> >>> Any help/advice will be greatly appreciated,
> >>>
> >>> All the best,
> >>>
> >>> Jenny Barnes
> >>>
> >>> jmb <at> mssl.ucl.ac.uk
> >>
> >> The problem is that it's not generally a good
> >> idea to data-dredge in this way. Your best bet is
> >> to think about the characteristics of the
> >> data (discrete or continuous, non-negative or real,
> >> symmetric or skewed) and try to narrow it down to
> >> a few distributions -- then you can use fitdistr()
> >> (from the MASS package) or something similar
> >> to compare among them.
> >>
> >> If you say a little bit more about what
> >> you're trying to do with the data you might
> >> get some more specific advice.
> >>
> >> Ben Bolker
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
>
> --
> Thomas E Adams
> National Weather Service
> Ohio River Forecast Center
> 1901 South State Route 134
> Wilmington, OH 45177
>
> EMAIL: thomas.adams at noaa.gov
>
> VOICE: 937-383-0528
> FAX: 937-383-0033
>
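On Ben's point quoted above, about narrowing the choice down to a few candidate distributions and comparing them with fitdistr(), a minimal sketch might look like this. The data are simulated and the three candidates are only an example of mine; comparing the fits by AIC is one possible criterion, not something prescribed in the thread.

```r
library(MASS)   # provides fitdistr()

set.seed(123)                           # arbitrary seed
x <- rgamma(200, shape = 2, rate = 1)   # simulated positive, right-skewed data

# Fit a few plausible candidates by maximum likelihood.
fits <- list(
  normal    = fitdistr(x, "normal"),
  lognormal = fitdistr(x, "lognormal"),
  gamma     = fitdistr(x, "gamma")
)

# Compare the candidates by AIC (lower is better).
aic <- sapply(fits, AIC)
print(sort(aic))
```

This only ranks the candidates you chose on substantive grounds; it does not tell you that the best of them actually fits well, so a QQ plot against the selected distribution is still worth looking at.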



Received on Tue 01 Jul 2008 - 11:29:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Jul 2008 - 11:31:08 GMT.
