Re: [R] Which distribution best fits the data?

From: Thomas Adams <Thomas.Adams_at_noaa.gov>
Date: Mon, 30 Jun 2008 10:46:36 -0400

Jenny,

You may try here: http://en.wikipedia.org/wiki/Normality_test (which mentions the R package nortest)

and here:

The Probability Plot Correlation Coefficient Test for Normality, James J. Filliben:

http://www.jstor.org/sici?sici=0040-1706(197502)17%3A1%3C111%3ATPPCCT%3E2.0.CO%3B2-6&cookieSet=1
http://www.minitab.com/resources/articles/normprob.pdf
http://engineering.tufts.edu/cee/people/vogel/publications/probability1986.pdf
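Below is a minimal sketch of the kind of check these references describe, applied to regression residuals like Jenny's. The `sst` and `rain` vectors are hypothetical simulated stand-ins (36 values each), not her data. `shapiro.test()` is base R's Shapiro-Wilk test. The `ppcc` line computes a probability plot correlation coefficient in the spirit of Filliben's test. It uses `ppoints()` plotting positions rather than his order-statistic medians, so critical values should still come from the papers above. The nortest call runs only if that package is installed.

```r
# Hypothetical stand-in data: 36 seasonal values (replace with the real series)
set.seed(1)
sst  <- rnorm(36, mean = 27, sd = 0.5)               # area-averaged SST
rain <- 100 + 40 * (sst - 27) + rnorm(36, sd = 10)   # seasonal rainfall

fit <- lm(rain ~ sst)   # linear regression of rainfall on SST
res <- residuals(fit)   # residuals whose distribution is being checked

# Shapiro-Wilk test (package stats); a small p-value suggests non-normality
sw <- shapiro.test(res)
print(sw)

# Probability plot correlation coefficient: correlation between the ordered
# residuals and standard normal quantiles. ppoints(n) uses (i - a)/(n + 1 - 2a)
# with a = 1/2 for n > 10, not Filliben's order-statistic medians.
q    <- qnorm(ppoints(length(res)))
ppcc <- cor(sort(res), q)
print(ppcc)   # values near 1 mean a nearly straight normal probability plot

# Visual check: normal Q-Q plot with a reference line
qqnorm(res)
qqline(res)

# Optional Anderson-Darling test, only if nortest is installed
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(res))
}
```

With only 36 points, formal tests have limited power. The Q-Q plot is often as informative as the p-value.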

Regards,
Tom

Jenny Barnes wrote:
> Hi Ben and R-help community,
>
> More specifics:
>
> I am using sea-surface temperature (averaged over an area) and winds
> (averaged over an area) as predictors in a linear regression model
> for rainfall over a small region of Africa. So I have one time series
> of sea-surface temperature and one time series of rainfall (seasonal
> averages over 36 years), and I have performed the linear regression
> between the two. I now want to check whether the residuals are
> normally distributed. If they are not, I want an R function that will
> tell me which distribution they are most similar to, so that I can
> apply a suitable transformation to make the data normal.
>
> Any more tips, perhaps, now that you have a few more details? :o)
>
> Thanks for your time,
>
> Jenny
>
> On Mon, 30 Jun 2008, Ben Bolker wrote:
>
>> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
>>
>>>
>>> Dear R-help community,
>>>
>>> Does anybody know of a stats function in R that tells you which
>>> distribution best fits your data? I have tried looking through the
>>> archives, but have only found functions that test specifically
>>> whether the data are normal, lognormal, etc. I am looking for a
>>> function that tells you (given a time series) what the distribution
>>> is.
>>>
>>> Any help/advice will be greatly appreciated,
>>>
>>> All the best,
>>>
>>> Jenny Barnes
>>>
>>> jmb <at> mssl.ucl.ac.uk
>>
>> The problem is that it's not generally a good
>> idea to data-dredge in this way. Your best bet is
>> to think about the characteristics of the
>> data (discrete or continuous, non-negative or real,
>> symmetric or skewed) and try to narrow it down to
>> a few distributions -- then you can use fitdistr()
>> (from the MASS package) or something similar
>> to compare among them.
>>
>> If you say a little bit more about what
>> you're trying to do with the data you might
>> get some more specific advice.
>>
>> Ben Bolker
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
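Ben's fitdistr() suggestion, together with the transformation Jenny asks about, might be sketched as below. The simulated data and the list of candidate distributions are assumptions for illustration only. Part 1 fits each candidate by maximum likelihood and ranks the fits by AIC, which is computed from the `loglik` and `estimate` components of each fit. This describes a single sample and ignores any regression structure. Part 2 uses `MASS::boxcox()` to profile the Box-Cox parameter lambda for a fitted `lm`. That approach targets residual normality directly but requires a strictly positive response.

```r
library(MASS)  # fitdistr() and boxcox() live in MASS

## 1. Compare a few candidate distributions for one sample
# Hypothetical positive, right-skewed sample standing in for 36 seasonal values
set.seed(42)
x <- rgamma(36, shape = 4, rate = 0.05)

# Candidates picked from the data's properties (continuous, positive, skewed)
candidates <- c("normal", "lognormal", "gamma", "weibull")

# Maximum-likelihood fit of each candidate; a failed optimisation returns NULL
fits <- lapply(candidates, function(d)
  tryCatch(suppressWarnings(fitdistr(x, d)), error = function(e) NULL))
names(fits) <- candidates
fits <- Filter(Negate(is.null), fits)

# AIC = 2k - 2 logLik, where k is the number of fitted parameters
aic <- sapply(fits, function(f) 2 * length(f$estimate) - 2 * f$loglik)
sort(aic)  # smaller AIC = better trade-off between fit and complexity

## 2. Box-Cox transformation of the response in a regression
# Hypothetical SST predictor and positive, skewed rainfall response
sst  <- rnorm(36, mean = 27, sd = 0.5)
rain <- exp(4 + 0.8 * (sst - 27) + rnorm(36, sd = 0.3))

fit <- lm(rain ~ sst, y = TRUE)  # keep the response so boxcox() need not refit
bc  <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]  # lambda maximising the profile log-likelihood
lambda_hat  # near 0 points to a log transform, near 1 to no transform
```

An AIC ranking only says which of the listed candidates fits best. As Ben notes, the shortlist itself should come from what is known about the data.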

-- 
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL:	thomas.adams_at_noaa.gov

VOICE:	937-383-0528
FAX:	937-383-0033

Received on Mon 30 Jun 2008 - 14:52:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 30 Jun 2008 - 15:30:50 GMT.

