Re: [R] Which distribution best fits the data?

From: Ben Bolker <bolker_at_ufl.edu>
Date: Mon, 30 Jun 2008 08:21:21 -0400


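Much better.

If m is your fitted linear regression model:

* boxcox(m) in the MASS package looks for a power transformation of the response (more or less: the Box-Cox family, which includes the log as a special case) that brings the residuals closer to normality; see the book that accompanies MASS (Venables and Ripley, Modern Applied Statistics with S) for more information

* plot(m) produces the standard diagnostic plots, including a normal Q-Q plot of the residuals (a visual check of normality rather than a formal test)

* don't forget to check for autocorrelation in the residuals with acf(residuals(m))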
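
The three checks above might look roughly like this in practice. This is only a sketch on simulated stand-in data: the variable names sst and rain, the simulated values, and the choice to read the estimated power off the maximum of the profile likelihood are assumptions for illustration, not anything taken from the original analysis.

```r
library(MASS)  # provides boxcox()

# Simulated stand-in data (assumption): 36 seasonal means, as in the thread
set.seed(1)
n <- 36
sst <- rnorm(n, mean = 27, sd = 0.5)                     # hypothetical sea-surface temperature
rain <- exp(1 + 0.3 * (sst - 27) + rnorm(n, sd = 0.3))   # positive, right-skewed response

m <- lm(rain ~ sst)

# Box-Cox profile log-likelihood over the default grid of powers (lambda);
# boxcox() requires a strictly positive response
bc <- boxcox(m, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]   # power with the highest profile likelihood

# Normal Q-Q plot of the residuals only; plot(m) without 'which'
# steps through all of the default diagnostic panels
plot(m, which = 2)

# Sample autocorrelation function of the residuals
r_acf <- acf(residuals(m), plot = FALSE)
```

A lambda_hat near 0 would point towards a log transformation and a value near 1 towards leaving the response alone. With only 36 points the likelihood profile tends to be flat, so a rounded, interpretable power is usually a more sensible choice than the exact maximum.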

Ben Bolker

Jenny Barnes wrote:
| Hi Ben and R-help community,
|
| More specifics:
|
| I am using sea-surface temperature (averaged over an area) and winds
| (also averaged over an area) as predictors in a linear regression model
| for rainfall over a small region of Africa. So I have one time series of
| sea-surface temperature and one time series of rainfall (seasonal
| averages over 36 years), and I have fitted the linear regression between
| the two. I now want to check whether the residuals are normally
| distributed. If they are not, I want an R function that will tell me
| which distribution they are most similar to, so that I can apply a
| suitable transformation to make the data normal.
|
| Any more tips now that you have a few more details perhaps? :o)
|
| Thanks for your time,
|
| Jenny
|
| On Mon, 30 Jun 2008, Ben Bolker wrote:
|
|> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
|>
|>>
|>> Dear R-help community,
|>>
|>> Does anybody know of a stats function in R that tells you which
|>> distribution best fits your data? I have tried looking through the
|>> archives but have only found functions that tell you specifically
|>> whether it's normal or log etc. I am looking for a function that
|>> tells you (given a time series) what the distribution is.
|>>
|>> Any help/advice will be greatly appreciated,
|>>
|>> All the best,
|>>
|>> Jenny Barnes
|>>
|>> jmb <at> mssl.ucl.ac.uk
|>
|> The problem is that it's not generally a good
|> idea to data-dredge in this way. Your best bet is
|> to think about the characteristics of the
|> data (discrete or continuous, non-negative or real,
|> symmetric or skewed) and try to narrow it down to
|> a few distributions -- then you can use fitdistr()
|> (from the MASS package) or something similar
|> to compare among them.
|>
|> If you say a little bit more about what
|> you're trying to do with the data you might
|> get some more specific advice.
|>
|> Ben Bolker
|>
|> ______________________________________________
|> R-help_at_r-project.org mailing list
|> https://stat.ethz.ch/mailman/listinfo/r-help
|> PLEASE do read the posting guide
|> http://www.R-project.org/posting-guide.html
|> and provide commented, minimal, self-contained, reproducible code.
|>
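
For the earlier suggestion quoted above (narrow the choice down to a few candidate distributions, then compare them with fitdistr()), a minimal sketch could look like the following. The simulated data, the three candidate families, and the use of AIC as the yardstick are all assumptions made for illustration; the original message does not say which criterion to use.

```r
library(MASS)  # provides fitdistr()

# Simulated positive, right-skewed data (assumption, stands in for real data)
set.seed(2)
x <- rgamma(200, shape = 2, rate = 0.5)

# Fit a short list of candidates chosen from the data's characteristics
# (continuous and non-negative here), rather than searching every family
fits <- list(
  normal    = fitdistr(x, "normal"),
  lognormal = fitdistr(x, "lognormal"),
  gamma     = fitdistr(x, "gamma")
)

# fitdistr objects have a logLik method, so AIC() works on them;
# lower AIC indicates a better trade-off between fit and complexity
aic <- sapply(fits, AIC)
print(sort(aic))
```

The point of the short candidate list is that the comparison stays among distributions that are plausible for the data a priori, which is what keeps this from turning into the data-dredging warned against above.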




Received on Mon 30 Jun 2008 - 12:31:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 30 Jun 2008 - 13:31:53 GMT.
