Re: [R] MASS fitdistr with plyr or data.table?

From: Dennis Murphy <djmuser_at_gmail.com>
Date: Wed, 27 Apr 2011 16:31:22 -0700

Hi:

Here's one way to do this with plyr and data.table.

# plyr
As Hadley inferred, when using ddply(), it's convenient to write a function for a generic (sub-)data frame and have it return a data frame. Here's my function and call:

f <- function(d) {

     require(MASS)
     est <- fitdistr(d$wind_speed, 'weibull')$estimate
     data.frame(shape = est[1], scale = est[2])
  }

Notice that f() takes a data frame d as input, uses the wind_speed component of d to fit the Weibull, and returns a data frame with names for the estimates.

> ddply(weib.test.too, 'site', f)

   site shape scale

1     1 3.063853 8.049467
2     2 2.945982 8.067252
3     3 2.879392 7.999636
4     4 3.097191 8.084453
5     5 3.091117 8.012450
6     6 2.943254 7.912792
7     7 2.957455 7.947545
8     8 2.975732 7.901587
9     9 3.045563 8.061838
10   10 2.995324 8.056820

Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced 2: In dweibull(x, shape, scale, log) : NaNs produced <snip the other eight>

## data.table

In writing a function for data.table, you want a variable as input and a list as output. We therefore modify the above function slightly:

g <- function(x) {

     require(MASS)
     est <- fitdistr(x, 'weibull')$estimate
     list(shape = est[1], scale = est[2])
  }

library(data.table)
weib.test.dt <- data.table(weib.test.too, key = 'site')
> weib.test.dt[, g(wind_speed), by = site]

      site shape scale

 [1,]    1 3.063853 8.049467
 [2,]    2 2.945982 8.067252
 [3,]    3 2.879392 7.999636
 [4,]    4 3.097191 8.084453
 [5,]    5 3.091117 8.012450
 [6,]    6 2.943254 7.912792
 [7,]    7 2.957455 7.947545
 [8,]    8 2.975732 7.901587
 [9,]    9 3.045563 8.061838

[10,] 10 2.995324 8.056820
## <warnings snipped>

HTH,
Dennis

On Wed, Apr 27, 2011 at 1:55 PM, Justin Haynes <jtor14_at_gmail.com> wrote:
> I am trying to extract the shape and scale parameters of a wind speed
> distribution for different sites.  I can do this in a clunky way, but
> I was hoping to find a way using data.table or plyr.  However, when I
> try I am met with the following:

>

> set.seed(144)
> weib.dist<-rweibull(10000,shape=3,scale=8)
> weib.test<-data.table(cbind(1:10,weib.dist))
> names(weib.test)<-c('site','wind_speed')
>

> fitted<-weib.test[,fitdistr(wind_speed,'weibull'),by=site]
>

> Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
>  invalid to set the class to matrix unless the dimension attribute is
> of length 2 (was 0)
> In addition: Warning messages:
> 1: In dweibull(x, shape, scale, log) : NaNs produced
> ...
> 10: In dweibull(x, shape, scale, log) : NaNs produced
>

> (the warning messages are normal from what I can tell)
>

> or using plyr:
>

> set.seed(144)
> weib.dist<-rweibull(10000,shape=3,scale=8)
> weib.test.too<-data.frame(cbind(1:10,weib.dist))
> names(weib.test.too)<-c('site','wind_speed')
>

> fitted<-ddply(weib.test.too,.(site),fitdistr,'weibull')
>

> Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector
>

> those sound like similar errors to me, but I can't figure out how to
> make them go away!
>

> to prove I'm not crazy:
>

> fitdistr(weib.dist,'weibull')$estimate
>   shape    scale
> 2.996815 8.009757
> Warning messages:
> 1: In dweibull(x, shape, scale, log) : NaNs produced
> 2: In dweibull(x, shape, scale, log) : NaNs produced
> 3: In dweibull(x, shape, scale, log) : NaNs produced
> 4: In dweibull(x, shape, scale, log) : NaNs produced
>

> Thanks
>

> Justin
>

> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 27 Apr 2011 - 23:35:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 28 Apr 2011 - 01:50:33 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive