Re: [R] randomForest error

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 01 Jul 2005 - 00:20:08 EST


The limitation comes from the way categorical splits are represented in the code: for a categorical variable with k categories, a split is represented by k binary digits, one per category (0 = right, 1 = left), so it takes k bits to store each split. To save storage, these bits are packed into a single 4-byte (32-bit) integer, hence the limit of 32 categories.
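To make the packing concrete, here is a minimal sketch in Python. It is only an illustration, not the package's actual code: the function names `pack_split` and `sends_left` are made up, and the bit order (bit j holds the direction for category j, counting from 0) is an assumption.

```python
WORD_BITS = 32  # one 4-byte integer holds 32 bits


def pack_split(left_flags):
    """Pack one direction per category (True = left, False = right) into one integer.

    Bit j of the result is 1 if category j goes left (assumed bit order).
    """
    if len(left_flags) > WORD_BITS:
        raise ValueError("cannot pack more than 32 categories into a 32-bit word")
    word = 0
    for j, goes_left in enumerate(left_flags):
        if goes_left:
            word |= 1 << j
    return word


def sends_left(word, category):
    """Return True if the packed split sends the given category (0-based) left."""
    return (word >> category) & 1 == 1
```

With three categories where the first and third go left, `pack_split([True, False, True])` sets bits 0 and 2. A factor with 41 levels does not fit in one word, which is the situation behind the error.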

The current Fortran code (version 5.x) by Breiman and Cutler gets around this limitation by storing the split in an integer array. While this lifts the 32-category limit, it takes much more memory to store the splits. I'm still trying to figure out a more memory-efficient way of storing the splits without imposing the 32-category limit. If anyone has suggestions, I'm all ears.
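The memory trade-off can be sketched with some rough arithmetic in Python. This assumes the integer array uses one 4-byte integer per category, which is a guess about the layout and not something checked against the Fortran source; the names `packed_bytes` and `array_bytes` are hypothetical.

```python
INT_BYTES = 4  # assumed size of one default integer


def packed_bytes(k):
    """Bytes per split when k category bits are packed into one integer."""
    if k > 32:
        raise ValueError("packed representation is limited to 32 categories")
    return INT_BYTES


def array_bytes(k):
    """Bytes per split when each of k categories gets its own integer (assumed layout)."""
    return INT_BYTES * k
```

Under these assumptions, a split on a 41-level factor costs `array_bytes(41)` bytes in the array form, and that cost is paid at every categorical split in every tree of the forest.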

Best,
Andy

> From: Arne.Muller@sanofi-aventis.com
>
> Hello,
>
> I'm using the random forest package. One of my factors in the
> data set contains 41 levels (I can't code this as a numeric
> value - in terms of linear models this would be a random
> factor). The randomForest call comes back with an error
> telling me that the limit is 32 categories.
>
> Is there any reason for this particular limit? Maybe it's
> possible to recompile the module with a different cutoff?
>
> thanks a lot for your help,
> kind regards,
>
>
> Arne
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Received on Fri Jul 01 00:23:41 2005
