Re: [R] randomForest

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Fri 08 Jul 2005 - 06:13:36 EST

On 7/7/2005 3:47 PM, Weiwei Shi wrote:

> it works.
> thanks,
> 
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE
> 
> i also tried as.vector(sample.size) and assigned it to sampsz, but it still
> does not work.

Sorry, I used "vector" incorrectly. Lists are vectors. What sum needs is a numeric or complex vector, and lists are vectors of objects, not vectors of numbers.

You should use is.numeric(sample.size) to test whether you can sum sample.size.
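
To make the distinction concrete, here is a minimal sketch. Diag below is only a stand-in built from the class counts quoted later in this thread, not the actual data, and sampsz2 and sampsz3 are illustrative names. It also shows why as.vector() did not help: a list is already a vector, so it comes back unchanged.

```r
# Stand-in response variable with the class counts from the thread (assumed data).
Diag <- rep(1:6, times = c(36, 12, 120, 36, 30, 30))

# lapply() always returns a list, one element per class.
sample.size <- lapply(1:6, function(i) sum(Diag == i))

is.vector(sample.size)            # TRUE: a list is a (generic) vector
is.numeric(sample.size)           # FALSE: it is not a vector of numbers
is.list(as.vector(sample.size))   # TRUE: as.vector() leaves a list as a list

# unlist() flattens the list into a plain numeric vector that sum() accepts.
sampsz <- unlist(sample.size)
is.numeric(sampsz)                # TRUE
sum(sampsz)                       # 264

# sapply() or table() return the counts as a numeric vector directly.
sampsz2 <- sapply(1:6, function(i) sum(Diag == i))
sampsz3 <- as.vector(table(Diag))
```

Building the counts with sapply() or table() from the start avoids the list altogether.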

Duncan Murdoch

> 
> On 7/7/05, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:

>> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
>> > Hi there:
>> > I have a question on random forest:
>> >
>> > recently i helped a friend with her random forest and i came across this problem:
>> > her dataset has 6 classes and the sample size is pretty small:
>> > 264, and the class distribution is like this (Diag is the response variable)
>> > sample.size <- lapply(1:6, function(i) sum(Diag==i))
>> > > sample.size
>> > [[1]]
>> > [1] 36
>> >
>> > [[2]]
>> > [1] 12
>> >
>> > [[3]]
>> > [1] 120
>> >
>> > [[4]]
>> > [1] 36
>> >
>> > [[5]]
>> > [1] 30
>> >
>> > [[6]]
>> > [1] 30
>> >
>> > I assigned this sample.size to sampsz for stratified sampling
>> > and i got the following error:
>> > Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
>> >
>> > if I use sampsz=c(36, 12, 120, 36, 30, 30), then it is fine. Could you
>> > tell me why?
>>
>> The sum() function knows what to do on a vector, but not on a list. You
>> can turn your sample.size variable into a vector using
>>
>> unlist(sample.size)
>>
>> Duncan Murdoch
>>
>> > btw, for a classification problem like this one with unbalanced class
>> > sizes, do you have any suggestions to improve its accuracy? I
>> > used the c() form to make sampsz work, but the result is
>> > similar.
>> >
>> > Thanks,
>> >
>> > weiwei
>> >
>> > On 6/30/05, Liaw, Andy <andy_liaw@merck.com> wrote:
>> >> The limitation comes from the way categorical splits are represented in the
>> >> code: For a categorical variable with k categories, the split is
>> >> represented by k binary digits: 0=right, 1=left. So it takes k bits to
>> >> store each split on k categories. To save storage, this is `packed' into a
>> >> 4-byte integer (32-bit), thus the limit of 32 categories.
>> >>
>> >> The current Fortran code (version 5.x) by Breiman and Cutler gets around
>> >> this limitation by storing the split in an integer array. While this lifts
>> >> the 32-category limit, it takes much more memory to store the splits. I'm
>> >> still trying to figure out a more memory efficient way of storing the splits
>> >> without imposing the 32-category limit. If anyone has suggestions, I'm all
>> >> ears.
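
The packing idea described above can be sketched roughly as follows. This is not the randomForest source: pack_split and unpack_split are hypothetical names, and the code stores the bits in an R double rather than in a 4-byte integer, purely so that the arithmetic is easy to follow. The length check mirrors the 32-category limit.

```r
# Pack a left/right assignment for k categories into a single number.
# goes_left[i] is TRUE when category i goes to the left child (bit = 1).
# Hypothetical helper; uses a double, not the 32-bit integer of the real code.
pack_split <- function(goes_left) {
  stopifnot(is.logical(goes_left), length(goes_left) <= 32)
  sum(2^(which(goes_left) - 1))
}

# Recover the k bits: integer-divide by each power of two, then test the low bit.
unpack_split <- function(code, k) {
  (code %/% 2^(0:(k - 1))) %% 2 == 1
}

goes_left <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
code <- pack_split(goes_left)      # 1 + 4 + 8 = 13
restored <- unpack_split(code, 5)  # same pattern as goes_left
```

A factor with 41 levels needs 41 bits per split, which is why it cannot fit in one 32-bit word under this scheme.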
>> >>
>> >> Best,
>> >> Andy
>> >>
>> >> > From: Arne.Muller@sanofi-aventis.com
>> >> >
>> >> > Hello,
>> >> >
>> >> > I'm using the random forest package. One of my factors in the
>> >> > data set contains 41 levels (I can't code this as a numeric
>> >> > value - in terms of linear models this would be a random
>> >> > factor). The randomForest call comes back with an error
>> >> > telling me that the limit is 32 categories.
>> >> >
>> >> > Is there any reason for this particular limit? Maybe it's
>> >> > possible to recompile the module with a different cutoff?
>> >> >
>> >> > thanks a lot for your help,
>> >> > kind regards,
>> >> >
>> >> >
>> >> > Arne
>> >> >
>> >> > ______________________________________________
>> >> > R-help@stat.math.ethz.ch mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide!
>> >> > http://www.R-project.org/posting-guide.html
>> >> >
>> >> >
>> >> >
>> >>

Received on Fri Jul 08 06:20:54 2005
