Re: [R] randomForest

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 08 Jul 2005 - 06:10:32 EST


> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE

Because a list is also a vector:

> a <- c(list(1), list(2))
> a
[[1]]
[1] 1

[[2]]
[1] 2

> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE

Actually, the way I initialize a list of known length is by something like:

myList <- vector(mode="list", length=veryLong)
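A minimal sketch of the two points above, in base R only. The length n and the squares stored in the list are hypothetical values chosen for illustration; they stand in for veryLong and for whatever would really be stored.

```r
# A list passes is.vector() because a list is a (generic) vector;
# what it is not is an atomic (e.g. numeric) vector.
a <- c(list(1), list(2))
is.vector(a)                    # a list counts as a vector
is.list(a)                      # and it is a list
is.atomic(a)                    # but not an atomic vector
is.vector(a, mode = "numeric")  # asking for a specific mode tells them apart

# Preallocate a list of known length, then fill it by position.
n <- 5                          # hypothetical length, standing in for veryLong
myList <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
  myList[[i]] <- i^2            # placeholder content for illustration
}
```

Checking is.atomic() or is.numeric() rather than is.vector() is one way to tell whether an object can be handed to functions such as sum().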

Andy

> i also tried as.vector(sample.size) and assigned it to sampsz, but it
> still does not work.
>
> On 7/7/05, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
> > On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> > > Hi there:
> > > I have a question on random forest:
> > >
> > > recently i helped a friend with her random forest and i came across
> > > this problem:
> > > her dataset has 6 classes and the sample size is pretty small:
> > > 264, and the class distribution is like this (Diag is the response
> > > variable)
> > > sample.size <- lapply(1:6, function(i) sum(Diag==i))
> > > > sample.size
> > > [[1]]
> > > [1] 36
> > >
> > > [[2]]
> > > [1] 12
> > >
> > > [[3]]
> > > [1] 120
> > >
> > > [[4]]
> > > [1] 36
> > >
> > > [[5]]
> > > [1] 30
> > >
> > > [[6]]
> > > [1] 30
> > >
> > > I assigned this sample.size to sampsz for stratified sampling
> > > and i got the following error:
> > > Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
> > >
> > > if I use sampsz=c(36, 12, 120, 36, 30, 30), then it is fine. Could you
> > > tell me why?
> >
> > The sum() function knows what to do on a vector, but not on a list. You
> > can turn your sample.size variable into a vector using
> >
> > unlist(sample.size)
> >
> > Duncan Murdoch
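A minimal base-R sketch of the problem and the fix discussed above. Diag is simulated here from the class counts quoted in the thread (an assumption; the real data set is not shown), and the names sampsz, sampsz2 and sampsz3 are illustrative only.

```r
# Simulated response with the class counts from the thread (264 cases).
Diag <- rep(1:6, times = c(36, 12, 120, 36, 30, 30))

# lapply() always returns a list, one element per class.
sample.size <- lapply(1:6, function(i) sum(Diag == i))

# sum() cannot add up a list, so this fails; try() captures the error.
res <- try(sum(sample.size), silent = TRUE)

# as.vector() leaves a list as a list, which is why it did not help.
still.list <- as.vector(sample.size)

# Three ways to get a plain integer vector of class counts instead.
sampsz  <- unlist(sample.size)
sampsz2 <- sapply(1:6, function(i) sum(Diag == i))
sampsz3 <- as.vector(table(Diag))
```

Any of the three vectors has the same form as c(36, 12, 120, 36, 30, 30), the value that worked in the original post.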
> >
> > > btw, for a classification problem with uneven class sizes like
> > > this, do you have any suggestions to improve its accuracy? I
> > > tried the c() way to make sampsz work but the result is
> > > similar.
> > >
> > > Thanks,
> > >
> > > weiwei
> > >
> > > On 6/30/05, Liaw, Andy <andy_liaw@merck.com> wrote:
> > >> The limitation comes from the way categorical splits are represented in the
> > >> code: For a categorical variable with k categories, the split is
> > >> represented by k binary digits: 0=right, 1=left. So it takes k bits to
> > >> store each split on k categories. To save storage, this is `packed' into a
> > >> 4-byte integer (32-bit), thus the limit of 32 categories.
> > >>
> > >> The current Fortran code (version 5.x) by Breiman and Cutler gets around
> > >> this limitation by storing the split in an integer array. While this lifts
> > >> the 32-category limit, it takes much more memory to store the splits. I'm
> > >> still trying to figure out a more memory efficient way of storing the splits
> > >> without imposing the 32-category limit. If anyone has suggestions, I'm all
> > >> ears.
> > >>
> > >> Best,
> > >> Andy
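The packing idea described above can be sketched in R as follows. This is an illustration only, not the package's actual C or Fortran code: the function names pack_split and unpack_split are hypothetical, the bit order (category 1 in the lowest bit) is an assumption, and doubles are used in place of a real 32-bit integer so that R's signed integer type does not get in the way.

```r
# Pack a split on k categories into one number: bit j is 1 if category j
# goes left, 0 if it goes right. Refuse more than 32 categories, mirroring
# the limit of a 4-byte integer.
pack_split <- function(goes_left) {
  k <- length(goes_left)
  stopifnot(k <= 32)
  sum(as.numeric(goes_left) * 2^(seq_len(k) - 1))
}

# Recover the left/right assignment of each of the k categories.
unpack_split <- function(code, k) {
  (code %/% 2^(seq_len(k) - 1)) %% 2 == 1
}

# Five categories: 1, 3 and 4 go left, so the code is 1 + 4 + 8 = 13.
goes_left <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
code <- pack_split(goes_left)
decoded <- unpack_split(code, length(goes_left))

# A factor with 41 levels does not fit in 32 bits and is rejected.
too_many <- try(pack_split(rep(TRUE, 41)), silent = TRUE)
```

Storing one array element per category instead, as the Fortran version is described as doing, removes the limit at the cost of the extra memory mentioned above.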
> > >>
> > >> > From: Arne.Muller@sanofi-aventis.com
> > >> >
> > >> > Hello,
> > >> >
> > >> > I'm using the random forest package. One of my factors in the
> > >> > data set contains 41 levels (I can't code this as a numeric
> > >> > value - in terms of linear models this would be a random
> > >> > factor). The randomForest call comes back with an error
> > >> > telling me that the limit is 32 categories.
> > >> >
> > >> > Is there any reason for this particular limit? Maybe it's
> > >> > possible to recompile the module with a different cutoff?
> > >> >
> > >> > thanks a lot for your help,
> > >> > kind regards,
> > >> >
> > >> >
> > >> > Arne
> > >> >
> > >> > ______________________________________________
> > >> > R-help@stat.math.ethz.ch mailing list
> > >> > https://stat.ethz.ch/mailman/listinfo/r-help
> > >> > PLEASE do read the posting guide!
> > >> > http://www.R-project.org/posting-guide.html
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >
> > >
> >
> >
>
>
>
> --
> Weiwei Shi, Ph.D
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>
>
>
>



Received on Fri Jul 08 06:17:18 2005
