Re: [R] caretNWS and training data set sizes

From: Max Kuhn <mxkuhn_at_gmail.com>
Date: Mon, 10 Mar 2008 14:03:16 -0400

Peter,

You are certainly up to date. Can you try replicating this using only two nodes (since you only have two processors)? I'm not sure that specifying 5 really helps. Using 2 nodes on my Mac usually gets me about a 30-40% decrease in time.

Also, are the processes just hanging or is there an error? These models may take a while. Perhaps testing with pls, lm or some other fast model might help troubleshoot.
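For example, a quick smoke test along these lines might show whether the problem is the model or the parallel setup. This is only an untested sketch: it assumes trainNWS mirrors caret's train() x/y interface, and trainX/trainY are placeholders for your own predictors and outcome.

```r
# Untested sketch -- trainX/trainY stand in for your predictors and outcome.
library(caretNWS)

# Use a small subset so a hang (or a finish) shows up quickly
idx <- sample(nrow(trainX), 500)

# A fast linear model: if this also stalls, the issue is likely the
# NWS/sleigh setup rather than the model fitting itself.
fit <- trainNWS(trainX[idx, ], trainY[idx], method = "lm")
```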

If you are not passing a sleigh object into the trainNWS call, you can set the worker count with

   trainNWSControl(start = makeSleighStarter(workerCount = 2))

The only other thing I can suggest is to send me the data (or an anonymized knock-off) so that I can test. You certainly should be able to do this, but you may be limited by your machine.
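One thing worth checking on your side: since each worker process holds its own copy of the data, memory cost scales with the number of workers. A rough back-of-the-envelope estimate for your 22000-row, 347-column case (assuming the data are stored as doubles, 8 bytes per cell):

```r
# Back-of-the-envelope memory estimate for a 22000 x 347 numeric matrix
n_obs  <- 22000
n_pred <- 347
mb_per_copy <- n_obs * n_pred * 8 / 2^20   # ~58 MB per copy

# With P worker processes each holding its own copy (plus the master),
# the raw data alone costs roughly (P + 1) copies -- before any model
# fitting overhead, which can be several times that again.
p <- 5
(p + 1) * mb_per_copy                      # ~350 MB
```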

Max

On Mon, Mar 10, 2008 at 1:18 PM, Tait, Peter <ptait_at_skura.com> wrote:
> Hi Max,
> Thank you for the fast response.
>
> Here are the versions of the R packages I am using:
>
> caret 3.13
> caretNWS 0.16
> nws 1.62
>
> Here are the python versions
>
> Active Python 2.5.1.1
> nws server 1.5.2 for py2.5
> twisted 2.5.9 py2.5
>
> The computer I am using has 1 Xeon dual core cpu at 1.86 GHz with 4 GB of RAM. R is currently set up to use 2 GB of it (it starts with "C:\Program Files\R\R-2.6.2\bin\Rgui.exe" --max-mem-size=2047M). The OS is Windows Server 2003 R2 with SP2.
>
> I am running one R job/process (Rgui.exe) and almost nothing else on the computer while R is running (no databases, web servers, office apps, etc.).
>
> I really appreciate your help.
> Cheers
> Peter
>
>
>
>
> >-----Original Message-----
> >From: Max Kuhn [mailto:mxkuhn_at_gmail.com]
> >Sent: Monday, March 10, 2008 12:41 PM
> >To: Tait, Peter
> >Cc: r-help_at_R-project.org
> >Subject: Re: [R] caretNWS and training data set sizes
> >
> >What version of caret and caretNWS are you using? Also, what version
> >of the nws server and twisted are you using? What kind of machine (#
> >processors, how much physical memory etc)?
> >
> >I haven't seen any real limitations with one exception: if you are
> >running P jobs on the same machine, you are replicating the memory
> >needs P times.
> >
> >I've been running jobs with 4K to 90K samples and 1200 predictors
> >without issues, so I'll need a lot more information to help you.
> >
> >Max
> >
> >
> >On Mon, Mar 10, 2008 at 12:04 PM, Tait, Peter <ptait_at_skura.com> wrote:
> >> Hi,
> >>
> >> I am using the caretNWS package to train some supervised regression
> >> models (gbm, lasso, random forest and mars). The problem I have encountered
> >> started when my training data set increased in the number of predictors and
> >> the number of observations.
> >>
> >> The training data set has 347 numeric columns. The problem I have is that
> >> when there are more than 2500 observations, the 5 sleigh objects start but
> >> do not use any CPU resources and do not process any data.
> >>
> >>                            CPU (%)   Memory (KB)
> >> N=100
> >>   Rgui.exe                    0        91737
> >>   5x sleighs (RTerm.exe)    15-25     ~27000
> >>
> >> N=2500
> >>   Rgui.exe                    0       160000
> >>   5x sleighs (RTerm.exe)    15-25     ~74000
> >>
> >> N=5000
> >>   Rgui.exe                   50       193000
> >>   5x sleighs (RTerm.exe)      0      ~19000
> >>
> >>
> >> A 10% sample of my overall data is ~22000 observations.
> >>
> >> Can someone give me an idea of the limitations of the nws and caretNWS
> >> packages in terms of the number of columns and rows of the training
> >> matrices, and whether there are other tuning/training functions that work
> >> faster on large datasets?
> >>
> >> Thanks for your help.
> >> Peter
> >>
> >>
> >> > version
> >> _
> >> platform i386-pc-mingw32
> >> arch i386
> >> os mingw32
> >> system i386, mingw32
> >> status
> >> major 2
> >> minor 6.2
> >> year 2008
> >> month 02
> >> day 08
> >> svn rev 44383
> >> language R
> >> version.string R version 2.6.2 (2008-02-08)
> >>
> >> > memory.limit()
> >> [1] 2047
> >>
> >> ______________________________________________
> >> R-help_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
> >
> >--
> >
> >Max
>

-- 

Max

Received on Mon 10 Mar 2008 - 18:06:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.