Re: [R] caretNWS and training data set sizes

From: Tait, Peter <ptait_at_skura.com>
Date: Mon, 10 Mar 2008 13:18:51 -0400

Hi Max,
Thank you for the fast response.

Here are the versions of the R packages I am using:

caret 3.13
caretNWS 0.16
nws 1.62

Here are the Python versions:

ActivePython 2.5.1.1
nws server 1.5.2 for py2.5
twisted 2.5.9 py2.5

The computer I am using has one dual-core Xeon CPU at 1.86 GHz with 4 GB of RAM. R is currently set up to use 2 GB of it (it starts with "C:\Program Files\R\R-2.6.2\bin\Rgui.exe" --max-mem-size=2047M). The OS is Windows Server 2003 R2 with SP2.

I am running one R job/process (Rgui.exe) and almost nothing else on the computer while R is running (no databases, web servers, office apps, etc.).

I really appreciate your help.
Cheers
Peter
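
[A minimal base-R sketch, not part of the original exchange: one way to probe whether fitting cost blows up between N = 2500 and N = 5000 is to time a stand-in learner on simulated data of the same width before handing the full set to caretNWS. lm.fit() here is only a placeholder for the real learners (gbm, lasso, etc.); all names and sizes are illustrative.]

```r
## Time a cheap stand-in fit at the row counts from the thread.
## 347 numeric predictors, as in the actual training set.
set.seed(1)
p <- 347

probe <- function(n) {
  x <- matrix(rnorm(n * p), nrow = n)   # simulated predictors
  y <- rnorm(n)                         # simulated response
  system.time(lm.fit(x, y))[["elapsed"]]
}

times <- sapply(c(100, 2500, 5000), probe)
print(times)
```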

>-----Original Message-----
>From: Max Kuhn [mailto:mxkuhn_at_gmail.com]
>Sent: Monday, March 10, 2008 12:41 PM
>To: Tait, Peter
>Cc: r-help@R-project.org
>Subject: Re: [R] caretNWS and training data set sizes
>
>What version of caret and caretNWS are you using? Also, what version
>of the nws server and twisted are you using? What kind of machine (#
>processors, how much physical memory etc)?
>
>I haven't seen any real limitations with one exception: if you are
>running P jobs on the same machine, you are replicating the memory
>needs P times.
>
>I've been running jobs with 4K to 90K samples and 1200 predictors
>without issues, so I'll need a lot more information to help you.
>
>Max
>
>
>On Mon, Mar 10, 2008 at 12:04 PM, Tait, Peter <ptait_at_skura.com> wrote:
>> Hi,
>>
>> I am using the caretNWS package to train some supervised regression
>> models (gbm, lasso, random forest and mars). The problem I have
>> encountered started when my training data set grew in both the number
>> of predictors and the number of observations.
>>
>> The training data set has 347 numeric columns. The problem I have is
>> that when there are more than 2500 observations, the 5 sleigh objects
>> start but do not use any CPU resources and do not process any data.
>>
>> N=100 cpu(%) memory(K)
>> Rgui.exe 0 91737
>> 5x sleighs (RTerm.exe) 15-25 ~27000
>>
>> N=2500
>> Rgui.exe 0 160000
>> 5x sleighs (RTerm.exe) 15-25 ~74000
>>
>> N=5000
>> Rgui.exe 50 193000
>> 5x sleighs (RTerm.exe) 0 ~19000
>>
>>
>> A 10% sample of my overall data is ~22000 observations.
>>
>> Can someone give me an idea of the limitations of the nws and
>> caretNWS packages in terms of the number of columns and rows of the
>> training matrices, and whether there are other tuning/training
>> functions that work faster on large datasets?
>>
>> Thanks for your help.
>> Peter
>>
>>
>> > version
>> _
>> platform i386-pc-mingw32
>> arch i386
>> os mingw32
>> system i386, mingw32
>> status
>> major 2
>> minor 6.2
>> year 2008
>> month 02
>> day 08
>> svn rev 44383
>> language R
>> version.string R version 2.6.2 (2008-02-08)
>>
>> > memory.limit()
>> [1] 2047
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
>--
>
>Max



Received on Mon 10 Mar 2008 - 17:24:11 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 10 Mar 2008 - 19:31:18 GMT.

