[R-pkgs] Rule-based regression models: Cubist

From: kuhnA03 <max.kuhn_at_pfizer.com>
Date: Wed, 27 Apr 2011 15:37:39 -0400


Cubist is a rule-based machine learning model for regression. Parts of the Cubist model are described in:

   Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

   Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

RuleQuest, the company that created the program, now has a version available under the GPL at:

   http://rulequest.com/cubist-info.html

We've taken the Cubist GPL code and created an R interface. The package locations are:

   http://cran.r-project.org/web/packages/Cubist/index.html

and

   https://r-forge.r-project.org/projects/rulebasedmodels/

The primary functions are cubist(), which creates the rules and the terminal models, and predict.cubist(), which predicts new outcomes. The model allows for instance-based corrections of the model predictions. We've separated the instance-based correction from the model build, so the choice of instances is only needed when samples are predicted. An interface for tuning the Cubist model will be available in the caret package shortly.
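As a rough sketch of that separation, the code below fits a model once and then predicts twice, with and without the instance-based correction. It assumes the committees argument of cubist() and the neighbors argument of predict() as found in the CRAN release of the package; the train/test split, the seed and the choice of 5 neighbors are arbitrary choices made for illustration.

```r
library(Cubist)
library(mlbench)
data(BostonHousing)

## Arbitrary split for illustration: 400 training rows, the rest held out
set.seed(1)
in_train <- sample(nrow(BostonHousing), 400)
train_x <- BostonHousing[in_train, -14]
train_y <- BostonHousing$medv[in_train]
test_x  <- BostonHousing[-in_train, -14]

## Build the rules and terminal models once
fit <- cubist(x = train_x, y = train_y, committees = 1)

## Rule-based predictions only (neighbors defaults to 0)
pred_rules <- predict(fit, newdata = test_x)

## Same fitted model, now adjusted using 5 training-set neighbors
pred_inst <- predict(fit, newdata = test_x, neighbors = 5)

head(cbind(rules = pred_rules, instances = pred_inst))
```

Because the correction is applied at prediction time, different values of neighbors can be compared without refitting the model.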

We are also working on a similar port of C5.0 (also GPL'ed). The C code is very similar, so many of the Cubist changes can be carried over. That said, we'd appreciate help if anyone wants to contribute.

Here is an example cubist session:

library(Cubist)
library(mlbench)
data(BostonHousing)

## 1 committee and no instance-based correction, so just an M5 fit:
mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
summary(mod1)

## example output:

## Cubist [Release 2.07 GPL Edition] Sun Apr 10 17:36:56 2011
## ---------------------------------
##
## Target attribute `outcome'
##
## Read 506 cases (14 attributes) from undefined.data
##
## Model:
##
## Rule 1: [101 cases, mean 13.84, range 5 to 27.5, est err 1.98]
##
## if
## nox > 0.668
## then
## outcome = -1.11 + 2.93 dis + 21.4 nox - 0.33 lstat + 0.008 b
## - 0.13 ptratio - 0.02 crim - 0.003 age + 0.1 rm
##
## Rule 2: [203 cases, mean 19.42, range 7 to 31, est err 2.10]
##
## if
## nox <= 0.668
## lstat > 9.59
## then
## outcome = 23.57 + 3.1 rm - 0.81 dis - 0.71 ptratio - 0.048 age
## - 0.15 lstat + 0.01 b - 0.0041 tax - 5.2 nox + 0.05 crim
## + 0.02 rad
##
## Rule 3: [43 cases, mean 24.00, range 11.9 to 50, est err 2.56]
##
## if
## rm <= 6.226
## lstat <= 9.59
## then
## outcome = 1.18 + 3.83 crim + 4.3 rm - 0.06 age - 0.11 lstat - 0.003 tax
## - 0.09 dis - 0.08 ptratio
##
## Rule 4: [163 cases, mean 31.46, range 16.5 to 50, est err 2.78]
##
## if
## rm > 6.226
## lstat <= 9.59
## then
## outcome = -4.71 + 2.22 crim + 9.2 rm - 0.83 lstat - 0.0182 tax
## - 0.72 ptratio - 0.71 dis - 0.04 age + 0.03 rad - 1.7 nox
## + 0.008 zn
##
##
## Evaluation on training data (506 cases):
##
## Average |error| 2.07
## Relative |error| 0.31
## Correlation coefficient 0.94
##
##
## Attribute usage:
##   Conds  Model
##
##    80%   100%    lstat
##    60%    92%    nox
##    40%   100%    rm
##          100%    crim
##          100%    age
##          100%    dis
##          100%    ptratio
##           80%    tax
##           72%    rad
##           60%    b
##           32%    zn
##
##
## Time: 0.0 secs
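
To show how a printed rule is read, here is a small base R sketch that evaluates Rule 1 by hand. The coefficients and the condition are copied from the output above; the case itself is made up for illustration, and the helper names rule1_applies() and rule1_predict() are hypothetical, not part of the package. predict() may return something different for a given case, since the package can combine several rules and apply further adjustments, so this only illustrates the structure of a single rule.

```r
## Condition of Rule 1, as printed in the summary
rule1_applies <- function(case) case$nox > 0.668

## Linear model of Rule 1, coefficients copied from the summary
rule1_predict <- function(case) {
  with(case,
       -1.11 + 2.93 * dis + 21.4 * nox - 0.33 * lstat + 0.008 * b -
         0.13 * ptratio - 0.02 * crim - 0.003 * age + 0.1 * rm)
}

## Hypothetical case (values invented for illustration)
case <- list(nox = 0.7, dis = 2, lstat = 20, b = 390,
             ptratio = 20, crim = 10, age = 90, rm = 6)

## Apply the linear model only if the rule's condition holds
pred <- if (rule1_applies(case)) rule1_predict(case) else NA_real_
print(round(pred, 2))
## [1] 13.78
```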

Thanks,

Max, Steve and Chris



R-packages mailing list
R-packages_at_r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
Received on Thu 28 Apr 2011 - 05:44:30 EST
