From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Fri 09 Dec 2005 - 10:02:30 EST

On Fri, 9 Dec 2005, Richard A. O'Keefe wrote:

> I am trying to automatically construct a distance function from
> a training set in order to use it to cluster another data set.
> The variables are nominal. One variable is a "class" variable
> having two values; it is kept separate from the others.
>
> I have a method which constructs a distance matrix for the levels
> of a nominal variable in the context of the other variables.
>
> I want to construct a linear combination of these which gives me
> a distance between whole cases that is well associated with the
> class variable, in that
> "combined distance between two cases large =>
> they most likely belong to different classes."
>
> So from my training set I construct a set of
> (d1(x1,y1), ..., dn(xn,yn), x_class != y_class)
> rows bound together as a data frame (actually I construct it by
> columns), and then the obvious thing to try was
>
> glm(different.class ~ ., family = binomial(), data = distance.frame)
>
> The thing is that this gives me both positive and negative coefficients,
> whereas the linear combination is only guaranteed to be a metric if the
> coefficients are all non-negative.
>
> There are four fairly obvious ways to deal with that:
> (1) just force the negative coefficients to 0 and hope.
>     This turns out to work rather well, but still...
> (2) keep all the coefficients but take max(0, linear combination of distances).
>     This turns out to work rather well, but still...
> (3) Drop the variables with negative coefficients from the model,
>     refit, and iterate until no negative coefficients remain.
>     This can hardly be said to work; sometimes nearly all the variables
>     are dropped.
> (4) Use a version of glm() that will let me constrain the coefficients
>     to be non-negative.
>
> I *have* searched the R-help archives, and I see that the question about
> logistic regression with constrained coefficients has come up before, but
> it didn't really get a satisfactory answer. I've also searched the
> documentation of more contributed packages than I could possibly understand.
>
> There is obviously some way to do this using R's general non-linear
> optimisation functions. However, I don't know how to formulate logistic
> regression that way.

There is a worked example in MASS (the book) p.445, including adding constraints.
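Logistic regression can be written as a direct optimisation problem: minimise the negative binomial log-likelihood over the coefficients. `optim(method = "L-BFGS-B")` accepts simple lower bounds, which is exactly what the non-negativity constraint in option (4) needs. What follows is a minimal sketch under stated assumptions, not the code from MASS: `constrained_logit()` is a hypothetical helper name, the intercept is left unconstrained, only the slopes are bounded below by 0, and the data in the usage lines are simulated.

```r
# Sketch (hypothetical helper, not the MASS code): logistic regression with
# non-negative slope coefficients, fitted by maximising the binomial
# log-likelihood directly with optim(method = "L-BFGS-B").
#   X: numeric matrix or data frame of per-variable distances d1, ..., dn
#   y: 0/1 or logical response (1/TRUE = the two cases are in different classes)
# Assumption: the intercept is unconstrained; only the slopes must be >= 0.
constrained_logit <- function(X, y, start = NULL) {
  X <- cbind("(Intercept)" = 1, as.matrix(X))   # prepend an intercept column
  y <- as.numeric(y)                            # logical -> 0/1
  p <- ncol(X)
  if (is.null(start)) start <- rep(0, p)        # feasible: satisfies all bounds

  # Negative log-likelihood: -sum(y * eta - log(1 + exp(eta))),
  # using plogis(-eta, log.p = TRUE) == -log(1 + exp(eta)) for stability
  nll <- function(beta) {
    eta <- drop(X %*% beta)
    -sum(y * eta + plogis(-eta, log.p = TRUE))
  }

  # Analytic gradient of nll: t(X) %*% (mu - y), where mu = plogis(eta)
  grad <- function(beta) {
    mu <- plogis(drop(X %*% beta))
    drop(crossprod(X, mu - y))
  }

  lower <- c(-Inf, rep(0, p - 1))               # intercept free, slopes >= 0
  fit <- optim(start, nll, grad, method = "L-BFGS-B", lower = lower)
  names(fit$par) <- colnames(X)
  fit
}

# Usage on simulated (hypothetical) data where d3 genuinely enters negatively
set.seed(1)
n <- 500
d <- matrix(runif(n * 3), n, 3, dimnames = list(NULL, c("d1", "d2", "d3")))
y <- rbinom(n, 1, plogis(-2 + 3 * d[, "d1"] + 1.5 * d[, "d2"] - d[, "d3"]))
fit <- constrained_logit(d, y)
fit$par   # the d3 slope will typically sit at (or very near) its bound of 0
```

With the data frame from the question, something like `constrained_logit(distance.frame[names(distance.frame) != "different.class"], distance.frame$different.class)` should apply. Unlike option (1), the remaining coefficients are re-estimated with the constraint in force rather than left at their unconstrained `glm()` values; when no bound is active the fit should agree with `glm()` up to optimiser tolerance.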

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Dec 09 10:13:31 2005

This archive was generated by hypermail 2.1.8: Fri 09 Dec 2005 - 14:29:51 EST