Re: [R] Discriminant function analysis

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Thu, 07 Feb 2008 14:36:58 +0000

On Thu, 2008-02-07 at 13:21 +0000, Tyler Smith wrote:
> On 2008-02-07, Birgit Lemcke <birgit.lemcke@systbot.uzh.ch> wrote:
> >
> > Am 06.02.2008 um 21:00 schrieb Tyler Smith:
> >>
> >>> My dataset contains variables of the classes factor and numeric. Is
> >>> there another function that is able to handle this?
> >>
> >> The numeric variables are fine. The factor variables may have to be
> >> recoded into dummy binary variables, I'm not sure if lda() will deal
> >> with them properly otherwise.
> >
> > But aren't binary variables also factors? Or is there another
> > variable class than factor or numeric?
> > Do I have to set the class of the binaries as numeric?
> >
>
> There is no binary class in R, so you would have to use a numeric
> field. For example:

I think Birgit (from previous emails to the list) has been treating binary data as factors when producing Gower's dissimilarity.

In R binary data can be represented in various ways:

bin <- factor(sample(0:1, 20, replace = TRUE))

bin2 <- as.numeric(as.character(bin))
bin3 <- sample(0:1, 20, replace = TRUE)
bin4 <- sample(c(0, 1), 20, replace = TRUE)

dat <- data.frame(bin, bin2, bin3, bin4)
sapply(dat, class)

The /numeric/ representation can be "numeric" or "integer".
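A small sketch of why bin2 above goes through as.character() first: calling as.numeric() directly on a factor returns the internal level codes rather than the 0/1 labels. The vector f is only an illustrative name, not part of Birgit's data.

```r
# A 0/1 variable stored as a factor (illustrative data)
f <- factor(c(0, 1, 1, 0))

# Direct conversion returns the level codes, not the labels
codes <- as.numeric(f)                 # 1 2 2 1

# Going via character recovers the original 0/1 values
values <- as.numeric(as.character(f))  # 0 1 1 0
```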

But I'm not sure this matters much. If you use the formula interface to lda(), factors get expanded to the dummy variables Tyler is talking about. But of course, a factor with two levels 0/1 doesn't need much manipulation as you only need a single dummy variable to represent its two states:

model.matrix(gl(4,5) ~ bin + bin2 + bin3 + bin4, data = dat)

See how bin is converted to a single column, bin1. So you can either do the conversion beforehand (as I did to get bin2) or just supply bin directly in the formula to lda(), and model.matrix() will take care of it for you.
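As a rough illustration of the two routes, here is a sketch on made-up data, assuming the MASS package is installed. The names toy, grp, x1 and bin_num are hypothetical, and the binary column is built deterministically so that it varies within every group.

```r
library(MASS)  # provides lda()

set.seed(42)
n   <- 60
grp <- gl(3, 20, labels = c("a", "b", "c"))  # hypothetical grouping factor
x1  <- rnorm(n, mean = as.integer(grp))      # one continuous predictor

# Binary predictor stored as a factor; both states occur in every group
bin <- factor(c(rep(c(0, 0, 0, 1), 5), rep(c(0, 1), 10), rep(c(1, 1, 1, 0), 5)))
toy <- data.frame(grp, x1, bin)

# Route 1: pass the factor directly; the formula interface expands it
fit_factor <- lda(grp ~ x1 + bin, data = toy)

# Route 2: convert to numeric 0/1 beforehand
toy$bin_num <- as.numeric(as.character(toy$bin))
fit_numeric <- lda(grp ~ x1 + bin_num, data = toy)

# Both fits see the same 0/1 column, so the coefficients should agree
same_scaling <- all.equal(unname(fit_factor$scaling), unname(fit_numeric$scaling))
```

With the default treatment contrasts, the only difference between the fits should be the row name of the coefficient (bin1 versus bin_num).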

If you have a mixture of numeric (continuous) and binary variables, you might want to standardise your explanatory variables to zero mean and unit variance before running lda(), so that all variables carry the same weight.
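One way to do the standardisation is with scale(); the sketch below uses invented columns (len, present) purely for illustration. Note that scale() needs numeric columns, so any factor would have to be converted to 0/1 first.

```r
set.seed(1)

# Hypothetical explanatory variables: one continuous, one binary (numeric 0/1)
expl <- data.frame(len     = rnorm(20, mean = 50, sd = 10),
                   present = rep(c(0, 1), 10))

# Centre each column to mean 0 and rescale to standard deviation 1
expl_std <- as.data.frame(scale(expl))

col_means <- colMeans(expl_std)       # approximately 0 for each column
col_sds   <- apply(expl_std, 2, sd)   # approximately 1 for each column
```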

G

> | sample | factor_1 |
> |--------+----------|
> | A | red |
> | B | green |
> | C | blue |
>
> becomes:
>
> | sample | dummy_1 | dummy_2 |
> |--------+---------+---------|
> | A | 1 | 0 |
> | B | 0 | 1 |
> | C | 0 | 0 |
>
> R can deal with dummy_1 and dummy_2 as numeric vectors. The details
> should be explained in a good reference on multivariate statistics
> (I'm looking at Legendre and Legendre (1998) section 1.5.7 and 11.5).
>
> HTH,
>
> Tyler
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Received on Thu 07 Feb 2008 - 14:39:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 07 Feb 2008 - 16:30:12 GMT.
