From: Douglas Bates <bates_at_stat.wisc.edu>

Date: Mon, 12 May 2008 11:55:19 -0500

>> On Sun, May 11, 2008 at 9:49 AM, amarkos <amar...@gmail.com> wrote:
>> > On May 11, 4:47 pm, "Douglas Bates" <ba..._at_stat.wisc.edu> wrote:
>>
>> >> Do you mean that you want to collapse similar rows into a single row
>> >> and perhaps a count of the number of times that this row occurs?
>>
>> > Let me rephrase the problem by providing an example.
>>
>> > Input:
>>
>> > A =
>> >       [,1] [,2]
>> >  [1,]    1    1
>> >  [2,]    1    3
>> >  [3,]    2    1
>> >  [4,]    1    2
>> >  [5,]    2    1
>> >  [6,]    1    2
>> >  [7,]    1    1
>> >  [8,]    1    2
>> >  [9,]    1    3
>> > [10,]    2    1
>>
>> An important question here is do you start with two or more variables
>> like the columns of your matrix A? If so, there is a more direct
>> method of getting the answers that you want. The natural way to store
>> such variables in R is as factors. I prefer to use letters instead of
>> numbers to represent the levels of a factor (that way I don't confuse
>> a factor with a numeric variable when I look at rows) so I would
>> create a data frame with two factors instead of a matrix.
>>
>> > V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
>> > V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
>> > df <- data.frame(f1 = V1, f2 = V2)
>> > df
>>
>>    f1 f2
>> 1   A  a
>> 2   A  c
>> 3   B  a
>> 4   A  b
>> 5   B  a
>> 6   A  b
>> 7   A  a
>> 8   A  b
>> 9   A  c
>> 10  B  a
>>
>> You could produce the indicator matrix and check for unique rows, etc.
>> - I will show that below - but all you need is the interaction of the
>> two factors
>>
>> > df$f12 <- with(df, f1:f2)[drop = TRUE]
>> > df
>>
>>    f1 f2 f12
>> 1   A  a A:a
>> 2   A  c A:c
>> 3   B  a B:a
>> 4   A  b A:b
>> 5   B  a B:a
>> 6   A  b A:b
>> 7   A  a A:a
>> 8   A  b A:b
>> 9   A  c A:c
>> 10  B  a B:a
>>
>> > str(df)
>>
>> 'data.frame': 10 obs. of  3 variables:
>>  $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
>>  $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
>>  $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
>>
>> > table(df$f12)
>>
>> A:a A:b A:c B:a
>>   2   3   2   3
>>
>> > as.numeric(df$f12)
>>
>> [1] 1 3 4 2 4 2 1 2 3 4
>>
>> Notice that this shows you that there are four distinct combinations
>> that occur 2, 3, 2 and 3 times respectively; the first combination
>> occurs in rows 1 and 7, it consists of the first level of f1 and the
>> first level of f2, etc.
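
The row ids for each combination can be read off the interaction factor directly. The following is only a minimal sketch under the assumptions of the example above: it rebuilds the same data frame, and the names `row_ids` and `collapsed` are illustrative, not part of the original message.

```r
# Rebuild the example data frame with the interaction factor f12.
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
df$f12 <- with(df, f1:f2)[drop = TRUE]

# Row ids grouped by combination: one list element per distinct row.
row_ids <- split(seq_len(nrow(df)), df$f12)

# One representative row per combination, with the number of times it occurs.
collapsed <- unique(df)
collapsed$count <- as.vector(table(df$f12)[as.character(collapsed$f12)])

print(row_ids)
print(collapsed)
```

Here `split()` keeps the stored row ids and `unique()` keeps the first occurrence of each combination, so the collapsed rows appear in order of first appearance rather than in level order.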

>> If you really do want the indicator matrix you could generate it as
>>
>> > (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
>>
>>    f1A f1B f2a f2b f2c
>> 1    1   0   1   0   0
>> 2    1   0   0   0   1
>> 3    0   1   1   0   0
>> 4    1   0   0   1   0
>> 5    0   1   1   0   0
>> 6    1   0   0   1   0
>> 7    1   0   1   0   0
>> 8    1   0   0   1   0
>> 9    1   0   0   0   1
>> 10   0   1   1   0   0
>>
>> > unique(ind)
>>
>>   f1A f1B f2a f2b f2c
>> 1   1   0   1   0   0
>> 2   1   0   0   0   1
>> 3   0   1   1   0   0
>> 4   1   0   0   1   0
>>
>> but working with the factors is generally much simpler than working
>> with the indicators.
>>
>> > # Indicator matrix
>> > # (convert each column of the input matrix A to a factor)
>> > obj <- data.frame(lapply(data.frame(A), as.factor))
>>
>> > nocases <- dim(obj)[1]
>> > novars <- dim(obj)[2]
>>
>> > # variable levels
>> > levels.n <- sapply(obj, nlevels)
>> > n <- cumsum(levels.n)
>>
>> > # Indicator matrix calculations
>> > Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
>> > newdat <- lapply(obj, as.numeric)
>> > offset <- (c(0, n[-length(n)]))
>> > for (i in 1:novars)
>> >   Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1
>>
>> > #######
>>
>> > Output:
>>
>> > Z =
>>
>> >       [,1] [,2] [,3] [,4] [,5]
>> >  [1,]    1    0    1    0    0
>> >  [2,]    1    0    0    0    1
>> >  [3,]    0    1    1    0    0
>> >  [4,]    1    0    0    1    0
>> >  [5,]    0    1    1    0    0
>> >  [6,]    1    0    0    1    0
>> >  [7,]    1    0    1    0    0
>> >  [8,]    1    0    0    1    0
>> >  [9,]    1    0    0    0    1
>> > [10,]    0    1    1    0    0
>>
>> > Z is an indicator matrix in the Multiple Correspondence Analysis
>> > framework.
>> > My problem is to collapse identical rows (e.g. 2 and 9) into a single
>> > row and store the row ids.
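
If only the indicator matrix is at hand, identical rows can also be collapsed on the matrix itself. What follows is a rough sketch, not the method from either message: it assumes Z holds only 0/1 entries, so pasting each row into a string gives a usable key, and the names `key`, `Zu`, `ids` and `counts` are made up for illustration.

```r
# The indicator matrix Z from the example (10 rows, 5 columns).
Z <- rbind(c(1,0,1,0,0), c(1,0,0,0,1), c(0,1,1,0,0), c(1,0,0,1,0),
           c(0,1,1,0,0), c(1,0,0,1,0), c(1,0,1,0,0), c(1,0,0,1,0),
           c(1,0,0,0,1), c(0,1,1,0,0))

# One string key per row; identical rows share the same key.
key <- apply(Z, 1, paste, collapse = "")

# Keep the first occurrence of each distinct row.
Zu <- Z[!duplicated(key), , drop = FALSE]

# For each distinct row (in order of first appearance), the original row ids.
ids <- split(seq_len(nrow(Z)), factor(key, levels = unique(key)))

# Number of original rows behind each collapsed row.
counts <- sapply(ids, length)

print(Zu)
print(ids)
```

With this input, rows 2 and 9 end up in the same element of `ids`, which is the kind of bookkeeping the question asks for.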

On Mon, May 12, 2008 at 11:27 AM, amarkos <amarkos_at_gmail.com> wrote:
> Thanks, it works!
>
> Could you please provide the direct method you mentioned for the
> multivariate case?

I'm not sure what you mean. I looked at what I wrote and I don't see anything that would fit that description.
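
If "the multivariate case" means more than two categorical variables, the interaction idea carries over, since `interaction()` accepts a list (or data frame) of factors. This is only a sketch of that reading: the third factor `f3` and its values are invented for illustration and are not part of the original example.

```r
# The two factors from the earlier example plus a hypothetical third one.
d <- data.frame(
  f1 = factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2]),
  f2 = factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3]),
  f3 = factor(c(1,2,1,1,1,1,1,2,2,2), labels = c("x", "y"))  # assumed data
)

# Combine all columns into one factor; drop = TRUE removes unused combinations.
combo <- interaction(d, drop = TRUE)

# Counts per distinct combination and the row ids that belong to each.
counts <- table(combo)
row_ids <- split(seq_len(nrow(d)), combo)

print(counts)
print(row_ids)
```

The level labels produced by `interaction()` use "." as the separator by default (for example `A.a.x`), unlike the `A:a` labels from the `:` operator.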

May I suggest that you continue to cc: the R-help list on the discussion. I can't always respond rapidly to requests and there are many who read the list that can.

> On May 12, 4:30 pm, "Douglas Bates" <ba..._at_stat.wisc.edu> wrote:

>> On Sun, May 11, 2008 at 9:49 AM, amarkos <amar...@gmail.com> wrote:

> Angelos Markos
> Dr. of Applied Informatics,
> University of Macedonia, Greece

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 12 May 2008 - 17:32:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 12 May 2008 - 18:30:36 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.