From: Adrian DUSA <dusa.adrian_at_gmail.com>

Date: Fri 16 Dec 2005 - 04:04:37 EST

Date: Fri 16 Dec 2005 - 04:04:37 EST

As a toy example, having an input matrix called "input", my algorithm looks like this:

## code start

ncolumns <- 6

input <- bincombinations(ncolumns) # from package e1071
# subset, let's say 97% of rows

input <- input[sample(2^ncolumns, round(2^ncolumns*0.97, 0), ]
minimized <- 1

while (sum(minimized) > 0) {

minimized <- logical(nrow(input))

to.be.compared <- combn2(1:nrow(input)) # from package combinat

# the following line takes _a lot_ of time, for millions of comparisons logical.result <- apply(to.be.compared, 1, function(idx) input[idx[1], ] == input[idx[2], ])

compare.minimized <- which(colSums(!logical.result) == 1)

logical.result <- logical.result[, compare.minimized]

result <- sapply(compare.minimized, function(idx) input[to.be.compared[idx, 1], ])

result[!logical.result] <- "x"

minimized[unique(as.vector(to.be.compared[compare.minimized, ]))] <- TRUE

if (sum(minimized) > 0) {

input <- rbind(input[!minimized, ], unique(t(result)))
}

}

## code end

Any suggestion is welcomed, thank you very much in advance. Adrian

-- Adrian DUSA Romanian Social Data Archive 1, Schitu Magureanu Bd 050025 Bucharest sector 5 Romania Tel./Fax: +40 21 3126618 \ +40 21 3120210 / int.101 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.htmlReceived on Fri Dec 16 04:13:08 2005

*
This archive was generated by hypermail 2.1.8
: Fri 03 Mar 2006 - 03:41:39 EST
*