From: Richard Graham <rickhg12hs_at_gmail.com>

Date: Tue 17 Oct 2006 - 06:26:50 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Oct 17 16:36:46 2006

Date: Tue 17 Oct 2006 - 06:26:50 GMT

On 10/16/06, Björn Egert <begert@ipb-halle.de> wrote:

> On 10/8/06, Egert, Bjoern <begert@ipb-halle.de> wrote:

*> >> Hello,
**> >>
**> >> Is there a way in R to construct an (error correcting) binary code
**> >> e.g. for an source alphabet containing integers from 1 to say 255
**> >> with the property that each pair of distinct codewords of length m
**> >> is at Hamming distance exactly m/2 ?
**> >>
**> >> I was suggested to use so called simplex codes, which should be
**> >> fairly standard, but I haven't found a direct way via R packages
**> >> to do so, that's why I ask whether there might be in indirect way
**> >> to solve this problem.
**> >>
**> >> Example:
**> >> v1 =c(1,2,3,4)
**> >> v2 =c(1,2,5,6)
**> >> similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal).
**> >> Obviously, a binary representation of would yield a different
**> >> similarity of:
**> >> binary(v1) =001 010 011 100
**> >> binary(v1) =001 010 101 110
**> >> similarity(binary(v1),binary(v2))= 9/12
**> >>
**> >> Remark: The focus here is not on error correction, but rather the
**> >> binary encoding retaining similarity of the elements of vectors.
**> >>
**> >> Many thanks,
**> >> Bjoern
**> >
**> > Bjoern,
**> >
**> > NB: I'm an R newbie and I only know a bit about error correcting codes.
**> >
**> > I haven't seen any responses to your questions and I don't know if you
**> > still
**> > have a need, but it is certainly possible to construct forward error
**> > correction
**> > codes with all the great math capability in R.
**> >
**> > It seems you want to generate code words that still have the original
**> > bits
**> > present. These are systematic codes and there are lots of them available
**> > to use. Many codes are specified by the code word length (n), number
**> > of original data
**> > bits in each code word (k), and the minimum Hamming distance of the
**> > code words (d)
**> > as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k,
**> > 2^(k - 1)]. These
**> > codes could be generated as a simple matrix multiply in R, but are you
**> > sure that's what
**> > you want? The code words will be quite long.
**> >
**> > Regards,
**> > Richard Graham
**>
**>
**> Hello,
**>
**> thank you.
**>
**> yes, basically, that's what I want.
**> Just a binary encoding of an arbitrary integer value (or vector of
**> integers)
**> with the property that each pair of distinct integer values have an
**> equal Hamming-
**> distance (m/2), so as to be able to a similarity search
**>
**> I got the idea from: Gionis: "Efficient and Tunable Similar Set
**> Retrieval" (Chap 3.2)
**>
**> regards
**> Bjoern
**>
**>
*

Bjoern,

I read only the section of the paper you mention and I'll trust that the stated properties of Simplex Codes are true. I haven't researched or verified it.

[from http://magma.maths.usyd.edu.au/]

"Magma is a large, well-supported software package designed to solve
computationally hard problems in algebra, number theory, geometry and
combinatorics. It provides a mathematically rigorous environment for
computing with algebraic, number-theoretic, combinatoric and geometric
objects."

I don't understand a fraction of its capability but I still find it to be very useful. In fact, they have an online calculator that will give you the generator matrix you want. The online Magma calculator is at:

http://magma.maths.usyd.edu.au/calc/

To calculate the generator matrix I think you are asking for, go to the above URL and cut/paste the following command:

ExtendCode(SimplexCode(8));

Click "Evaluate" and the output window will contain a "[256, 8, 128] Linear Code over GF(2)". You'll need to "massage" this a bit to use it as a matrix for R. I'd use Ruby to do this, but anything will do. If you want to encode more/less than 8 bits, you can modify the above argument to SimplexCode.

I used "ExtendCode" so that the codeword length == Dmin * 2

The Gionis claim I'll research or verify sometime is that _every_ pair of Simplex Code words of length m have Hamming distance == m/2. If you have a reference to a proof, I'd like to read it (like I said, I only know a bit about ECC).

Good Luck with your work!

Richard Graham

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Oct 17 16:36:46 2006

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Tue 17 Oct 2006 - 15:30:11 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*