Re: [R] Error Correcting Codes, Simplex

From: Richard Graham <rickhg12hs_at_gmail.com>
Date: Tue 17 Oct 2006 - 06:26:50 GMT

On 10/16/06, Björn Egert <begert@ipb-halle.de> wrote:
> On 10/8/06, Egert, Bjoern <begert@ipb-halle.de> wrote:
> >> Hello,
> >>
> >> Is there a way in R to construct an (error correcting) binary code
> >> e.g. for an source alphabet containing integers from 1 to say 255
> >> with the property that each pair of distinct codewords of length m
> >> is at Hamming distance exactly m/2 ?
> >>
> >> I was suggested to use so called simplex codes, which should be
> >> fairly standard, but I haven't found a direct way via R packages
> >> to do so, that's why I ask whether there might be in indirect way
> >> to solve this problem.
> >>
> >> Example:

> >> v1 =c(1,2,3,4)
> >> v2 =c(1,2,5,6)
> >> similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal).
> >> Obviously, a binary representation of would yield a different
> >> similarity of:
> >> binary(v1) =001 010 011 100
> >> binary(v1) =001 010 101 110
> >> similarity(binary(v1),binary(v2))= 9/12
> >>
> >> Remark: The focus here is not on error correction, but rather the
> >> binary encoding retaining similarity of the elements of vectors.
> >>
> >> Many thanks,
> >> Bjoern
> >
> > Bjoern,
> >
> > NB: I'm an R newbie and I only know a bit about error correcting codes.
> >
> > I haven't seen any responses to your questions and I don't know if you
> > still
> > have a need, but it is certainly possible to construct forward error
> > correction
> > codes with all the great math capability in R.
> >
> > It seems you want to generate code words that still have the original
> > bits
> > present. These are systematic codes and there are lots of them available
> > to use. Many codes are specified by the code word length (n), number
> > of original data
> > bits in each code word (k), and the minimum Hamming distance of the
> > code words (d)
> > as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k,
> > 2^(k - 1)]. These
> > codes could be generated as a simple matrix multiply in R, but are you
> > sure that's what
> > you want? The code words will be quite long.
> >
> > Regards,
> > Richard Graham
>
>
> Hello,
>
> thank you.
>
> yes, basically, that's what I want.
> Just a binary encoding of an arbitrary integer value (or vector of
> integers)
> with the property that each pair of distinct integer values have an
> equal Hamming-
> distance (m/2), so as to be able to a similarity search
>
> I got the idea from: Gionis: "Efficient and Tunable Similar Set
> Retrieval" (Chap 3.2)
>
> regards
> Bjoern
>
>

Bjoern,

I read only the section of the paper you mention and I'll trust that the stated properties of Simplex Codes are true. I haven't researched or verified it.

[from http://magma.maths.usyd.edu.au/]
"Magma is a large, well-supported software package designed to solve computationally hard problems in algebra, number theory, geometry and combinatorics. It provides a mathematically rigorous environment for computing with algebraic, number-theoretic, combinatoric and geometric objects."

I don't understand a fraction of its capability but I still find it to be very useful. In fact, they have an online calculator that will give you the generator matrix you want. The online Magma calculator is at:

http://magma.maths.usyd.edu.au/calc/

To calculate the generator matrix I think you are asking for, go to the above URL and cut/paste the following command:

ExtendCode(SimplexCode(8));

Click "Evaluate" and the output window will contain a "[256, 8, 128] Linear Code over GF(2)". You'll need to "massage" this a bit to use it as a matrix for R. I'd use Ruby to do this, but anything will do. If you want to encode more/less than 8 bits, you can modify the above argument to SimplexCode.

I used "ExtendCode" so that the codeword length == Dmin * 2

The Gionis claim I'll research or verify sometime is that _every_ pair of Simplex Code words of length m have Hamming distance == m/2. If you have a reference to a proof, I'd like to read it (like I said, I only know a bit about ECC).

Good Luck with your work!
Richard Graham



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Oct 17 16:36:46 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 17 Oct 2006 - 15:30:11 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.