Efficiently factoring polynomials over $\Bbb F_2$

Question

I am writing software intended to answer, generically, the question of which Cyclic Redundancy Code (CRC) generating polynomial was used for a given set of sample messages that all share the same unknown CRC.

For example, here are two messages, in hexadecimal, that use the same CRC:

ee 00 00 00 00 01 20 13 10  (message 1)
ee 00 00 00 00 03 20 a3 23  (message 2)

Since CRC calculations are essentially polynomial division over GF(2), we can model each CRC calculation as $M(x) x^n = Q(x) G(x) + R(x)$, where $M(x)$ is the original message (so $M(x) x^n$ is the message with $n$ zero bits appended), $G(x)$ is the generating polynomial of degree $n$, $Q(x)$ is the quotient, and $R(x)$ is the remainder, which is identical to the CRC value attached to the end of the message. For message 1, the original message is

ee 00 00 00 00 01 20

and the appended $R(x)$ is 13 10 (so $n = 16$).

Because these two messages use the same generating polynomial, we know that $M_1(x) x^n = Q_1(x) G(x) + R_1(x)$ and $M_2(x) x^n = Q_2(x) G(x) + R_2(x)$. Subtracting one from the other gives $$\begin{aligned} M_1(x) x^n - M_2(x) x^n &= Q_1(x) G(x) + R_1(x) - Q_2(x) G(x) - R_2(x)\\ (M_1(x) - M_2(x))x^n - R_1(x) + R_2(x) &= (Q_1(x)-Q_2(x))G(x) \end{aligned}$$ Since addition and subtraction of polynomials over GF(2) are both the exclusive-or operation, the left-hand side equals $(M_1(x)+M_2(x))x^n + R_1(x) + R_2(x)$, which is exactly the exclusive-or of the two complete messages (including their CRCs). That exclusive-or is therefore a multiple of $G(x)$, namely $(Q_1(x)-Q_2(x))G(x)$, so $G(x)$ can be found by factoring it. In the specific case of the two messages above, we get

02 00 b0 33
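As a quick illustration of the XOR step, here is a minimal Python sketch (standard library only; `xor_bytes` is a hypothetical helper name, not from any CRC tool). It assumes equal-length frames, which holds for the two 9-byte samples here:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length frames byte by byte (GF(2) addition = subtraction)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


msg1 = bytes.fromhex("ee 00 00 00 00 01 20 13 10")   # message 1 with its CRC
msg2 = bytes.fromhex("ee 00 00 00 00 03 20 a3 23")   # message 2 with its CRC
diff = xor_bytes(msg1, msg2)
print(diff.hex(" "))   # 00 00 00 00 00 02 00 b0 33
```

The leading zero bytes contribute nothing to the polynomial, which is why only 02 00 b0 33 matters.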

Because of the way CRCs are most often calculated, the bits are "swapped" (each byte is processed least-significant bit first), so this actually corresponds to the polynomial $x^{30}+x^{11}+x^{10}+x^8+x^7+x^6+x^3+x^2$. This trivially factors as $x^2(x^{28}+x^9+x^8+x^6+x^5+x^4+x+1)$, but that's where I get stuck. I know that in this particular case $G(x) = x^{16}+x^{12}+x^5+1$ (which some may recognize as the standard 16-bit CCITT polynomial).
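To make the bit ordering concrete, here is a minimal Python sketch that turns the XOR bytes into the polynomial above. It assumes the reflected convention (bytes in order, each byte least-significant bit first, earlier bits getting higher powers of $x$); the helper names are made up for illustration, and polynomials are stored as Python ints with bit $i$ holding the coefficient of $x^i$:

```python
def bytes_to_poly_reflected(data: bytes) -> int:
    """Map bytes to a GF(2)[x] polynomial stored as an int (bit i = coeff of x^i).

    Assumes the reflected convention: bytes are taken in order, each byte
    least-significant bit first, and earlier bits get higher powers of x.
    """
    poly = 0
    for byte in data:
        for bit in range(8):                      # LSB of each byte first
            poly = (poly << 1) | ((byte >> bit) & 1)
    return poly


def exponents(poly: int):
    """Exponents with nonzero coefficients, highest first."""
    return [i for i in range(poly.bit_length() - 1, -1, -1) if (poly >> i) & 1]


diff = bytes.fromhex("00 00 00 00 00 02 00 b0 33")   # XOR of the two sample frames
f = bytes_to_poly_reflected(diff)
print(hex(f), exponents(f))       # 0x40000dcc [30, 11, 10, 8, 7, 6, 3, 2]
print(exponents(f >> 2))          # [28, 9, 8, 6, 5, 4, 1, 0], the cofactor after x^2
```

The two low zero bits of `f` are the $x^2$ factor, so `f >> 2` is the degree-28 cofactor.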

How do I write efficient code to factor that polynomial over GF(2) without resorting to brute force?

Looking for literature that would help answer that question, I found several papers that looked promising:

However, I am an engineer rather than a mathematician, and I find I am unable to understand what the papers actually mean and therefore unable to use whatever clever algorithms they purport to contain. Short of going back for a degree in field theory, is there some means by which I might learn how to use modern polynomial factoring techniques in application to this problem?

Factoring polynomials over a finite field has a lot in common with prime factorization of integers, at least in broad outline. How large a degree do you want to factor? — hardmath, Sep 05 '14 at 20:44
I'd like to be able to factor arbitrarily large polynomials, but the largest is likely to be of degree 300 or less and the degree of the generating polynomial (one of the factors) will always be 65 or smaller. — Edward, Sep 05 '14 at 20:48
I attempted to read Berlekamp, Cantor and Zassenhaus, and some others. The older articles seem more comprehensible to me, but I was hoping to be able to use the latest, fastest algorithms. — Edward, Sep 05 '14 at 21:03
A toy example of Berlekamp's method is here. The setting there is quite different from yours. One key idea is that calculating $$\gcd(x^{2^n}+x,p(x))$$ gives the product of the distinct irreducible factors of $p(x)$ whose degrees divide $n$. Repeated factors show up only once. You can first do this for several $n$ and find some factors. The fun begins when you need to separate irreducible factors of equal degree. — Jyrki Lahtonen, Sep 05 '14 at 21:03
@JyrkiLahtonen: the links look like they may be useful to me. I'll study them over the weekend. Thanks! — Edward, Sep 05 '14 at 21:07
On second thought that toy example may not be very useful to you, because in characteristic two you cannot use squares. Anyway, I would not be surprised if Berlekamp will be fast enough for your range of degrees. If it's good enough for Wolfram...? — Jyrki Lahtonen, Sep 05 '14 at 21:10
If you are interested in already-written software, Sage allows one to factor polynomials over finite fields, esp. $\mathbb{F}_2$, where it uses Victor Shoup's NTL package for faster polynomial multiplication and GCDs. — hardmath, Sep 05 '14 at 22:51
@hardmath: that comment is worthy of being a proposed answer. — Edward, Sep 06 '14 at 12:54
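The gcd idea in Jyrki Lahtonen's comment is the distinct-degree stage of Cantor-Zassenhaus-style factoring, and it is short enough to write directly. Below is a minimal pure-Python sketch, not a tuned implementation: it assumes a squarefree input (so the repeated factor $x^2$ from the Question is removed first), stores polynomials as ints with bit $i$ holding the coefficient of $x^i$, and all function names are illustrative:

```python
def pdeg(a: int) -> int:
    return a.bit_length() - 1            # degree; the zero polynomial gives -1


def pmul(a: int, b: int) -> int:
    r = 0                                # carry-less (XOR) multiplication
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdivmod(a: int, m: int):
    if m == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q, dm = 0, pdeg(m)                   # schoolbook long division over GF(2)
    while pdeg(a) >= dm:
        s = pdeg(a) - dm
        q |= 1 << s
        a ^= m << s
    return q, a


def pgcd(a: int, b: int) -> int:
    while b:                             # Euclid's algorithm
        a, b = b, pdivmod(a, b)[1]
    return a


def distinct_degree_factor(f: int):
    """Return (d, g_d) pairs for a squarefree f, where g_d is the product of
    all irreducible factors of f of degree exactly d."""
    out = []
    h, d = 0b10, 0                       # h starts as the polynomial x
    while pdeg(f) >= 2 * (d + 1):
        d += 1
        h = pdivmod(pmul(h, h), f)[1]    # h = x^(2^d) mod f
        g = pgcd(h ^ 0b10, f)            # gcd(x^(2^d) + x, f)
        if g != 1:
            out.append((d, g))
            f = pdivmod(f, g)[0]
            h = pdivmod(h, f)[1]
    if f != 1:                           # what is left is a single irreducible factor
        out.append((pdeg(f), f))
    return out


F = 0x40000DCC >> 2     # x^28 + x^9 + x^8 + x^6 + x^5 + x^4 + x + 1 (x^2 removed)
for d, g in distinct_degree_factor(F):
    print(d, bin(g))
# 1 0b11
# 3 0b1011
# 4 0b11111
# 5 0b111011
# 15 0b1111000000011111
```

For this cofactor every degree occurs at most once, so the distinct-degree stage alone already yields the irreducible factors; when two factors share a degree, the corresponding entry still has to be split.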

score 3 · Accepted Answer · edited Apr 13 '17 at 12:21

I've had occasion to mention the symbolic math software Sage in a few answers, e.g. about irreducibility over the binary field and factoring of rational polynomials. A good way to do off-the-cuff experiments is with an online Sage account, which has evolved into SageMathCloud.

For the example given in the Question, I made this online Sage workfile (create an account and a new project, and it will display an editable empty workfile):

P.<x> = GF(2)[]
f = x^30 + x^11 + x^10 + x^8 + x^7 + x^6 + x^3 + x^2
f.factor()

Running the project (leftmost icon) produces this output:

(x + 1) * x^2 * (x^3 + x + 1) * (x^4 + x^3 + x^2 + x + 1) * (x^5 + x^4 + x^3 + x + 1) * (x^15 + x^14 + x^13 + x^12 + x^4 + x^3 + x^2 + x + 1)

or in more concise (MathJax) form:

$$ (x + 1)\, x^2\, (x^3 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^5 + x^4 + x^3 + x + 1)(x^{15} + x^{14} + x^{13} + x^{12} + x^4 + x^3 + x^2 + x + 1) $$

Note that the "standard 16-bit CCITT polynomial" $G(x) = x^{16}+x^{12}+x^5+1$ cited in the Question is actually the product of the first and last of these irreducible factors.
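The claim is easy to check without Sage using carry-less (XOR) arithmetic. A minimal Python sketch, with polynomials stored as ints where bit $i$ is the coefficient of $x^i$; `pmul` and `pdivmod` are illustrative names, not a library API:

```python
def pmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x] (bit i of an int = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdivmod(a: int, m: int):
    """Long division in GF(2)[x]: return (quotient, remainder)."""
    if m == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q, dm = 0, m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        s = a.bit_length() - 1 - dm
        q |= 1 << s            # record the quotient term x^s
        a ^= m << s            # subtract (XOR) x^s * m
    return q, a


G = 0x11021                    # x^16 + x^12 + x^5 + 1
f = 0x40000DCC                 # x^30 + x^11 + x^10 + x^8 + x^7 + x^6 + x^3 + x^2
print(pmul(0b11, 0xF01F) == G) # True: (x + 1)(x^15 + x^14 + x^13 + x^12 + x^4 + x^3 + x^2 + x + 1)
q, r = pdivmod(f, G)
print(hex(q), r)               # 0x444c 0
```

The zero remainder confirms that $G(x)$ divides the difference polynomial, with cofactor $x^{14}+x^{10}+x^6+x^3+x^2$.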

I'm also a fan of Victor Shoup's NTL: A Library for doing Number Theory, "a high-performance, portable C++ library providing data structures and algorithms for manipulating signed, arbitrary length integers, and for vectors, matrices, and polynomials over the integers and over finite fields."

Halfway down this page we are told that Sage will use the NTL library for polynomials over the binary field, e.g. for multiplication and GCDs, at least if the command is appropriately framed:

sage: x = PolynomialRing(GF(2), 'x').gen()
sage: f = (x^3 - x + 1)*(x + x^2); f
x^5 + x^4 + x^3 + x
sage: g = (x^3 - x + 1)*(x + 1)
sage: f.gcd(g)
x^4 + x^3 + x^2 + 1

Developer-level information is on this Sage documentation page.

Ultimately you may want to build a pipeline for factoring over the binary field that goes directly against NTL, cutting out the Sage middleman, as the bare C++ implementation should eliminate some overhead. Information about building NTL itself with the GF2X library is here.

I've built Sage from source a couple of times (it's a fully automated but time-consuming process). If I get around to doing a similar build from recent NTL source, I'll report my experiences here.

Finally, this 2002 paper by von zur Gathen and Gerhard, "Polynomial Factorization over $\mathbb{F}_2$", may strike you as an accessible account of investigations extending Cantor-Zassenhaus.
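For the case a distinct-degree step leaves unresolved (several irreducible factors of the same degree $d$), the Cantor-Zassenhaus splitting step has a simple characteristic-2 form based on the trace map $T(a) = a + a^2 + a^4 + \dots + a^{2^{d-1}} \bmod f$, the usual substitute for the squaring-based trick that, as noted in the comments, does not apply in characteristic two. This is a minimal randomized sketch, not code from the paper: it assumes a squarefree input whose irreducible factors all have degree $d$, and the helper names are illustrative:

```python
import random


def pdeg(a: int) -> int:
    return a.bit_length() - 1            # degree; the zero polynomial gives -1


def pmul(a: int, b: int) -> int:
    r = 0                                # carry-less (XOR) multiplication
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdivmod(a: int, m: int):
    if m == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q, dm = 0, pdeg(m)
    while pdeg(a) >= dm:
        s = pdeg(a) - dm
        q |= 1 << s
        a ^= m << s
    return q, a


def pgcd(a: int, b: int) -> int:
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a


def equal_degree_split(f: int, d: int, rng: random.Random) -> list:
    """Split a squarefree f whose irreducible factors all have degree d.

    T(a) is 0 or 1 modulo each irreducible factor, so gcd(T(a), f) is
    often a proper factor; retry with a new random a until it is.
    """
    if pdeg(f) <= d:
        return [f]
    while True:
        a = rng.getrandbits(pdeg(f))     # random polynomial of degree < deg f
        t, s = 0, a
        for _ in range(d):               # t = a + a^2 + ... + a^(2^(d-1)) mod f
            t ^= s
            s = pdivmod(pmul(s, s), f)[1]
        g = pgcd(t, f)
        if 0 < pdeg(g) < pdeg(f):        # proper factor found: recurse on both parts
            return (equal_degree_split(g, d, rng)
                    + equal_degree_split(pdivmod(f, g)[0], d, rng))


rng = random.Random(1)
print([bin(g) for g in sorted(equal_degree_split(0b1111111, 3, rng))])
# ['0b1011', '0b1101'], i.e. (x^3 + x + 1)(x^3 + x^2 + 1)
```

Each attempt separates any two given factors with probability 1/2, so only a few retries are expected at these sizes.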

score 2 · Answer 2 · answered Sep 06 '14 at 14:16

... and now for something completely different from the approaches suggested in the comments.

For the toy example in which the data differs in just one bit, the factorization approach can be suitable because the degree of the difference polynomial is small. In general, the difference might be of much larger degree and factoring might be an inefficient way of solving the problem.

Suggestion: given two polynomials $M(x)$ and $N(x)$ that are known to be multiples of the CRC polynomial $P(x)$, compute the greatest common divisor of $M(x)$ and $N(x)$. "Long division" is easy in $\mathbb F_2[x]$, and the resulting gcd is quite likely to be $P(x)$. If it is not $P(x)$ straightaway (for example, if you know $\deg P(x)$ and it turns out that $\deg \gcd(M(x), N(x)) > \deg P(x)$), at least you have, I hope, a much smaller-degree polynomial to try to factor.
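The gcd suggestion needs nothing beyond repeated long division. A minimal Python sketch with polynomials stored as ints (bit $i$ = coefficient of $x^i$); `f2` below is a made-up second difference polynomial, $G(x)(x^2+x+1)$, standing in for the XOR of another pair of frames, and the helper names are illustrative:

```python
from functools import reduce


def pmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]; bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a: int, m: int) -> int:
    """Remainder of a modulo a nonzero m in GF(2)[x] (schoolbook long division)."""
    if m == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def pgcd(a: int, b: int) -> int:
    """Euclid's algorithm in GF(2)[x]."""
    while b:
        a, b = b, pmod(a, b)
    return a


G = 0x11021                      # x^16 + x^12 + x^5 + 1, used only to build the toy input
f1 = 0x40000DCC                  # XOR difference from the Question, as a polynomial
f2 = pmul(G, 0b111)              # HYPOTHETICAL second difference: G(x) * (x^2 + x + 1)
print(hex(reduce(pgcd, [f1, f2])))   # 0x11021
```

With more sample pairs, folding `pgcd` over all the differences tends to shed spurious common factors; if the result is still larger than the expected CRC width, the leftover is at least small enough to factor.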

One reason I chose to use the XOR approach is that some CRCs are "preconditioned" by inverting the first $n$ bits of $M(x)$ and some are not preconditioned, as described in this Wikipedia section. The XOR step eliminates the need to differentiate between these. — Edward, Sep 07 '14 at 16:29
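Edward's observation can be checked numerically: for two messages of the same length, a nonzero initial register value contributes the same offset to both CRCs, so it cancels in the XOR. A minimal Python sketch, using a plain non-reflected bitwise CRC-16 with the CCITT polynomial 0x1021 purely for illustration (the function name and its parameters are assumptions, not a specific library):

```python
def crc16_msb_first(data: bytes, poly: int = 0x1021, init: int = 0x0000) -> int:
    """Non-reflected bitwise CRC-16 with a configurable initial register value.

    init=0xFFFF is equivalent to inverting the first 16 bits of the message
    (for messages of at least 2 bytes); init=0x0000 is no preconditioning.
    """
    crc = init
    for byte in data:
        crc ^= byte << 8                       # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


a = bytes.fromhex("ee 00 00 00 00 01 20")
b = bytes.fromhex("ee 00 00 00 00 03 20")
for init in (0x0000, 0xFFFF):                  # without and with preconditioning
    print(hex(init), hex(crc16_msb_first(a, init=init) ^ crc16_msb_first(b, init=init)))
# the second column is the same (0x6662) for both
```

A final XOR applied to the CRC output cancels in the same way, and bit reflection only permutes bits, so the same argument applies to reflected CRCs such as the one in the Question.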
