Solving Real Polynomial Equations Numerically

Note: At the bottom of every page, there also is a link to a full implementation of this algorithm.

Introduction

Solving equations numerically is a very common problem. I will discuss and implement an algorithm to solve real polynomial equations, whereas you can fix the index of the solution.

The focus of this article lies on a sample implementation and possible problems implementing the algorithm given by Sturm rather than on details about the mathmatical background.

Polynomial Equations

It often turns out that the solution of a problem demands solving a polynomial equation. Sometimes, this equation is given directly by the problem; sometimes it is the result of trying to solve the problem by the help of an algebra tool (you know those “RootOf” expressions produced by Maple). Anyway, it may be necessary to compute a solution of an equation of the form
p1(x) = p2(x)
where p1 and p2 are real univariate polynomials (if you are lucky). As this equation is equivalent to the equation
p1(x) – p2(x) = 0
you can reduce this problem to finding zeroes of a polynomial.

One algorithm is the bisection procedure based on Sturm’s theorem, which is presented here. It takes a necessarily square-free (explained later) polynomial, an interval, where to look for the zero, and the index of the zero you are searching (counted from the lower bound of the interval on) and provides you with an approximation of the zero.

Sturm Chains

A polynomial p is called “square-free”, if there is no square, which divides it. This implies that there are also no factors of higher order. This again implies that every zero has multiplicity 1.

As already stated, the algorithm you are dealing with assumes that the polynomial is square-free. This isn’t really a limitation because every polynomial can be factorized to square-free polynomials (if you find out that the square b*b divides your polynomial, just divide your polynomial by b and perform this iteratively) and finding zeroes of factors means finding zeroes of the original polynomial. But as non-square-free polynomials occur pretty improbably, I don’t deal with it here.

Given a square-free polynomial q_0, a Sturm chain is the inverse (!) sequence p of intermediary results when applying Euclid’s algorithm (iterative modulo-division) to q_0 and its derivative q_1 = q_0‘.

An example: Given the polynomial p = 2*X*X + 3*X – 7, you obtain:
q_0 = X*X*X + X*X + -2
q_1 = q_0′ = 3*X*X + 2*X
q_2 = -mod(q_0, q_1) = (2/9) * X + 2
q_3 = -mod(q_1, q_2) = -225
So the inverse sequence p = (q_3, q_2, q_1, q_0) is a Sturm chain.

Sturm’s Theorem

So, what is all this about? Sturm’s theorem tells you that you can use Sturm chains to count zeroes of the given polynomial q_0. It says that if w(x) is the number of sign changes in the Sturm chain evaluated at a given point x, the number of zeroes of q_0 in the interval ]a,b] is w(a)-w(b).

If you are able to count sign changes in a Sturm chain (and it is relatively easy to do so), you are able to determine the number of zeroes in every given interval and can enclose certain zeroes by controlled downsizing of an interval.

Because the polynomials are able to become zero instead of being just positive or negative and recursion is not always very transparent, the mathmatical definition of the “number of sign changes”-function w is defined a bit complicated:

But already, the representation in pseudo-code using a for-loop looks much more attractive:

signChanges = 0;
lastNonZero = 0;
// run through the array
for (i = 1; i < n; i++) {
   if (p[i](x) != 0) {
      // compare the sign to the last non-zero sign
      if (p[lastNonZero](x)*p[i](x) < 0) {
         // sign change found: count up
         signChanges++;
      }
      lastNonZero = i;
   }
}
return signChanges;

Bisection algorithm

Now, because you are able to determine how many zeroes there are in an interval, you are in a state where it is possible for you to enclose a zero with a given index. Starting with a user-defined interval, you split it iteratively into two equal-sized parts and decide by the help of the w-function applied to the border of the intervals, in which half you can find your zero.

Let n be the index of the wanted zero, ref the reference point (where to start counting the zeroes) and ]lower, upper] the current interval, you have to check whether w(ref) – w(middle) >= n. In the first case (which means that there are at least n zeroes left of middle), you continue with the interval ]lower,middle], in the second case with the interval ]middle, upper].

The algorithm stops after a given number of iterations or when it reaches machine precision.

Note: In fact, the bisection algorithm works under even looser conditions for the Sturm chain, namely the axioms for “Generalized Sturm chains”. But, as your Sturm chains fulfill those conditions and you know how to constuct them, you don’t need to waste much thought about the more general concept.

More by Author

Must Read