Solving Real Polynomial Equations Numerically

Note: At the bottom of every page, there also is a link to a full implementation of this algorithm.

Introduction

Solving equations numerically is a very common problem. I will discuss and implement an algorithm to solve real polynomial equations, whereas you can fix the index of the solution.

The focus of this article lies on a sample implementation and possible problems implementing the algorithm given by Sturm rather than on details about the mathmatical background.

Polynomial Equations

It often turns out that the solution of a problem demands solving a polynomial equation. Sometimes, this equation is given directly by the problem; sometimes it is the result of trying to solve the problem by the help of an algebra tool (you know those "RootOf" expressions produced by Maple). Anyway, it may be necessary to compute a solution of an equation of the form
p1(x) = p2(x)
where p1 and p2 are real univariate polynomials (if you are lucky). As this equation is equivalent to the equation
p1(x) - p2(x) = 0
you can reduce this problem to finding zeroes of a polynomial.

One algorithm is the bisection procedure based on Sturm's theorem, which is presented here. It takes a necessarily square-free (explained later) polynomial, an interval, where to look for the zero, and the index of the zero you are searching (counted from the lower bound of the interval on) and provides you with an approximation of the zero.

Sturm Chains

A polynomial p is called "square-free", if there is no square, which divides it. This implies that there are also no factors of higher order. This again implies that every zero has multiplicity 1.

As already stated, the algorithm you are dealing with assumes that the polynomial is square-free. This isn't really a limitation because every polynomial can be factorized to square-free polynomials (if you find out that the square b*b divides your polynomial, just divide your polynomial by b and perform this iteratively) and finding zeroes of factors means finding zeroes of the original polynomial. But as non-square-free polynomials occur pretty improbably, I don't deal with it here.

Given a square-free polynomial q_0, a Sturm chain is the inverse (!) sequence p of intermediary results when applying Euclid's algorithm (iterative modulo-division) to q_0 and its derivative q_1 = q_0'.

An example: Given the polynomial p = 2*X*X + 3*X - 7, you obtain:
q_0 = X*X*X + X*X + -2
q_1 = q_0' = 3*X*X + 2*X
q_2 = -mod(q_0, q_1) = (2/9) * X + 2
q_3 = -mod(q_1, q_2) = -225
So the inverse sequence p = (q_3, q_2, q_1, q_0) is a Sturm chain.

Sturm's Theorem

So, what is all this about? Sturm's theorem tells you that you can use Sturm chains to count zeroes of the given polynomial q_0. It says that if w(x) is the number of sign changes in the Sturm chain evaluated at a given point x, the number of zeroes of q_0 in the interval ]a,b] is w(a)-w(b).

If you are able to count sign changes in a Sturm chain (and it is relatively easy to do so), you are able to determine the number of zeroes in every given interval and can enclose certain zeroes by controlled downsizing of an interval.

Because the polynomials are able to become zero instead of being just positive or negative and recursion is not always very transparent, the mathmatical definition of the "number of sign changes"-function w is defined a bit complicated:

But already, the representation in pseudo-code using a for-loop looks much more attractive:

signChanges = 0;
lastNonZero = 0;
// run through the array
for (i = 1; i < n; i++) {
   if (p[i](x) != 0) {
      // compare the sign to the last non-zero sign
      if (p[lastNonZero](x)*p[i](x) < 0) {
         // sign change found: count up
         signChanges++;
      }
      lastNonZero = i;
   }
}
return signChanges;

Bisection algorithm

Now, because you are able to determine how many zeroes there are in an interval, you are in a state where it is possible for you to enclose a zero with a given index. Starting with a user-defined interval, you split it iteratively into two equal-sized parts and decide by the help of the w-function applied to the border of the intervals, in which half you can find your zero.

Let n be the index of the wanted zero, ref the reference point (where to start counting the zeroes) and ]lower, upper] the current interval, you have to check whether w(ref) - w(middle) >= n. In the first case (which means that there are at least n zeroes left of middle), you continue with the interval ]lower,middle], in the second case with the interval ]middle, upper].

The algorithm stops after a given number of iterations or when it reaches machine precision.

Note: In fact, the bisection algorithm works under even looser conditions for the Sturm chain, namely the axioms for "Generalized Sturm chains". But, as your Sturm chains fulfill those conditions and you know how to constuct them, you don't need to waste much thought about the more general concept.

Solving Real Polynomial Equations Numerically

Programming Polynomials

Determining the requirements

If you inspect the algorithm above regarding which operations are needed, you find out that your polynomial implementations must be capable of the following operations:

Determining the Sturm chain:

  • For determining P1: Symbolic derivation
  • For determining P2,...,Pn: "mod"; in other words, calculating the remainder of a polynomial division
  • For determining P2,...,Pn: Sign change; in other words, Multiplication of polynomials by -1 (more general: Multiplication by scalars)
  • For determining n: Checking whether a polynomial is constant (more general: Determination of the degree of a polynomial)

For the bisection algorithm, you need the following two operations:

  • Deciding whether two polynomials have different signs at a given point
  • Determining whether a polynomial evaluates to zero at a given point

Both operations easily can be deduced from an evaluation operation. This tells you the value of the polynomial at a given point.

Polynomial Interface

These demands lead you to the following interfaces:

public interface Function {
   /**
    * Evaluates a function at a given point
    * 
    * @param x
    *    the point at which the function has to be evaluated
    * @return the function value at x
    */
   public double valueAt(double x);
}
public interface Polynomial extends Function {
   /**
    * Returns the coefficients array of the polynomial
    * 
    * @return the coefficients array of the polynomial, where
    *    result[n] is the coefficient before X^n
    */
   public double[] toArray();

   /**
    * Calculates the first derivation of the polynomial
    *
    * @return the derivation
    */
   public Polynomial diff();

   /**
    * Calculates the remainder of the polynomial division of this
    * by other
    * 
    * @param other
    *    the divisor (must not be constant)
    * @return the remainder of the polynomial division
    */
   public Polynomial mod(Polynomial other);

    /**
     * Multiplies the polynomial by a real scalar
     * 
     * @param scalar
     *    the scalar to multiply the polynomial by
     * @return the multiplied polynomial
     */
    public Polynomial multiply(double scalar);

    /**
     * Determines the degree of the polynomial
     * 
     * @return the degree of the polynomial, where the degree of
     *     the zeropolynomial is defined as -1
     */
    public int degree();
}

The distinction between functions in general and polynomials is not necessary for your purposes, but may be useful for future extensions.

Likewise, the toArray() method is not needed definitely, but will be useful if you want to implement different polynomial classes (for example, polynomials of degree less or equal than a given constant). The toArray() method then provides you with a common representation and makes you able to use different polynomial implementations in the same operation (for example, you can calculate the remainder of a Polynomial4 and a general Polynomial).

Implementation

In this tutorial, you confine yourself to polynomials of degree less or equal than 4. So, you need 5 coefficients as attributes of the class and constructors for initially setting them. The toArray() method and the toString() method (used for debugging, not necessary) are also easy to implement.

public class Polynomial4 implements Polynomial {
   private double x0, x1, x2, x3, x4;

   /**
    * Construct a polynomial with given coefficients
    * 
    * @param x0
    *    coefficient before 1
    * @param x1
    *    coefficient before X
    * @param x2
    *    coefficient before X*X
    * @param x3
    *    coefficient before X*X*X
    * @param x4
    *    coefficient before X*X*X*X
    */
   public Polynomial4(double x0, double x1, double x2, double x3,
                      double x4) {
        this.x0 = x0;
        this.x1 = x1;
        this.x2 = x2;
        this.x3 = x3;
        this.x4 = x4;
   }

   /**
    * Construct a polynomial with given coefficients
    * 
    * @param poly
    *    coefficient array, where
    *    poly[n] is the coefficient before X^n for
    *    n = 0, 1, 2, 3 or 4.
    *    The array length may be less than 5, but not greater than 5.
    */
   public Polynomial4(double[] poly) {
        assert poly.length <= 5;

        if (poly.length > 0) {
            x0 = poly[0];
        }
        if (poly.length > 1) {
            x1 = poly[1];
        }
        if (poly.length > 2) {
            x2 = poly[2];
        }
        if (poly.length > 3) {
            x3 = poly[3];
        }
        if (poly.length > 4) {
            x4 = poly[4];
        }
   }

   public double[] toArray() {
      return new double[] { x0, x1, x2, x3, x4 };
   }

   public String toString() {
      return x4 + " * X^4 + " + x3 + " * X^3 + " + x2 + " * X^2
         + " + x1 + " * X + " + x0;
   }
}

To evaluate the polynomial function, you use the Horner scheme, which supplies numerical stability.

   public double valueAt(double x) {
      // evaluation using the Horner scheme
      double value = x4;
      value = x3 + value * x;
      value = x2 + value * x;
      value = x1 + value * x;
      value = x0 + value * x;
      return value;
   }

As firstly the derivation of a4*X*X*X*X + a3*X*X*X + a2*X*X + a1*X + a0 is apparrently 4*a4*X*X*X + 3*a3*X*X + 2*a2*X + a1 (as every primary-school pupil can approve), secondly multiplication of a polynomial by a scalar is defined as multiplication of every coefficient by the scalar and thirdly the degree of a polynomial is the maximum index of a non-zero coefficient, you can implement those three methods:

   // Actually a degree4-result is sufficient (and we want to make
   // our users benefit from this fact, so we define a
   // Polynomial4-diff-method)...
   public Polynomial4 diff4() {
      return new Polynomial4(x1, 2 * x2, 3 * x3, 4 * x4, 0);
   }

   // ...but to satisfy the interface we need also a method, which
   // returns a general polynomial.
   public Polynomial4 diff() {
      return diff4();
   }

   public Polynomial4 multiply(double scalar) {
      return new Polynomial4(scalar * x0, scalar * x1, scalar * x2,
                             scalar * x3, scalar * x4);
   }

   public int degree() {
      if (x4 != 0) {
         return 4;
      } else if (x3 != 0) {
         return 3;
      } else if (x2 != 0) {
         return 2;
      } else if (x1 != 0) {
         return 1;
      } else if (x0 != 0) {
         return 0;
      } else {
         return -1;
      }
   }

The algorithm for calculating the remainder of a polynomial division becomes more difficult. Furthermore, here one cannot make much benefit (in form of efficiency) of the fact that the polynomials have a degree less or equal then 4. So, you implement a more general algorithm.

The remainder of the polynomial division a by b is defined as the polynomial r that has got a degree less than b and results from a by iterative application of polynomial reduction by b.

So first of all, you have to implement the polynomial reduction. The reduction of a by b is defined as:
a' = a - b * (hc(a) / hc(b)) * X^(deg(a)-deg(b))
where hc(p) means the non-zero coefficient of p, which has got the highest index ("highest coefficient").
One can proove that deg(a') < deg(a), which will be needed later.

The implementation of the reduce() method looks like this:

   private double[] reduce(double[] a, int degA, double[] b,
      int degB) {
      int degDiff = degA - degB;

      double[] result = new double[degA];
      for (int i = degA - 1; i >= degDiff; i--) {
         result[i] = a[i] - b[i - degDiff] / b[degB] * a[degA];
      }

      for (int i = 0; i < degDiff; i++) {
         result[i] = a[i];
      }
      return result;
   }

Now, you can easily implement the calculation of the remainder by iteratively applying the reduction function until the degree of a is less then the degree of b:

private double[] mod(double[] a, int degA, double[] b, int degB) {
      if (degB < 1) {              // the illegal case
         throw new IllegalArgumentException(
            "Cannot divide by constant polynomials");
      } else if (degA < degB) {    // the basic case
         return a;
      } else {                     // the recursion case
         // reduce a by b
         double[] result = reduce(a, degA, b, degB);

         // calculate the degree of the result
         int newDeg = degA - 1;
            while (newDeg >= 0 && result[newDeg] == 0) {
               newDeg--;
         }

         // do recursion
         return mod(result, newDeg, b, degB);
      }
   }

Please also note:

  • As you know that polynomial reduction reduces the degree (see above), termination of this algorithm is assured.
  • Polynomial division by a constant polynomial is undefined (IllegalArgumentException)

Because the function above still works on double arrays instead of polynomial objects, you have to introduce one (or two) wrapper method(s):

   // Actually a degree4-result is sufficient (and we want to make
   // our users benefit from this fact, so we define a
   // Polynomial4-mod-method)...
   public Polynomial4 mod(Polynomial4 other) {
      return new Polynomial4(mod(toArray(), degree(),
         other.toArray(), other.degree()));
   }

   // ...but to satisfy the interface we need also a method, which
   // returns a general polynomial.
   public Polynomial mod(Polynomial other) {
      return new Polynomial4(mod(toArray(), degree(),
         other.toArray(), other.degree()));
   }

Solving Real Polynomial Equations Numerically

Implementation of the Bisection Algorithm

By using the available polynomial methods, you now are able to create a Sturm chain from a given polynomial by simply using the definition:

   /**
    * Calculates the Sturm chain to a given polynomial
    * 
    * @param function
    *    the polynomial function
    * @return the Sturm chain of function as array
    */
   private static Polynomial[] calculateSturm(Polynomial function) {
        List<Polynomial> sturm = new LinkedList<Polynomial>();

        // add the original function and its derivation
        sturm.add(0, function);
        sturm.add(0, function.diff());

        // iteratively perform polynomial divison with sign change
        while (sturm.get(0).degree() > 0) {
            sturm.add(0, sturm.get(1).mod(sturm.get(0)).multiply(-1));
        }

        // convert the list to an array for efficiency purposes
        Polynomial[] result = new Polynomial4[sturm.size()];
        int i = 0;
        for (Polynomial poly : sturm) {
            result[i] = poly;
            i++;
        }
        return result;
   }

For simplicity reasons (you don't know the length of the Sturm chain at first), you use a list at first. But for efficiency purposes (we are going to iterate through the Sturm chain very often), you then convert this list to an array.

The implementation of the "w" method used in Sturm's theorem is relatively straightforward using the already given pseudo-code (see page 1):

   /**
    * Sturm's "w" function for counting zeroes
    * 
    * @param sturm
    *    the Sturm chain as array
    * @param x
    *    where to evaluate the "w" function
    * @param precision
    *    tolerance in comparing function values
    * @return the result of the "w" function defined by Sturm
    */
   private static int w(Polynomial[] sturm, double x,
                        double precision) {
      int signChanges = 0;
      int lastNonZero = 0;
      // run through the array
      for (int i = 1; i < sturm.length; i++) {
         if (Math.abs(sturm[i].valueAt(x)) > precision) {
            // compare the sign to the last non-zero sign
            if (sturm[lastNonZero].valueAt(x) * sturm[i].valueAt(x)
               < 0) {
               // sign change found: count up
               signChanges++;
            }
            lastNonZero = i;
         }
      }
      return signChanges;
   }
Implementation note: As you have to live and calculate with the impreciseness of the machine operations, I introduced a tolerance factor precision, which is be used when you check a function for being zero. This should be a very small positive value. Defining it to large or too small will reduce the precision of the result in some cases. More facts on this issue are given in the last section.

So, you have got everything needed for your bisection algorithm, which is not very complicated any more: Take the interval and halve it depending on whether the "w" function tells you that the wanted zero is left or right of the middle. The rest should be self-explanatory:

   /**
    * Search zeroes of a polynomial function by executing a
    * bisection algorithm using Sturm's theorem
    * 
    * @param sturm
    *    the Sturm chain of the function
    * @param num
    *    the number of the wanted zero; counting starts from lower
    * @param lower
    *    lower bound of the interval, in which the zero is searched
    * @param upper
    *    upper bound of the interval, in which the zero is searched
    * @param precision
    *    tolerance in comparing function values
    * @param iterations
    *    maximum number of iterations (the more iterations, the more
    *    precise the result); the algorithm stops before that
    *    maximum number, when it reaches sufficient precision
    *    (machine precision)
    * @return the zero
    */
   private static double bisection(Polynomial[] sturm, int num,
      double lower, double upper, double precision, int iterations) {
      // define the point where to start counting the zeroes
      double t = lower;

      // do the maximum number or iterations (if necessary)
      for (int i = 0; i < iterations; i++) {
         // determine the middle of the interval
         double c = (upper + lower) / 2;

         // Check, if we have already reached machine precision
         if (upper <= lower || c <= lower || c >= upper) {
            return lower;
         }

         // Left or right interval?
         // Are there less than "num" zeroes between t and c?
         if (w(sturm, t, precision) - w(sturm, c, precision) < num) {
            // right
            lower = c;
         } else {
            // left
            upper = c;
         }
      }
      // the wanted zero lies in the intervall [lower, upper],
      // so the middle of this interval might be a good guess
      return (upper + lower) / 2;
   }

Because the user is usually not expected to calculate the Sturm chain itself or give all necessary parameters, you define some wrapper functions to support him/her and place all the stuff into a new class:

public class Solve {
   private static final double FLOATING_POINT_PRECISION = 1e-7;

   // documentation skipped... it's about the same as the one of
   // bisection(...)
   public static double solve(Polynomial poly, int num, double lower,
         double upper, double precision, int iterations) {
      return bisection(calculateSturm(poly), num, lower, upper,
         precision, iterations);
   }

   // documentation skipped... it's about the same as the one of
   // bisection(...)
   public static double solve(Polynomial poly, int num, double lower,
         double upper, int iterations) {
      return bisection(calculateSturm(poly), num, lower, upper,
         FLOATING_POINT_PRECISION, iterations);
   }

   // ... already defined methods ...
}

Finished! Now, to test it...

Solving Real Polynomial Equations Numerically

Tests and Discussion

So, now you can find some zeroes of polynomials!

public class Test {

   public static void main(String[] args) {
      // x^2 - x = 0
      // Solutions: -1 and 1
      testPoly2(new Polynomial4(new double[] { -1, 0, 1 }));

      // x^2 - 2x + 1 = 0
      // Solution: 1
      // As we call the function for 2 solutions, but our polynomial
      // has got only one, the second solution will be nonsense
      testPoly2(new Polynomial4(new double[] { 1, -2, 1 }));

      // x^3 - 10x^2 + 31x - 30
      // Solutions: 2, 3 and 5
      testPoly3(new Polynomial4(new double[] { -30, 31, -10, 1 }));

      // x^4 - 10x^3 + 31x^2 - 30x
      // Solutions: 0, 2, 3 and 5
      testPoly4(new Polynomial4(new double[] { 0, -30, 31, -10, 1 }));

      // x^4 - 13x^3 + 61x^2 - 123x + 90
      // Solutions: 2, 3 and 5
      testPoly3(new Polynomial4(new double[]
         { 90, -123, 61, -13, 1 }));
   }

   // prints out two solutions of the given polynomial
   private static void testPoly2(Polynomial poly) {
      double solution1 = Solve.solve(poly, 1, -10, +10, 200000);
      double solution2 = Solve.solve(poly, 2, -10, +10, 200000);
      System.out.println("Solutions of " + poly + " = 0:\n");
      System.out.println("   " + solution1 + " and " + solution2);
      System.out.println("\n");
   }

   // prints out 3 solutions of the given polynomial
   private static void testPoly3(Polynomial poly) {
       // ...
   }
   
   // prints out 4 solutions of the given polynomial
   private static void testPoly4(Polynomial poly) {
      // ...
   }
}

Executing this code returns:

Solutions of 0.0 * X^4 + 0.0 * X^3 + 1.0 * X^2 + 0.0 * X + -1.0 = 0:

   -1.0000000000000002 and 0.9999999999999999


Solutions of 0.0 * X^4 + 0.0 * X^3 + 1.0 * X^2 + -2.0 * X + 1.0 = 0:

   0.999999992549419 and 9.999999999999998


Solutions of 0.0 * X^4 + 1.0 * X^3 + -10.0 * X^2 + 31.0 * X +
   -30.0 = 0:

   1.999999999999999 and 2.9999999999999982 and 4.999999999999999


Solutions of 1.0 * X^4 + -10.0 * X^3 + 31.0 * X^2 + -30.0 * X +
   0.0 = 0:

   -4.9E-324 and 1.999999999999999 and 2.9999999999999982 and
      4.999999999999999


Solutions of 1.0 * X^4 + -13.0 * X^3 + 61.0 * X^2 + -123.0 * X +
   90.0 = 0:

   1.9999999999999962 and 2.999999921959193 and 4.999999999999995

So, apparantly the algorithm is working and producing good approximations of the zeroes.

What happens in the second test case? you print out two solutions, though your polynomial has got only one. So, the bisection algorithm always knows that the demanded zero is not in the left interval and therefore chooses the right interval. This leads to producing a result near the upper bound 10 of the given interval [-10, 10].

If you consider the last test case, you may notice that this polynomial is not square-free. Nevertheless, the algorithm returns the correct zeroes. So, after all, it seems to be robust against non-square-free polynomials.

Imaginable Modifications/Extensions

  • Performing some considerations about limits should enable you to find zeroes counted from minus infinity instead of a real reference point.
  • As pointed out in the section about Sturm's theorem, one could extend the algorithm to work with non-square-free polynomials, too.
  • A check whether there exists a zero with the given index in the given interval would also be nice (instead of just returning a value near the upper bound of the interval).


Downloads

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • The proliferation of cloud computing options has begun to change the way storage is thought about, procured, and used. IT managers and departments need to think through how cloud options might fit into and complement their onsite data infrastructures. This white paper explains cloud storage and backup, providing advice about the tools and best practices for its implementation and use. Read this white paper for some useful takeaways about how to take advantage of cloud storage for high availability, backup and …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds