Linear least squares regression of polynomial coefficients

by Trent Guidry 1. August 2009 05:22
This article expands on the previous post about Linear and multiple linear least squares regression in C# and gives examples of linear least squares regression of polynomial coefficients.
 
For this example, I will use the data shown below. I intentionally picked data that is difficult to regress a polynomial to because it would be kind of boring to regress a second order or higher polynomial to data that is pretty much linear.
2.3601
133.32
2.3942
666.61
2.4098
1333.22
2.4268
2666.45
2.4443
5332.90
2.4552
7999.35
2.4689
13332.25
2.4885
26664.50
2.5093
53329.00
2.5287
101325.09
For a first order polynomial (a line), the equation is:
Y = A + BX
For this equation, z0 is 1 and z1 is X, and y is Y.
This is done in the code shown below.
            double[] x = new double[] { 2.3601, 2.3942, 2.4098, 2.4268, 2.4443, 2.4552, 2.4689, 2.4885, 2.5093, 2.5287 };
            double[] y = new double[] { 133.322, 666.612, 1333.22, 2666.45, 5332.9, 7999.35, 13332.2, 26664.5, 53329, 101325 };
            int nPolyOrder = 1;
            double[,] z = new double[y.Length, nPolyOrder + 1];
            for (int i = 0; i < y.Length; i++)
            {
                z[i, 0] = 1.0;
                z[i, 1] = x[i];
            }
            double[] coefs = Polynomial.Regress(z, y);
This gives A = -1212289, B = 503788.92 and the data below.
X
Y
Fitted
2.3601
133.32
-23286.98
2.3942
666.61
-6123.37
2.4098
1333.22
1765.81
2.4268
2666.45
10283.15
2.4443
5332.90
19111.49
2.4552
7999.35
24626.26
2.4689
13332.25
31497.13
2.4885
26664.50
41379.54
2.5093
53329.00
51853.08
2.5287
101325.09
61653.87
 
Regression 1st order polynomial
For a second order polynomial, the equation is:
Y = A + BX + CX2
For this equation, z0 is 1, z1 is X, z2 is X2, and y is Y.
This is done in the code shown below.
            double[] x = new double[] { 2.3601, 2.3942, 2.4098, 2.4268, 2.4443, 2.4552, 2.4689, 2.4885, 2.5093, 2.5287 };
            double[] y = new double[] { 133.322, 666.612, 1333.22, 2666.45, 5332.9, 7999.35, 13332.2, 26664.5, 53329, 101325 };
            int nPolyOrder = 2;
            double[,] z = new double[y.Length, nPolyOrder + 1];
            for (int i = 0; i < y.Length; i++)
            {
                z[i, 0] = 1.0;
                z[i, 1] = x[i];
                z[i, 2] = x[i] * x[i];
            }
            double[] coefs = Polynomial.Regress(z, y);
This gives A = 36759078.32, B = -30552595.42, C = 6347521.4 and the data below.
X
Y
Fitted
2.3601
133.32
8037.42
2.3942
666.61
-4722.08
2.4098
1333.22
-5643.88
2.4268
2666.45
-3144.23
2.4443
5332.90
3276.49
2.4552
7999.35
9265.54
2.4689
13332.25
18855.64
2.4885
26664.50
36789.79
2.5093
53329.00
61128.71
2.5287
101325.09
88873.81
Regression 2nd order polynomial
For a third order polynomial, the equation is:
Y = A + BX + CX2 + DX3
For this equation, z0 is 1, z1 is X, z2 is X2, z3 is X3, and y is Y.
This is done in the code shown below.
            double[] x = new double[] { 2.3601, 2.3942, 2.4098, 2.4268, 2.4443, 2.4552, 2.4689, 2.4885, 2.5093, 2.5287 };
            double[] y = new double[] { 133.322, 666.612, 1333.22, 2666.45, 5332.9, 7999.35, 13332.2, 26664.5, 53329, 101325 };
            int nPolyOrder = 3;
            double[,] z = new double[y.Length, nPolyOrder + 1];
            for (int i = 0; i < y.Length; i++)
            {
                z[i, 0] = 1.0;
                z[i, 1] = x[i];
                z[i, 2] = x[i] * x[i];
                z[i, 3] = x[i] * x[i] * x[i];
            }
            double[] coefs = Polynomial.Regress(z, y);
This gives A = -830167848.92, B = 1034157557.04, C = -429397609.64, D = 59427310.55, and the data below.
X
Y
Fitted
2.3601
133.32
-1114.68
2.3942
666.61
3407.84
2.4098
1333.22
2643.43
2.4268
2666.45
1988.25
2.4443
5332.90
3291.59
2.4552
7999.35
5970.15
2.4689
13332.25
12152.31
2.4885
26664.50
28292.46
2.5093
53329.00
57429.76
2.5287
101325.09
98694.28
Regression 3rd order polynomial
For a fourth order polynomial, the equation is:
Y = A + BX + CX2 + DX3 + EX4
For this equation, z0 is 1, z1 is X, z2 is X2, z3 is X3, z4 is X4, and y is Y.
This is done in the code shown below.
            double[] x = new double[] { 2.3601, 2.3942, 2.4098, 2.4268, 2.4443, 2.4552, 2.4689, 2.4885, 2.5093, 2.5287 };
            double[] y = new double[] { 133.322, 666.612, 1333.22, 2666.45, 5332.9, 7999.35, 13332.2, 26664.5, 53329, 101325 };
            int nPolyOrder = 4;
            double[,] z = new double[y.Length, nPolyOrder + 1];
            for (int i = 0; i < y.Length; i++)
            {
                z[i, 0] = 1.0;
                z[i, 1] = x[i];
                z[i, 2] = x[i] * x[i];
                z[i, 3] = x[i] * x[i] * x[i];
                z[i, 4] = x[i] * x[i] * x[i] * x[i];
            }
            double[] coefs = Polynomial.Regress(z, y);
This gives A = 9470307327.9, B = -15832920218.37, C = 9925973640.82, D = -2765589751.22, E = 288948184.43, and the data below.
X
Y
Fitted
2.3601
133.32
-255.30
2.3942
666.61
1289.77
2.4098
1333.22
1932.21
2.4268
2666.45
2786.55
2.4443
5332.90
4755.05
2.4552
7999.35
7217.83
2.4689
13332.25
12526.31
2.4885
26664.50
26899.26
2.5093
53329.00
55407.42
2.5287
101325.09
100200.45
Regression 4th order polynomial
 
 Polynomial regression Jacobian.
Polynomial regression Jacobian

Comments (7) -

xyz United States
8/1/2009 11:00:53 AM #

Hi one more question
when i modified your code
for (int i = 0; i < dY.Length; i++)
            {
                dZ[i, 0] = 1.0;
                dZ[i, 1] = dX[i];
                dZ[i, 2] = dX[i] * dX[i];
                dZ[i, 3] = dX[i] * dX[i] * dX[i];
            }
to
for (int i = 0; i < dY.Length; i++)
            {
for (int j = 0; i <= nPolyOrder ; i++)
{
                dZ[i, j] = dX[i]^j
}           }

I get the same results until poly order =3 it dosent match your results when order = 4 or higher i have tested the same with excel and the results are same. When order of poly is greater 3 the results wont match. could you please shed some light on this. I was also wondering if I could get the R2 value.

Thanks

Reply

Trent United States
8/1/2009 4:59:23 PM #

For

dX[i]^j

I am not really sure how that is working.  I am guessing that you are actually using Math.Pow.

I ran this though and I can reproduce the differences.

            for (int i = 0; i < dY.Length; i++)
            {
                for (int j = 0; j < nPolyOrder + 1; j++)
                {
                    dZ[i, j] = Math.Pow(dX[i], (double)j);
                }
            }

The Math.Pow function isn’t returning exactly the same results as multiplying the variables out.  For the case of 4th order, I am getting.

i, j, multiply them out, Math.Pow
3,2, 5.88935824,5.8893582400000009
3,4, 34.6845404790559,34.684540479055904
5,3,14.799962884608002,14.799962884608

Even though these differences are small, it appears that with higher orders, floating point round off error starts to become a serious problem.

This may also be a function of the data that I used to regress the functions.  That data really does not regress well to polynomials.  I picked it so that I could see a visible difference between the third and fourth order polynomials when I graphed them.  

I would not be surprised if Excel is also running into floating point issues with this dataset as well.

Reply

xyz United States
8/2/2009 6:18:13 PM #

yes thats Math.Pow i was using ^ in VB.Net. I agree with your comments. Thanks for the articles they were great.

Reply

javacasm Spain
12/9/2009 6:29:34 PM #

How could I get the quality of the regression?

Reply

Chris Canada
3/30/2010 11:07:49 AM #

Awesome!  Thanks just what the Dr. ordered!

Reply

Alejandro Valdes Mexico
5/10/2010 11:30:57 AM #

Great post! Thanks

Reply

CBrauer United States
8/12/2011 7:49:40 AM #

Hi,

I'm trying to fit a fourth order polynoimial to the NASDQ Composite Index.  I have differentiated the polynomial to obtain the slope of the curve.  My code is:

        int nPolyOrder = 4;
        double[,] z = new double[y.Length, nPolyOrder + 1];
        for (int k=0; k<ntrades; k++) {
          x[k] = tradeTime[k]/1000 - MarketHours.MarketOpenTime();
          y[k] = (double) last[k];
          z[k, 0] = 1.0;
          z[k, 1] = x[k];
          z[k, 2] = x[k]*x[k];
          z[k, 3] = x[k]*x[k]*x[k];
          z[k, 4] = x[k]*x[k]*x[k]*x[k];
        }

        double[] c = Polynomial.Regress(z, y);
        // Y = A + BX + C*X*X + D*X*X*X + E*X*X*X*X

        for (int k=0; k<ntrades; k++) {
          double X = x[k];
          x_fit[k] = X;
          y_fit[k] = c[0] + c[1]*X + c[2]*X*X + c[3]*X*X*X + c[4]*X*X*X*X;
          y_slope[k] = c[1] + 2*c[2]*X + 3*c[3]*X*X + 4*c[4]*X*X*X;
        }

My question is:  Is this the correct way to obtain the slope of the curve?

Any suggestion will be greatly appreciated.

Charles

Reply

Add comment




  Country flag
biuquote
  • Comment
  • Preview
Loading