In this article I will set up the framework for doing linear and multiple linear least squares regression. Linear regression is a fairly simple way of fitting equation parameters to a set of observed data.
A multiple linear regression can be expressed by the equation below.
y = p0x0 + p1x1 + p2x2 + … + pmxm + e
Where y is the output of the regressed function, x0 to xm are different variables or functions, and e is the residual error of the regression equation at that point.
This can be written in matrix form as
Y = JP + E
n is the number of parameters. m is the number of data points.

Where Y is the vector of observed y points, A is the vector of coefficients, E is the vector of errors and Z is the matrix of functional values.
The square residual error Sr is calculated by rearranging and then squaring E as shown below.
Sr = E2 = (Y-JP)2 = (Y-JP)T(Y-JP) = YTY + PT(JTJ)P – 2YTJP
At the minimum square residual error, the derivatives of SR with respect to the unknowns A are equal to zero.
dSR/dP = 2JTJP – 2JTY = 0
This gives the equation shown below.
JTJP = JTY
To solve this equation, I will add a static function to the previously developed Polynomial class called Regress. The code for this function is shown below.
public static double[] Regress(double[,] z, double[] y)
{
//y=a0 z1 + a1 z1 +a2 z2 + a3 z3 +...
//Z is the functional values.
//Z index 0 is a row, the variables go across index 1.
//Y is the summed value.
//returns the coefficients.
Debug.Assert(z != null && y != null);
Debug.Assert(z.GetLength(0) == y.GetLength(0));
Matrix zMatrix = z;
Matrix zTransposeMatrix = zMatrix.Transpose();
Matrix leftHandSide = zTransposeMatrix * zMatrix;
Matrix rightHandSide = zTransposeMatrix * y;
Matrix coefsMatrix = leftHandSide.SolveFor(rightHandSide);
return coefsMatrix;
}
To give an example of using this function, I will regress the parameters of the Clausisus Clapeyron correlation to the vapor pressure data of methanol that I got from the 6th edition of Perry's Chemical Engineers' Handbook.
The Clausisus Clapeyron equation is given as shown below.
ln Psat = A – B/T
The data that I will use to regress A and B is shown below.
|
T K
|
P Pa
|
|
229.15
|
133.32249
|
|
247.85
|
666.61245
|
|
256.95
|
1333.2249
|
|
267.15
|
2666.4498
|
|
278.15
|
5332.8996
|
|
285.25
|
7999.3494
|
|
294.35
|
13332.249
|
|
307.95
|
26664.498
|
|
323.05
|
53328.996
|
|
337.85
|
101325.09
|
Where T is the temperature in Kelvin and P is the pressure in Pascals.
To regress these parameters, I will need to set up the Z matrix. The z functions are the functions that follow the regression parameters. For this case, z0 is simply 1 and z1 is -1/T. I also need y, which is the natural log of the pressure.
This regression is solved with the code shown below.
double[] temperatures = new double[] { 229.15, 247.85, 256.95, 267.15, 278.15, 285.25, 294.35, 307.95, 323.05, 337.85 };
double[] pressures = new double[] { 133.3224898, 666.6124489, 1333.224898, 2666.449795, 5332.899591, 7999.349386, 13332.24898, 26664.49795, 53328.99591, 101325.0922 };
Func<double, double>[] zFunctions = new Func<double, double>[2];
zFunctions[0] = tempeature => 1.0;
zFunctions[1] = tempeature => -1.0 / tempeature;
Func<double, double> yFunction = pressure => Math.Log(pressure);
double[] yValues = new double[temperatures.Length];
double[,] zValues = new double[temperatures.Length, zFunctions.Length];
Parallel.For(0, temperatures.Length, i =>
{
yValues[i] = yFunction(pressures[i]);
zValues[i, 0] = zFunctions[0](temperatures[i]);
zValues[i, 1] = zFunctions[1](temperatures[i]);
}
);
double[] dCoefs = Polynomial.Regress(zValues, yValues);
Running this gives values of 25.46 for A and 4701.0 for B.
A comparison of the values calculated by the Clausisus Clapeyron equation using the regressed parameters verse the actual data values are shown in the table below.
|
T K
|
P Pa
|
Calculated
|
|
229.15
|
133.32249
|
140.5453202
|
|
247.85
|
666.61245
|
660.7495631
|
|
256.95
|
1333.2249
|
1293.504712
|
|
267.15
|
2666.4498
|
2600.997163
|
|
278.15
|
5332.8996
|
5216.386216
|
|
285.25
|
7999.3494
|
7944.511893
|
|
294.35
|
13332.249
|
13223.3454
|
|
307.95
|
26664.498
|
26770.72276
|
|
323.05
|
53328.996
|
54644.8107
|
|
337.85
|
101325.09
|
103371.4208
|

Subscribe

can we use the above mentioned function for 2nd order polynomial fit
Yes.
That's a good question. I will do a post on regressing polynomials. Hopefully I will have time to get it out in the next couple of days.
I finished the post. It is at www.trentfguidry.net/.../...mial-Coefficients.aspx
It gives examples of fitting 1st, 2nd, 3rd, and 4th order poylnomials.
Thanks for the article it was great.
It's worth noting that there are regression procedures beside least squares. One good example is L1 regression (also called "least absolute errors regression"), which I wrote about here:
matlabdatamining.blogspot.com/.../...gression.html
Thanks for this excellent article. I'm curios what algorithm you used for you linear system solver ("SolveFor")? I'm kind of getting problems when trying to solve for higher-order approximations and I think it's related to my solver (which is based on Numerical Recipes).
Never mind I found your other excellent posts (e.g. www.trentfguidry.net/.../...-LU-decomposition.aspx) and I must say I'm very impressed with the stability of your LU decomposition. Well done!
Thanks.
My first guess on why the LU decomposition that I am using is more stable than the one that you are referring to is the difference in pivot rules.
That is just a guess, though, since I haven't looked at the code that you are referring to.
Where can I find a Visual Studio 2005 or 2008 version of this code? I am despairingly needing a multiple linear regression function code. Thanks.
Hi Alexandre,
I don't have versions that run on those older frameworks anymore, but, I think, the biggest change that you would need to make to get it to run on those frameworks would to remove the parallel processing and the TPL parts. The Parallel.For's would need to be replaced with traditional c# for loops.