In this post I will set up the framework for doing linear and multiple linear regression. Linear regression is a fairly simple way of fitting equation parameters to a set of observed data.
A multiple linear regression can be expressed by the equation below.
y =a0z0 + a1z1 + a2z2 + … + amzm + e
Where y is the output of the regressed function, z0 to zm are different variables or functions, and e is the residual error of the regression equation at that point.
This can be written in matrix form as
Y = ZA + E
Where Y is the vector of observed y points, A is the vector of coefficients, E is the vector of errors and Z is the matrix of functional values.
The square residual error Sr is calculated by rearranging and then squaring E as shown below.
Sr = E2 = (Y-ZA)2 = (Y-ZA)T(Y-ZA) = YTY + AT(ZTZ)A – 2YTZA
At the minimum square residual error, the derivatives of SR with respect to the unknowns A are equal to zero.
dSR/dA = 2ZTZA – 2ZTY = 0
This gives the equation shown below.
ZTZA = ZTY
To solve this equation, I will add a static function to the previously developed Polynomial class called Regress. The code for this function is shown below.
public static double[] Regress(
double[,] z,
double[] y)
{
//y=a0 z1 + a1 z1 +a2 z2 + a3 z3 +...
//Z is the functional values.
//Z index 0 is a row, the variables go across index 1.
//Y is the summed value.
//returns the coefficients.
Debug.Assert(z != null && y != null);
Debug.Assert(z.GetLength(0) == y.GetLength(0));
Matrix zMatrix = z;
Matrix zTransposeMatrix = zMatrix.Transpose();
Matrix leftHandSide = zTransposeMatrix * zMatrix;
Matrix rightHandSide = zTransposeMatrix * y;
Matrix coefsMatrix = leftHandSide.SolveFor(rightHandSide);
return coefsMatrix;
}
To give an example of using this function, I will regress the parameters of the Clausisus Clapeyron correlation to the vapor pressure data of methanol that I got from the 6th edition of Perry's Chemical Engineers' Handbook.
The Clausisus Clapeyron equation is given as shown below.
ln Psat = A – B/T
The data that I will use to regress A and B is shown below.
|
T K
|
P Pa
|
|
229.15
|
133.32249
|
|
247.85
|
666.61245
|
|
256.95
|
1333.2249
|
|
267.15
|
2666.4498
|
|
278.15
|
5332.8996
|
|
285.25
|
7999.3494
|
|
294.35
|
13332.249
|
|
307.95
|
26664.498
|
|
323.05
|
53328.996
|
|
337.85
|
101325.09
|
Where T is the temperature in Kelvin and P is the pressure in Pascals.
To regress these parameters, I will need to set up the Z matrix. The z functions are the functions that follow the regression parameters. For this case, z0 is simply 1 and z1 is -1/T. I also need y, which is the natural log of the pressure.
This regression is solved with the code shown below.
double[] temperatures =
new double[] { 229.15, 247.85, 256.95, 267.15, 278.15, 285.25, 294.35, 307.95, 323.05, 337.85 };
double[] pressures =
new double[] { 133.3224898, 666.6124489, 1333.224898, 2666.449795, 5332.899591, 7999.349386, 13332.24898, 26664.49795, 53328.99591, 101325.0922 };
Func<double, double>[] zFunctions = new Func<double, double>[2];
zFunctions[0] = tempeature => 1.0;
zFunctions[1] = tempeature => -1.0 / tempeature;
Func<double, double> yFunction = pressure => Math.Log(pressure);
double[] yValues = new double[temperatures.Length];
double[,] zValues = new double[temperatures.Length, zFunctions.Length];
Parallel.For(0, temperatures.Length, i =>
{
yValues[i] = yFunction(pressures[i]);
zValues[i, 0] = zFunctions[0](temperatures[i]);
zValues[i, 1] = zFunctions[1](temperatures[i]);
}
);
double[] dCoefs = Polynomial.Regress(zValues, yValues);
Running this gives values of 25.46 for A and 4701.0 for B.
A comparison of the values calculated by the Clausisus Clapeyron equation using the regressed parameters verse the actual data values are shown in the table below.
|
T K
|
P Pa
|
Calculated
|
|
229.15
|
133.32249
|
140.5453202
|
|
247.85
|
666.61245
|
660.7495631
|
|
256.95
|
1333.2249
|
1293.504712
|
|
267.15
|
2666.4498
|
2600.997163
|
|
278.15
|
5332.8996
|
5216.386216
|
|
285.25
|
7999.3494
|
7944.511893
|
|
294.35
|
13332.249
|
13223.3454
|
|
307.95
|
26664.498
|
26770.72276
|
|
323.05
|
53328.996
|
54644.8107
|
|
337.85
|
101325.09
|
103371.4208
|