qreg() quadratic regression with r

Hi,

I noticed that the HP Prime Statistics 2Var app has a very useful quadratic regression fit, but as far as I understand it doesn't calculate the correlation coefficient "r" (it just gives NaN).
So I thought it could be a fun exercise to reimplement the quadratic regression from scratch, including the r calculation as well.

Code:

EXPORT qreg(data)
BEGIN
  LOCAL n := SIZE(data);
  LOCAL n_inv := 1.0 / n;
  LOCAL sigmaX := 0, sigmaY := 0, sigmaX2 := 0, sigmaX3 := 0, sigmaX4 := 0, sigmaXY := 0, sigmaX2Y := 0;

  // accumulate the sums needed for the normal equations
  LOCAL i;
  FOR i FROM 1 TO n DO
    LOCAL x := data(i,1);
    LOCAL y := data(i,2);
    LOCAL x2 := x * x;
    LOCAL x3 := x2 * x;
    LOCAL x4 := x2 * x2;
    LOCAL xy := x * y;
    LOCAL x2y := x2 * y;
    sigmaX := sigmaX + x;
    sigmaY := sigmaY + y;
    sigmaX2 := sigmaX2 + x2;
    sigmaX3 := sigmaX3 + x3;
    sigmaX4 := sigmaX4 + x4;
    sigmaXY := sigmaXY + xy;
    sigmaX2Y := sigmaX2Y + x2y;
  END;

  // means of x, x^2 and y
  LOCAL x_m := sigmaX * n_inv;
  LOCAL x2_m := sigmaX2 * n_inv;
  LOCAL x_m2 := x_m * x_m;
  LOCAL y_m := sigmaY * n_inv;

  // centered second moments
  LOCAL Sxx := sigmaX2 * n_inv - x_m2;
  LOCAL Sxy := sigmaXY * n_inv - x_m * y_m;
  LOCAL Sxx2 := sigmaX3 * n_inv - x_m * x2_m;
  LOCAL Sx2x2 := sigmaX4 * n_inv - x2_m * x2_m;
  LOCAL Sx2y := sigmaX2Y * n_inv - x2_m * y_m;

  // solve for the parabola y = A + B*x + C*x^2
  LOCAL comm := 1.0 / (Sxx * Sx2x2 - Sxx2 * Sxx2);
  LOCAL B := (Sxy * Sx2x2 - Sx2y * Sxx2) * comm;
  LOCAL C := (Sx2y * Sxx - Sxy * Sxx2) * comm;
  LOCAL A := y_m - B * x_m - C * x2_m;

  // r from the residual and total sums of squares
  LOCAL sigmaUp := 0;
  LOCAL sigmaDown := 0;
  FOR i FROM 1 TO n DO
    LOCAL x := data(i,1);
    LOCAL y := data(i,2);
    sigmaUp := sigmaUp + (y - (A + B * x + C * x * x))^2;
    sigmaDown := sigmaDown + (y - y_m)^2;
  END;

  LOCAL r := sqrt(1.0 - sigmaUp / sigmaDown);

  RETURN {C, B, A, r};

END;

Example usage:
Code:

L1 := {{1,1}, {2.1, 4}, {3.98, 16.2}}
L2 := qreg(L1)

The returned list gives the coefficients of a parabola with the formula f(x) = A + Bx + Cx^2. The order of the coefficients is chosen to be compatible with the built-in function "polynomial_regression".
The fourth element of the list, L2(4), is the correlation coefficient.
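
A quick sanity check on this example, just from the math: with only three points (and distinct x values) the fitted parabola passes exactly through all of them, so r should come out as 1 up to rounding, because

Code:

\sum_i \left(y_i - (A + B x_i + C x_i^2)\right)^2 = 0
\;\Longrightarrow\;
r = \sqrt{1 - 0} = 1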

You can run the built-in polynomial_regression(L1, 2) and qreg(L1) and compare the results. From a few totally non-scientific tests that I made, the errors seem to be around 1e-10 on average.

Sources:
Formula used: http://keisan.casio.com/exec/system/14059932254941
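
For reference, and so you don't have to open the link, here are the formulas the program implements, written out in LaTeX (they are simply read off the code above):

Code:

\bar{x} = \tfrac{1}{n}\sum x_i, \qquad
\overline{x^2} = \tfrac{1}{n}\sum x_i^2, \qquad
\bar{y} = \tfrac{1}{n}\sum y_i

S_{xx} = \tfrac{1}{n}\sum x_i^2 - \bar{x}^2, \qquad
S_{xy} = \tfrac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}, \qquad
S_{xx^2} = \tfrac{1}{n}\sum x_i^3 - \bar{x}\,\overline{x^2}

S_{x^2x^2} = \tfrac{1}{n}\sum x_i^4 - \overline{x^2}^{\,2}, \qquad
S_{x^2y} = \tfrac{1}{n}\sum x_i^2 y_i - \overline{x^2}\,\bar{y}

B = \frac{S_{xy}\,S_{x^2x^2} - S_{x^2y}\,S_{xx^2}}{S_{xx}\,S_{x^2x^2} - S_{xx^2}^2}, \qquad
C = \frac{S_{x^2y}\,S_{xx} - S_{xy}\,S_{xx^2}}{S_{xx}\,S_{x^2x^2} - S_{xx^2}^2}, \qquad
A = \bar{y} - B\,\bar{x} - C\,\overline{x^2}

r = \sqrt{1 - \frac{\sum_i \left(y_i - (A + B x_i + C x_i^2)\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}}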

I believe that the coefficients are correct to at least 8 significant digits, but I'm not 100% sure.

Here is another implementation I did in Python, to check whether the algorithm was correct (I find it easier to do a prototype in Python and then translate it to HP PPL later).
Code:
import math

# The data in pairs (mm, GB)
data = [
    (45, 0),
    (90, 2.39),
    (117, 4.7),
    (58, 0.34),
    (97, 2.68),
    (114, 4.18),
    (106, 3.41),
    (86, 1.93)
]

# accumulate all the sums
n = len(data)
n_inv = 1.0 / n
sigmaX = sigmaY = sigmaX2 = sigmaX3 = sigmaX4 = sigmaXY = sigmaX2Y = 0
for x,y in data:
    x2 = x * x
    x3 = x2 * x
    x4 = x2 * x2
    xy = x * y
    x2y = x2 * y
    sigmaX += x
    sigmaY += y
    sigmaX2 += x2
    sigmaX3 += x3
    sigmaX4 += x4
    sigmaXY += xy
    sigmaX2Y += x2y

x_m = sigmaX * n_inv
x2_m = sigmaX2 * n_inv
x_m2 = x_m * x_m
y_m = sigmaY * n_inv

# Calculate the coefficients
Sxx = sigmaX2 * n_inv - x_m2
Sxy = sigmaXY * n_inv - x_m * y_m
Sxx2 = sigmaX3 * n_inv - x_m * x2_m
Sx2x2 = sigmaX4 * n_inv - x2_m * x2_m
Sx2y = sigmaX2Y * n_inv - x2_m * y_m

# Calculate the parabola y = A + Bx + Cx^2
comm = 1.0 / (Sxx * Sx2x2 - Sxx2 * Sxx2)
B = (Sxy * Sx2x2 - Sx2y * Sxx2) * comm
C = (Sx2y * Sxx - Sxy * Sxx2) * comm
A = y_m - B * x_m - C * x2_m

def f(x):
    return A + B * x + C * x * x

# calculate the correlation coefficient r
sigmaUp = sigmaDown = 0
for x,y in data:
    sigmaUp += (y - f(x)) ** 2
    sigmaDown += (y - y_m) ** 2

r = math.sqrt(1.0 - sigmaUp / sigmaDown)

print('f(x) = {} x^2 + {} x + {}'.format(C, B, A))
print('r = {}'.format(r))

The Python version has embedded test data and prints the results. As I said, it's just a quick prototype, not intended for real use, so I didn't bother adding proper argument parsing from the command line.
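
If anyone wants a quick, independent cross-check of the prototype without a Prime at hand, here is a small sketch that can be appended to the end of the Python script above. It assumes NumPy is installed (NumPy isn't used anywhere else in this post); numpy.polyfit returns the coefficients from the highest degree down, so its output order matches the {C, B, A} convention.

Code:

# Optional cross-check against NumPy (assumes NumPy is installed).
# Paste at the end of the prototype above: it reuses data, A, B and C.
import numpy as np

xs = [x for x, _ in data]
ys = [y for _, y in data]

# polyfit returns the coefficients from the highest degree down: [C, B, A]
C_np, B_np, A_np = np.polyfit(xs, ys, 2)

print('numpy: f(x) = {} x^2 + {} x + {}'.format(C_np, B_np, A_np))
print('abs differences:', abs(C - C_np), abs(B - B_np), abs(A - A_np))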

I hope you find it useful. I plan on translating it to the HP 35S later on, if I have time.

EDIT: about the example data in the Python code. It's a list of measured diameters (in millimeters) and sizes (in GB) of the reflective layer of several DVD-Rs that I've had around my house. As you may know, you can easily see the occupied area of a DVD-R, so I thought I would work out a formula to approximate the space left on a disc just by measuring the diameter of the written (lighter) area. Since the area grows with the square of the diameter, a quadratic regression fit seemed appropriate for this task.
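
In case anyone wonders why a full quadratic and not just a pure d^2 term: the written part is really an annulus, so under the assumption of a roughly constant data density the capacity should behave like

Code:

\text{GB}(d) \approx k\,\frac{\pi}{4}\left(d^2 - d_0^2\right)
= \underbrace{-k\,\tfrac{\pi}{4}\,d_0^2}_{A} + 0\cdot d + \underbrace{k\,\tfrac{\pi}{4}}_{C}\,d^2

with d_0 the inner diameter of the data area and k the capacity per unit area. The general fit f(x) = A + Bx + Cx^2 simply absorbs this, with B expected to be close to zero.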
@Mods: I think I've used the wrong subforum, I'm sorry.
The problem is, it isn't really a correlation coefficient but a different measure of error that is not defined for polynomial fits. That's why I didn't implement anything there; all the literature I studied basically said that doing so was wrong...

R^2 is defined, and hence that one is calculated and presented.
I didn't know that, thanks for clarifying!
Anyway, it was a fun exercise for me to learn HP PPL. I must say it's way more similar to modern production languages than the simple calculator stuff we've had in the past.
(12-31-2017 02:22 AM)Tim Wessman Wrote: R^2 is defined, and hence that one is calculated and presented.

How can I access this value outside the statistics App? Does the function polynomial_regression() calculate and store that value somewhere, or should I call another function after the fitting?
(12-31-2017 12:50 PM)TheKaneB Wrote: How can I access this value outside the statistics App? Does the function polynomial_regression() calculate and store that value somewhere, or should I call another function after the fitting?

Try Vars -> App -> Statistics 2Var -> Results -> Corr and CoeffDet

HNY!
That CAS function does not calculate that to my knowledge - there may be some other way or command, but I'm not aware of it. Only by using the app functionality will the variables containing all the results be calculated and stored.
(12-31-2017 01:28 PM)salvomic Wrote: Try Vars -> App -> Statistics 2Var -> Results -> Corr and CoeffDet

These are the values from the previous calculation inside the Statistics App; they aren't updated when I manually call "polynomial_regression" with a different data set.


Tim Wessman Wrote: That CAS function does not calculate that to my knowledge - there may be some other way or command, but I'm not aware of it. Only by using the app functionality will the variables containing all the results be calculated and stored.
All right, thanks.