Han · (This post was last modified: 01-22-2014 02:55 PM by Han.)

Is the current implementation merely a linear regression of something similar to \( \mathrm{logit}(P) = \alpha + \beta x \), where \( \mathrm{logit}(P) = \ln( \frac{P}{1-P}) \)? I was naively thinking of taking the min and max values of \( P \), normalizing \( P \) to lie between 0+0.0000001 and 1-0.0000001 using a linear function (so that there are no issues with \( \mathrm{logit}(P) \)), doing a linear regression, and then applying the inverse of the normalizing function. I take it I'm forgetting something quite obvious...
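
Written out, the normalize / regress / invert idea looks roughly like this. The symbols \( y \) (raw data), \( \delta \) (the small cutoff) and \( m \) (the scale factor) are labels assumed here for illustration:

\[
P = m\,(y - y_{\min}) + \delta, \qquad m = \frac{1 - 2\delta}{y_{\max} - y_{\min}}
\]

\[
\mathrm{logit}(P) \approx \alpha + \beta x \quad\Longrightarrow\quad y \approx \frac{1/m}{1 + e^{-(\alpha + \beta x)}} + y_{\min} - \frac{\delta}{m}
\]

The second line follows by solving the first for \( y \), that is \( y = P/m + y_{\min} - \delta/m \), and substituting \( P = 1/(1 + e^{-(\alpha + \beta x)}) \).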

Here's my naive approach in code (for data that is centered around the origin).

Code:

// nl is the normalized list of y-values
// delta is merely a "cutoff" so that all y-values are normalized to within
// the interval [delta, 1-delta], since we cannot use the interval [0,1]
// the returned string is a formula representing the logistic fit of the form
// L/(1+exp(-Ax-B)) + C; unless the data is bad, I think C is generally small
export logreg(xlist,ylist)
begin
  local ymin,ymax,n,nl;
  local delta:=.00000001,m;
  local logit;
  local lr;
  local f;
  ymin:=MIN(ylist);
  ymax:=MAX(ylist);
  // m scales the y-range onto an interval of width 1-2*delta
  m:=(1-2*delta)/(ymax-ymin);
  n:=SIZE(ylist);
  nl:=makelist( m*(ylist(X)-ymin)+delta,X,1,n );
  logit:=makelist( ln(nl(X)/(1-nl(X))),X,1,n);
  // lr(1) is used as the slope A and lr(2) as the intercept B
  lr:=linear_regression(xlist,logit);
  // inverting nl = m*(y-ymin)+delta gives y = nl/m + ymin - delta/m,
  // so L = 1/m and C = ymin - delta/m
  f:="" + 1/m + "/(1+e^(-(" + lr(1) + "*X+" + lr(2) + ")))+" + (ymin-delta/m);
  return(f);
end;
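
For anyone without a Prime at hand, the same steps can be sketched in Python. This is a port for illustration only, not the code from the post: the helper names `linear_regression`, `logreg` and `logistic` are chosen here, the constant term is taken as `ymin - delta/m` (the exact inverse of the normalization), and the test data is noise-free.

```python
import math

def linear_regression(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def logreg(xs, ys, delta=1e-8):
    """Return (L, A, B, C) for the fit L/(1+exp(-(A*x+B))) + C."""
    ymin, ymax = min(ys), max(ys)
    # Map the y-values linearly onto [delta, 1-delta] so logit is finite.
    m = (1 - 2 * delta) / (ymax - ymin)
    nl = [m * (y - ymin) + delta for y in ys]
    logit = [math.log(p / (1 - p)) for p in nl]
    a, b = linear_regression(xs, logit)
    # Undo the normalization: y = nl/m + ymin - delta/m.
    return 1 / m, a, b, ymin - delta / m

def logistic(x, L, A, B, C):
    """Evaluate the fitted curve at x."""
    return L / (1 + math.exp(-(A * x + B))) + C

# Noise-free data on a domain that is symmetric about the origin.
xs = [i / 10 for i in range(-100, 101)]
ys = [1 / (1 + math.exp(-x)) for x in xs]
L, A, B, C = logreg(xs, ys)
print(L, A, B, C)
```

Note that the two extreme points are mapped to exactly `delta` and `1-delta`, so their logits are far larger in magnitude than the rest and pull on the regression line; the fitted slope should not be expected to match the true one exactly.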

At the home screen:

Code:

L0:=makelist(X,X,-10,10,.1);
L1:=makelist(1/(1+e^(-X))+RANDOM()*.2,X,-10,10,.1);
logreg(L0,L1);

In the 2-vars Stats app, press [Num], select C0 (and then C1 and C2), and press "Make" for each:

C0: Expression: L0(X), X starts from 1 to 201 step 1
C1: Expression: L1(X), X starts from 1 to 201 step 1
C2: Expression: use formula given by logreg(L0,L1), X starts from -10 to 10 step .1

Hit [Plot] and ignore the error message. Change your plot settings accordingly. Here's a screenshot:

A smarter algorithm would check the \( R^2 \) value of the linear regression to see if outliers need to be filtered. Perhaps there should even be a preference for the points closer to the origin after normalization, since \( \ln (\frac{P}{1-P}) \) grows large in magnitude for \( P \) values close to 0 and 1. Or perhaps do two linear regressions (one favoring points near the origin), compare the \( R^2 \) values, and choose the tighter fit.
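
A rough Python sketch of the "two regressions, compare \( R^2 \)" idea follows. Everything in it is an assumption made for illustration: the names `weighted_fit` and `compare_fits`, and in particular the weight \( P(1-P) \), which is just one way of favoring points whose normalized value is near 0.5 (where the logit is well behaved). The weighted and unweighted \( R^2 \) values are also not strictly comparable, so treat the comparison as a heuristic.

```python
import math

def weighted_fit(xs, zs, ws):
    """Weighted least squares for z = a*x + b; returns (a, b, r2)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    mz = sum(w * z for w, z in zip(ws, zs)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxz = sum(w * (x - mx) * (z - mz) for w, x, z in zip(ws, xs, zs))
    szz = sum(w * (z - mz) ** 2 for w, z in zip(ws, zs))
    a = sxz / sxx
    b = mz - a * mx
    r2 = sxz ** 2 / (sxx * szz)  # squared (weighted) correlation
    return a, b, r2

def compare_fits(xs, ps):
    """ps must already be normalized into the open interval (0, 1)."""
    zs = [math.log(p / (1 - p)) for p in ps]
    # Plain regression: every point counts equally.
    plain = weighted_fit(xs, zs, [1.0] * len(xs))
    # Assumed weighting: p*(1-p) shrinks toward 0 as p nears 0 or 1.
    favored = weighted_fit(xs, zs, [p * (1 - p) for p in ps])
    # Keep whichever fit reports the larger R^2.
    return max(plain, favored, key=lambda fit: fit[2])

# Usage on data whose logit is exactly linear: z = 2x + 1.
xs = [i / 10 for i in range(-30, 31)]
ps = [1 / (1 + math.exp(-(2 * x + 1))) for x in xs]
a, b, r2 = compare_fits(xs, ps)
print(a, b, r2)
```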

Here's the linear regression of \( \ln (\frac{P}{1-P}) \) after \( P \) has been normalized in the example above.

Edit: this doesn't work for domains not centered about the origin.