HP Forums
Logistic Fit - Printable Version




Logistic Fit - jgreenb2 - 01-17-2014 11:43 PM

I have been stymied trying to get the Prime to perform a Logistic fit in the Statistics 2Var app. Whenever I attempt to choose "Logistic" in the Symb view and then press Plot, I get Error: Invalid Input. Other function choices do fit the data (albeit badly). Anyone else seen this behavior?


RE: Logistic Fit - Tim Wessman - 01-18-2014 02:21 PM

Please post the data you are trying to fit. That fit will misbehave very badly if you don't have data that is quite well behaved.


RE: Logistic Fit - jgreenb2 - 01-18-2014 04:14 PM

Tim,

The data was generated by adding random noise to a sigmoid function:

c1:= -10:0.1:10;
c2:= sigmoid(c1)+random()*0.2;

where:

sigmoid(x):= 1/(1+exp(-x))
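
(For reference, the colon range above is shorthand rather than Prime syntax; something like the following on the Home screen should build equivalent lists with MAKELIST and RANDOM, as a minimal sketch:)

Code:

L0:=MAKELIST(X,X,-10,10,0.1);
L1:=MAKELIST(1/(1+e^(-X))+RANDOM()*0.2,X,-10,10,0.1);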


RE: Logistic Fit - Helge Gabert - 01-19-2014 09:48 PM

Are you aware that your sigmoid function has a pole at x=0? So it doesn't seem to make much sense to fit from -10 to 10.

Even so, I tried a fit from 3 to 10 in 0.1 steps with random noise added, and that also won't work with the logistic fit (with fixed L, A, B), so it seems to me that this problem could be investigated by HP.


RE: Logistic Fit - Thomas Klemm - 01-19-2014 10:12 PM

(01-19-2014 09:48 PM)Helge Gabert Wrote:  Are you aware that your sigmoid function has a pole at x=0? So it doesn't seem to make much sense to fit from -10 to 10.

[Image: 500px-Logistic-curve.svg.png]

{o,o}
|)__)
-"-"-
O RLY?


RE: Logistic Fit - Helge Gabert - 01-20-2014 05:44 AM

Oops! I inadvertently entered 1/(1-e^(-x)), which is discontinuous at x=0! Of course, 1/(1+e^(-x)) is not, and behaves as shown. Thanks for pointing that out.


RE: Logistic Fit - jgreenb2 - 01-20-2014 01:52 PM

You will find, however, that it still doesn't fit.


RE: Logistic Fit - Helge Gabert - 01-20-2014 03:01 PM

I already found that out. :-)

Although for certain subsets of the x range there is no problem, e.g. 1 to 5 with a 0.1 step.


RE: Logistic Fit - jgreenb2 - 01-20-2014 06:44 PM

Interesting...I hadn't noticed that.

Still, it seems that if the fit could handle any data reliably, this would be the easiest set. Hopefully Tim can add it to the list of things to investigate for a future release.


RE: Logistic Fit - Tugdual - 01-21-2014 06:43 AM

I suggest you prefix your thread title with "[BUG]".
It would be good to know if there will be a firmware update and what is in scope...


RE: Logistic Fit - Tim Wessman - 01-21-2014 04:46 PM

Well, this logistic fit was pulled from the old HP math library, but frankly it was never any good, due to being so sensitive to even minor changes in the numbers. When I was reimplementing it for the 39gII, I really wanted to swap it for a much better and more robust algorithm. However, after many, many fruitless days of searching (over many months) I was never able to find one that behaved predictably. The biggest challenge with this type of fit is finding good initial estimates. A human can easily identify what a reasonable estimate is, and whether it should be an increasing or decreasing version, but finding an algorithm that matched what is desired proved ridiculously difficult.

If anyone has any recommendations or suggestions, I am totally open to them.


RE: Logistic Fit - Han - 01-22-2014 04:02 AM

Is the current implementation merely a linear regression of something similar to \( \mathrm{logit}(P) = \alpha + \beta x \), where \( \mathrm{logit}(P) = \ln( \frac{P}{1-P}) \)? I was naively thinking about taking the min and max values of \( P \), normalizing them to between 0+0.0000001 and 1-0.0000001 using a linear function (so that there are no issues with \( \mathrm{logit}(P) \)), doing a linear regression, and then taking the inverse of the normalizing function. I take it I'm forgetting something quite obvious...

Here's my naive approach in code (for data that is centered around the origin).

Code:

// nl is the normalized list of y-values

// delta is merely "cutoff" so that all y-values are normalized to within
// the interval [delta, 1-delta] since we cannot use the interval [0,1]

// the returned string is a formula representing the logistic fit of the form
// L/(1+exp(-Ax-B)) + C; unless the data is bad, I think C is generally small

export logreg(xlist,ylist)
begin
  local ymin,ymax,n,nl;
  local delta:=.00000001,m;
  local logit;
  local lr;
  local f;
  ymin:=MIN(ylist);
  ymax:=MAX(ylist);
  m:=(1-2*delta)/(ymax-ymin);
  n:=SIZE(ylist);
  nl:=makelist( m*(ylist(X)-ymin)+delta,X,1,n );
  logit:=makelist( ln(nl(X)/(1-nl(X))),X,1,n);
  lr:=linear_regression(xlist,logit);

  // more "accurate" would be to use m*ymin-delta as opposed to m*ymin
  f:="" + 1/m + "/(1+e^(-(" + lr(1) + "*X+" + lr(2) + ")))+" + m*ymin;
  return(f);
end;

At the home screen:

Code:

L0:=makelist(X,X,-10,10,.1);
L1:=makelist(1/(1+e^(-X))+RANDOM()*.2,X,-10,10,.1);
logreg(L0,L1);

In the Statistics 2Var app, press [Num], select C0 (and then C1, and C2), and press "Make":

C0: Expression: L0(X), X from 1 to 201, step 1
C1: Expression: L1(X), X from 1 to 201, step 1
C2: Expression: the formula returned by logreg(L0,L1), X from -10 to 10, step .1

Hit [Plot] and ignore the error message. Change your plot settings accordingly. Here's a screenshot:


[Image: attachment.php?aid=177]


A smarter algorithm would check the \( R^2 \) value of the linear regression to see if outliers need to be filtered. Perhaps there could even be a preference for points closer to the origin after normalization, since \( \ln (\frac{P}{1-P}) \) grows large in magnitude for \( P \) values close to 0 and 1. Or perhaps do two linear regressions (one favoring points near the origin), compare the \( R^2 \) values, and choose the tighter fit.
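
To make that concrete, here is a minimal sketch of such a check: a small helper (the name RSQ and the code are my own, not anything built in) that computes the \( R^2 \) of a plain linear regression from the usual sums, which logreg could call as RSQ(xlist,logit) and use to reject a poor fit or to compare two candidate regressions.

Code:

// Hypothetical helper: R^2 (squared correlation) of the linear
// regression of ylist against xlist. logreg could call RSQ(xlist,logit)
// and, say, refuse to return a formula when R^2 is low, or keep the
// tighter of two candidate regressions.

EXPORT RSQ(xlist,ylist)
BEGIN
  LOCAL n,k,sx,sy,sxx,syy,sxy;
  n:=SIZE(xlist);
  sx:=0; sy:=0; sxx:=0; syy:=0; sxy:=0;
  FOR k FROM 1 TO n DO
    sx:=sx+xlist(k);     sy:=sy+ylist(k);
    sxx:=sxx+xlist(k)^2; syy:=syy+ylist(k)^2;
    sxy:=sxy+xlist(k)*ylist(k);
  END;
  // squared correlation coefficient
  RETURN (n*sxy-sx*sy)^2/((n*sxx-sx^2)*(n*syy-sy^2));
END;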

Here's the linear regression of \( \ln (\frac{P}{1-P}) \) after \( P \) has been normalized in the example above.

[Image: attachment.php?aid=178]

Edit: this doesn't work for domains not centered about the origin.


RE: Logistic Fit - Tim Wessman - 01-22-2014 03:51 PM

I think that is roughly what it does based on my rather fuzzy memory.


RE: Logistic Fit - Helge Gabert - 01-22-2014 04:14 PM

This looks good to me, and it works!

Another possibility would be to let the user specify starting values, along with the fit, and use something like the Levenberg-Marquardt method, in analogy to the Moda library (V1.52) on hpcalc.org for the HP49/50.


RE: Logistic Fit - Han - 01-22-2014 04:38 PM

(01-22-2014 04:14 PM)Helge Gabert Wrote:  This looks good to me, and it works!

Another possibility would be to let the user specify starting values, along with the fit, and use something like the Levenberg-Marquardt method, in analogy to the Moda library (V1.52) on hpcalc.org for the HP49/50.

I'm not sure if you were referring to my post (the one about a "naive" approach to logistic fitting), but if you were, do keep in mind that this technique is quite limited. For example, if the data is such that the domain is restricted to the interval \( [1,5] \) and the range is in the interval \( [.5,2] \), then this technique fails. If we normalize the range \( [.5,2]\) to \( [P_{min}, P_{max} ] \), how does one determine whether \( P_{min} \) is closer to 0 or .1 or .5 or even .75 (and similarly whether \( P_{max} \) should be 1, or much smaller)? The domain may be of some help. That is, if we're "far to the right" then \(P_{min}\) and \( P_{max} \) will presumably each be closer to 1. However, there are similar issues even when using the domain.

So when you say have the user specify the starting values, would that be essentially the same as allowing them to select the \( P_{min} \) and \( P_{max} \) values? I think that could work for data that is not centered about the origin.

I'm a little rusty in logistic modeling, but I vaguely remember maximum likelihood estimates having some connection here (?).


RE: Logistic Fit - Helge Gabert - 01-22-2014 06:16 PM

Yes, starting values should help to circumvent the "not centered about the origin" issue. Coupled with a technique like Levenberg-Marquardt (which blends Gauss-Newton and steepest descent), that works for most data sets, although local minima might be encountered. I believe that is what is implemented in MODA, and also in the MRQ library for the 49/50.
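
Just to make the idea concrete, here is a minimal, untested sketch in PPL of what such a refinement might look like. It is my own code (the names SSE and LOGFIT_LM are made up; this is not MODA, MRQ, or the built-in fit), it assumes the standard matrix functions MAKEMAT, TRN and IDENMAT, and it uses identity damping (Levenberg's original variant) rather than diag(JᵀJ). The user supplies starting values for L, A and B, e.g. the values Han's logreg computes internally (1/m, lr(1), lr(2)), and the iteration refines them, increasing the damping whenever a step makes the sum of squared residuals worse.

Code:

// Hypothetical sketch: Levenberg-style refinement for the model
// y = L/(1+e^(-(A*x+B))). e^ is the exponential, as used on the Home
// screen earlier in this thread.

// sum of squared residuals for given parameters
SSE(xlist,ylist,L,A,B)
BEGIN
  LOCAL k,n,s,tot;
  n:=SIZE(xlist); tot:=0;
  FOR k FROM 1 TO n DO
    s:=L/(1+e^(-(A*xlist(k)+B)));
    tot:=tot+(ylist(k)-s)^2;
  END;
  RETURN tot;
END;

EXPORT LOGFIT_LM(xlist,ylist,L,A,B)
BEGIN
  LOCAL n,k,it,u,s,J,r;
  LOCAL JT,d,lam,e0,e1,nL,nA,nB;
  n:=SIZE(xlist); lam:=0.001;
  FOR it FROM 1 TO 50 DO
    J:=MAKEMAT(0,n,3); r:=MAKEMAT(0,n,1);
    FOR k FROM 1 TO n DO
      u:=A*xlist(k)+B;
      s:=1/(1+e^(-u));               // sigmoid at x_k
      r(k,1):=ylist(k)-L*s;          // residual
      J(k,1):=s;                     // df/dL
      J(k,2):=L*s*(1-s)*xlist(k);    // df/dA
      J(k,3):=L*s*(1-s);             // df/dB
    END;
    JT:=TRN(J); e0:=SSE(xlist,ylist,L,A,B);
    REPEAT
      // damped normal equations: larger lam = closer to steepest descent
      d:=(JT*J+lam*IDENMAT(3))^(-1)*JT*r;
      nL:=L+d(1,1); nA:=A+d(2,1); nB:=B+d(3,1);
      e1:=SSE(xlist,ylist,nL,nA,nB);
      IF e1>e0 THEN lam:=lam*10; END;
    UNTIL e1<=e0 OR lam>1000000;
    IF e1<=e0 THEN
      L:=nL; A:=nA; B:=nB; lam:=lam/10;
    END;
  END;
  RETURN {L,A,B};
END;

As Helge says, this kind of iteration can still land in a local minimum, so the starting values matter; the logit-linearization discussed above seems like a reasonable way to get them.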