HP Forums

I have been stymied trying to get the Prime to perform a Logistic fit in the Statistics 2Var app. Whenever I attempt to choose "Logistic" in the Symb view and then press Plot, I get Error: Invalid Input. Other function choices do fit the data (albiet badly). Anyone else seen this behavior?

Please post the data you are trying to fit. That fit will misbehave very badly if you don't have data that is quite well behaved.

Tim,

The data was generated by adding random noise to a sigmoid function:

c1:= -10:0.1:10;
c2:= sigmoid(c1)+random()*0.2;

where:

sigmoid(x):= 1/(1+exp(-x))

Are you aware that your sigmoid function has a pole x=0? So it doesn't seem to make much sense to fit from -10 to 10.

Even so, I tried to fit from 3 to 10 in 0.1 steps with a random noise added, and that also won't work with the logistic fit (with fixed L, A, B), so it seems to me that this problem could be investigated by HP.

(01-19-2014 09:48 PM)Helge Gabert Wrote: [ -> ]Are you aware that your sigmoid function has a pole x=0? So it doesn't seem to make much sense to fit from -10 to 10.

{o,o}
|)__)
-”-”-
O RLY?

Oops! I inadvertently entered 1/(1-e^(-x)),, which is discontinuous at x=0! Of course, 1/(1+e^(-x)) is not, and behaves as shown. Thanks for pointing that out.

You will find, however, that it still doesn't fit.

I already found that out Smile

Although, for certain subsets of the x range, no problem, e.g. 1 .. 5 with 0.1 step.

Interesting...I hadn't noticed that.

Still, it seems that if it could fit any data reliably this would be the easiest set. Hopefully Tim can add it to the list of things to investigate for a future release.

I suggest you prefix your thread title with "[BUG]"
Would be good to know if there will be a firmware update and what is in scope...

Well, this logistic fit was pulled from the old hp math library, but frankly it was never any good due to being so sensitive to even minor changes in numbers. When I was reimplementing it for the 39gII, I really wanted to switch it for a much better and more robust algorithm. However, after many many fruitless days of searching (over many months) I was never able to find a good fit that behaved predictably. The biggest challenge with this type of fit is finding good initial estimates. A human can easily identify what a reasonable estimate is, and whether it should be an increasing/decreasing version, but finding an algo that matched what is desired proved ridiculous difficult.

If anyone has any recommendations or suggestions, I am totally welcome to them.

Is the current implementation merely a linear regression of something similar to \( \mathrm{logit}(P) = \alpha + \beta x \) where \( \mathrm{logit}(P) = \ln( \frac{P}{1-P}) \)? I was naively thinking about taking the min and max value of \( P \) and normalize it to between 0+0.0000001 and 1-0.0000001 using a linear function (so that there are no issues with \( \mathrm{logit}(P) \), doing a linear regression, and then taking the inverse of the normalizing function. I take it I'm forgetting something quite obvious...

Here's my naive approach in code (for data that is central around the origin).

Code:

// nl is the normalized list of y-values

// delta is merely "cutoff" so that all y-values are normalized to within

// the interval [delta, 1-delta] since we cannot use the interval [0,1]

// the returned string is a formula representing the logistic fit of the form

// L/(1+exp(-Ax-B)) + C; unless the data is bad, I think C is generally small

export logreg(xlist,ylist)

begin

  local ymin,ymax,n,nl;

  local delta:=.00000001,m;

  local logit;

  local lr;

  local f;

  ymin:=MIN(ylist);

  ymax:=MAX(ylist);

  m:=(1-2*delta)/(ymax-ymin);

  n:=SIZE(ylist);

  nl:=makelist( m*(ylist(X)-ymin)+delta,X,1,n );

  logit:=makelist( ln(nl(X)/(1-nl(X))),X,1,n);

  lr:=linear_regression(xlist,logit);

  // more "accurate" would be to use m*ymin-delta as opposed to m*ymin

  f:="" + 1/m + "/(1+e^(-(" + lr(1) + "*X+" + lr(2) + ")))+" + m*ymin;

  return(f);

end;

At the home screen:

Code:

L0:=makelist(X,X,-10,10,.1);

L1:=makelist(1/(1+e^(-X))+RANDOM()*.2,X,-10,10,.1);

logreg(L0,L1);

In the 2-vars Stats app, press [Num] and select C0 (and then C1, and C2) and press "Make"

C0: Expression: L0(X), X starts from 1 to 201 step 1
C1: Expression: L1(X), X starts from 1 to 201 step 1
C2: Expression: use formula given by logreg(L0,L1), X starts from -10 to 10 step .1

Hit [Plot] and ignore the error message. Change your plot settings accordingly. Here's a screenshot:

A smarter algorithm with check the \( R^2 \) value of the linear regression to see if outliers need to be filtered. Perhaps there may even be a preference for the points closer to the origin after normalization since \( \ln (\frac{P}{1-P}) \) grows large for \( P \) values close to 0 and 1. Or perhaps do two linear regressions (one favoring points near the origin) and compare the \( R^2 \) values, and choose the tighter fit.

Here's the linear regression of \( \ln (\frac{P}{1-P}) \) after \( P \) has be normalized in the example above.

Edit: this doesn't work for domains not centered about the origin.

I think that is roughly what it does based on my rather fuzzy memory.

This looks good to me, and it works!

Another possibility would be to let the user specify starting values, along with the fit, and use something like the Levenberg-Marquardt method, in analogy to the Moda library (V1.52) on hpcalc.org for the HP49/50.

(01-22-2014 04:14 PM)Helge Gabert Wrote: [ -> ]This looks good to me, and it works!

Another possibility would be to let the user specify starting values, along with the fit, and use something like the Levenberg-Marquardt method, in analogy to the Moda library (V1.52) on hpcalc.org for the HP49/50.

I'm not sure if you were referring to my post (the one about a "naive" approach to logistic fitting), but if you were, do keep in mind that this technique is quite limited. For example, if the data is such that the domain is restricted to the interval \( [1,5] \) and whose range is in the interval \( [.5,2] \) then this technique fails. If we normalize the range \( [.5,2]\) to \( [P_{min}, P_{max} ] \), how does one determine whether \( P_{min} \) is closer to 0 or .1 or .5 or even .75 (similarly if \( P_{max} \) should be 1, or much smaller). The domain may be of some help. That is, if we're "far to the right" then \(P_{min}\) and \( P_{max} \) will presumably be each closer to 1. However, there are similar issues even when using the domain.

So when you say have the user specify the starting values, would that be essentially the same as allowing them to select the \( P_{min} \) and \( P_{max} \) values? I think that could work for data that is not central about the origin.

I'm a little rusty in logistics modeling, but vaguely remember something about maximum likelihood estimates have some connection here (?).

Yes, starting values should help to circumvent the "not central about the origin" issue. Coupled with a technique like Levenberg-Marquardt (alternating between Newton and Steepest Descent) works for most data sets - - although local minima might be encountered. I believe that is what is implemented in MODA, and also in MRQ library for the 49/50.

jgreenb2

Tim Wessman

jgreenb2

Helge Gabert

Thomas Klemm

Helge Gabert

jgreenb2

Helge Gabert

jgreenb2

Tugdual

Tim Wessman

Han

Tim Wessman

Helge Gabert

Han

Helge Gabert