The Museum of HP Calculators

HP Forum Archive 20

Algorithm for fitting a logistic curve?
Message #1 Posted by Tim Wessman on 11 Nov 2011, 12:05 p.m.

Hello,

Working on the math library here, and I have had an immensely difficult time finding how to efficiently implement a logistic curve fit. Note, this isn't a full-fledged binary logistic regression (which I can find lots of information on), but rather the fitting of a curve of the form L/(1+a*e^(-b*x)) to a set of data.

The fitting method in the math library right now linearizes the equation, and it doesn't give a very good fit at all, so I am trying to replace it.

Does anyone have any helpful pointers to any algorithms for this type of problem?

I have posted Charlie Patton's code comments from the 48 math library below for those interested (he wrote this originally). I am not 100% certain whether the issue is the linearization itself, so that a completely different method is needed, or just that the L estimator routine needs replacing/improvement.

    * Name: fitlogist
    * Algorithm: The basic model is:
    *    y=L/(1+A*e^(B*x))
    * which is equivalent to:
    *    A*e^(B*x)=(L-y)/y
    * so that
    *    ln(A)+B*x=ln(L/y-1).
    *
    * Fit a linear model to the transformed data
    *    (x(i), ln(L/y(i)-1)) to obtain y=a+bx,
    * then A=e^a and B=b.
    *
    * Note B normally would be negative.
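A minimal Python sketch of the linearized fit these comments describe, assuming L is already known (for example, from an estimator such as Lestimate). It is not Patton's code: it simply applies the same transformation and then ordinary least squares on the pairs (x, ln(L/y-1)). The function name `fit_logistic_linearized` is hypothetical.

```python
import math


def fit_logistic_linearized(xs, ys, L):
    """Fit y = L/(1 + A*e^(B*x)) for a given saturation value L.

    Regresses t = ln(L/y - 1) on x by ordinary least squares, then
    returns A = e^a and B = b. Every y must lie strictly between 0 and L,
    otherwise the logarithm is undefined.
    """
    ts = []
    for y in ys:
        if not 0.0 < y < L:
            raise ValueError("each y must satisfy 0 < y < L")
        ts.append(math.log(L / y - 1.0))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / sxx              # slope of the fitted line
    a = mean_t - b * mean_x    # intercept of the fitted line
    return math.exp(a), b      # A = e^a, B = b (normally negative)


# Noise-free samples of y = 10/(1 + 4*e^(-0.8*x)) recover A and B exactly.
xs = [float(i) for i in range(10)]
ys = [10.0 / (1.0 + 4.0 * math.exp(-0.8 * x)) for x in xs]
A, B = fit_logistic_linearized(xs, ys, 10.0)
```

Note that d/dL ln(L/y-1) = 1/(L-y), so an error in the assumed L distorts the transformed values most strongly for samples near saturation.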

    **Name: Lestimate
    **Category: Logistic Fit Utility
    **Entry:
    **  Stack: [XY] (sorted)
    **  Temp. Env.
    **Exit:
    **  Stack: L% (or %0 if there's a problem with zero divisors)
    **  Temp. Env.
    **Errors:
    **Description/Algorithm:
    **  This utility attempts to estimate the saturation value for a logistic
    **  equation from sorted statistical samples.
    **
    **  It is assuming that the data Y(X) (stored in pair form X[i],Y[i]) corresponds
    **  to samples from a differential equation dY(X)/dX = Y(X)*k*(L-Y(X)).
    **
    **  Note that this is an autonomous ODE with nodes at Y=0 and Y=L.
    **  If we plot (1/Y)*(dY(X)/dX) as a function of Y (note that it doesn't really depend
    **  on X) we will get a straight line with a zero at Y=L. It is this fact we will use
    **  to approximate L from the data. Namely, replacing dY(X)/dX by its sampled version
    **     dY(X)/dX ~ Y'[i]=(Y[i+1]-Y[i])/(X[i+1]-X[i])
    **  we do linear regression on the pairs Y[i],Z[i] with Z[i]=Y'[i]/Y[i] and
    **  find the zero of the corresponding line.
    **Author: C.M.Patton
    **Date Written: April 10, 1995
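A Python sketch of the estimator as the comment block describes it: forward differences for Y'[i], Z[i] = Y'[i]/Y[i], a straight-line regression of Z on Y, and the zero of that line. The name `estimate_saturation` is hypothetical, returning `None` stands in for the `%0` result, and the exact set of zero-divisor checks is an assumption rather than a copy of the original routine.

```python
import math


def estimate_saturation(xs, ys):
    """Estimate the saturation value L from samples sorted by x.

    Assumes dY/dX = k*Y*(L - Y), so Z = Y'/Y is a straight line in Y with
    its zero at Y = L. Returns None where the original returns %0.
    """
    if len(xs) < 3:
        return None                        # need at least two (Y, Z) pairs
    ys_used, zs = [], []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        if dx == 0.0 or ys[i] == 0.0:
            return None                    # zero divisor
        dy_dx = (ys[i + 1] - ys[i]) / dx   # forward difference Y'[i]
        ys_used.append(ys[i])
        zs.append(dy_dx / ys[i])           # Z[i] = Y'[i]/Y[i]
    n = len(ys_used)
    mean_y = sum(ys_used) / n
    mean_z = sum(zs) / n
    syy = sum((y - mean_y) ** 2 for y in ys_used)
    syz = sum((y - mean_y) * (z - mean_z) for y, z in zip(ys_used, zs))
    if syy == 0.0 or syz == 0.0:
        return None                        # no usable line (zero divisor)
    slope = syz / syy
    intercept = mean_z - slope * mean_y
    return -intercept / slope              # zero of Z = intercept + slope*Y


# Densely sampled, noise-free data from y = 10/(1 + 4*e^(-0.8*x)).
xs = [i * 0.01 for i in range(1001)]
ys = [10.0 / (1.0 + 4.0 * math.exp(-0.8 * x)) for x in xs]
L_est = estimate_saturation(xs, ys)  # close to 10; forward differences add a small bias
```

With noisy data the difference quotients amplify the noise, so the estimate can land below the largest y, which would leave ln(L/y-1) undefined for some samples in the linearized fit.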

TW

--

Although I work for the HP calculator department, the comments and opinions I express here are my own.

Edited: 11 Nov 2011, 12:18 p.m.

      
Re: Algorithm for fitting a logistic curve?
Message #2 Posted by Eric Smith on 11 Nov 2011, 1:47 p.m.,
in response to message #1 by Tim Wessman

In my experience, fitting either the logistic function or the tanh function tends to give poor results. I suspect that this is due to how rapidly they approach their asymptotes, but that's really only a guess on my part. Hopefully someone knowledgeable about numerical analysis can explain how to do it properly.
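The two functions are the same curve up to a shift and a scaling, which is consistent with them behaving alike when fitted. Writing the logistic in the form used elsewhere in this thread and assuming A > 0, a standard identity gives:

```latex
\frac{L}{1 + A e^{-Kx}}
  = \frac{L}{1 + e^{-K(x - x_0)}}
  = \frac{L}{2}\left[1 + \tanh\!\left(\frac{K(x - x_0)}{2}\right)\right],
\qquad x_0 = \frac{\ln A}{K}.
```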

      
Re: Algorithm for fitting a logistic curve?
Message #3 Posted by Dieter on 11 Nov 2011, 4:15 p.m.,
in response to message #1 by Tim Wessman

Quote:
The fitting method in the math library right now linearizes the equation and it doesn't give a very good fit at all so I am trying to replace it.

Linearizing the equation, followed by a simple linear regression, is a classic method that usually gives decent results. However, it minimizes the sum of squared residuals of the transformed values, not of the original y data, so it does not minimize the sum of the residuals' squares for the curve itself. How did you determine the quality of the fit here?
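The difference between the objective the linearized regression minimizes and the true sum of squares can be made concrete by computing both side by side. A hedged Python sketch with hypothetical function names, using the model form from the fitlogist comments, y = L/(1+A*e^(B*x)):

```python
import math


def logistic(x, L, A, B):
    """Model from the fitlogist comments: y = L/(1 + A*e^(B*x))."""
    return L / (1.0 + A * math.exp(B * x))


def sse_original(xs, ys, L, A, B):
    """Squared residuals in y itself: what a true least-squares fit minimizes."""
    return sum((logistic(x, L, A, B) - y) ** 2 for x, y in zip(xs, ys))


def sse_transformed(xs, ys, L, A, B):
    """Squared residuals of ln(A) + B*x against ln(L/y - 1): what the
    linearized regression minimizes (needs A > 0 and 0 < y < L)."""
    return sum((math.log(A) + B * x - math.log(L / y - 1.0)) ** 2
               for x, y in zip(xs, ys))


xs = [float(i) for i in range(10)]
ys = [logistic(x, 10.0, 4.0, -0.8) for x in xs]
# Both objectives vanish for exact data; they weight errors very differently otherwise.
exact_fit = (sse_original(xs, ys, 10.0, 4.0, -0.8),
             sse_transformed(xs, ys, 10.0, 4.0, -0.8))
```

Since d/dy ln(L/y-1) = -L/(y(L-y)), a residual of size δ in y becomes roughly δ·L/(y(L-y)) after the transformation, so samples close to 0 or close to L get much more weight in the linearized fit than in a least-squares fit of y itself.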
Quote:
I have posted Charlie Patton's code comments below from the 48 math library for those interested (he wrote this originally). I am not 100% certain if the issue is the linearization and a completely different method is needed, or just the L estimator routine needs replacing/improvement.

As far as I can see, the comments simply refer to the common linearization, which here leads to the transformation ln(A)+B*x = ln(L/y-1). The goal of the algorithm, however, seems to be a different one: to estimate the saturation parameter L, as in "This utility attempts to estimate the saturation value for a logistic equation from sorted statistical samples".

A true least-squares regression, i.e. one that exactly minimizes the sum of the residuals' squares, is not trivial. I came across the following document and think it's an interesting read on this subject: http://home2.fvcc.edu/~dhicketh/DiffEqns/Activities/logistic.pdf

Dieter

      
Re: Algorithm for fitting a logistic curve?
Message #4 Posted by MacDonald Phillips on 11 Nov 2011, 5:42 p.m.,
in response to message #1 by Tim Wessman

Tim, unfortunately, if you linearize an equation to fit it to your data, you do not minimize the SSE (sum of squared errors) of the original data. Also, not all equations can be linearized. What is needed is a non-linear fitting routine. But this requires a calculator with a CAS so you can compute the derivatives of the equation with respect to the parameters. I have done this for the TI-89 and the Nspire CX CAS. If you want, I can send the routines to you. My email is don.phillips@gmail.com.

Don
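For this particular model the derivatives with respect to the parameters are simple enough to write out by hand, so a nonlinear least-squares routine can be sketched without symbolic differentiation. Below is a minimal Levenberg-Marquardt sketch in Python, assuming the form y = L/(1+A*e^(-K*x)) and a reasonable starting guess p0. It is not Don's TI routine, and all function names are hypothetical.

```python
import math


def model(x, L, A, K):
    """Logistic model y = L/(1 + A*e^(-K*x))."""
    return L / (1.0 + A * math.exp(-K * x))


def partials(x, L, A, K):
    """Hand-coded partial derivatives of the model w.r.t. L, A and K."""
    u = math.exp(-K * x)
    d = 1.0 + A * u
    return [1.0 / d, -L * u / d ** 2, L * A * x * u / d ** 2]


def sse(p, xs, ys):
    """Sum of squared residuals; invalid trial points count as infinitely bad."""
    try:
        return sum((model(x, *p) - y) ** 2 for x, y in zip(xs, ys))
    except (OverflowError, ZeroDivisionError):
        return math.inf


def solve3(m, v):
    """Solve the 3x3 system m*s = v by Gaussian elimination with pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for j in range(c, 4):
                a[r][j] -= f * a[c][j]
    s = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s[r] = (a[r][3] - sum(a[r][j] * s[j] for j in range(r + 1, 3))) / a[r][r]
    return s


def fit_lm(xs, ys, p0, max_iter=200):
    """Levenberg-Marquardt fit of (L, A, K) starting from p0."""
    p, lam = list(p0), 1e-3
    cost = sse(p, xs, ys)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r from the hand-coded partials.
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            g = partials(x, *p)
            r = y - model(x, *p)
            for i in range(3):
                jtr[i] += g[i] * r
                for j in range(3):
                    jtj[i][j] += g[i] * g[j]
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        m = [row[:] for row in jtj]
        for i in range(3):
            m[i][i] *= 1.0 + lam
        step = solve3(m, jtr)
        trial = [pi + si for pi, si in zip(p, step)]
        trial_cost = sse(trial, xs, ys)
        if trial_cost < cost:      # accept and lean toward Gauss-Newton
            p, cost, lam = trial, trial_cost, lam * 0.1
            if max(abs(s) for s in step) < 1e-12 * (1.0 + max(abs(q) for q in p)):
                break
        else:                      # reject and lean toward gradient descent
            lam *= 10.0
            if lam > 1e12:
                break
    return p


# Noise-free samples of y = 10/(1 + 4*e^(-0.8*x)) and a rough starting guess.
xs = [float(i) for i in range(10)]
ys = [model(x, 10.0, 4.0, 0.8) for x in xs]
L, A, K = fit_lm(xs, ys, [9.0, 3.0, 0.6])
```

Levenberg-Marquardt blends Gauss-Newton steps (small lam) with short, gradient-descent-like steps (large lam), which is what keeps it from diverging when the starting guess is rough.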

            
Re: Algorithm for fitting a logistic curve?
Message #5 Posted by Crawl on 11 Nov 2011, 7:27 p.m.,
in response to message #4 by MacDonald Phillips

I can't believe I'm saying this (being a big fan of CAS calculators), but you don't NEED to use a CAS. I use Excel's Solver routine all the time to do least squares fitting to arbitrary function forms.
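The point that a CAS is not strictly required can be illustrated without reference to Solver's internals: the parameter derivatives that a nonlinear least-squares step needs can be approximated by central differences, using only evaluations of the model. A small Python sketch with hypothetical names, checked against the hand-coded partials of y = L/(1+A*e^(-K*x)):

```python
import math


def model(x, L, A, K):
    """Logistic model y = L/(1 + A*e^(-K*x))."""
    return L / (1.0 + A * math.exp(-K * x))


def numeric_partials(x, params, rel_step=1e-6):
    """Central-difference estimates of d(model)/d(param) for each parameter.

    Only model evaluations are used, so no symbolic differentiation is needed.
    """
    out = []
    for j, pj in enumerate(params):
        h = rel_step * max(abs(pj), 1.0)
        up = list(params)
        dn = list(params)
        up[j] = pj + h
        dn[j] = pj - h
        out.append((model(x, *up) - model(x, *dn)) / (2.0 * h))
    return out


# Compare with the hand-coded partials at one sample point.
L, A, K, x = 10.0, 4.0, 0.8, 2.0
u = math.exp(-K * x)
d = 1.0 + A * u
exact = [1.0 / d, -L * u / d ** 2, L * A * x * u / d ** 2]
approx = numeric_partials(x, [L, A, K])
```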

      
Re: Algorithm for fitting a logistic curve?
Message #6 Posted by Wes Loewer on 13 Nov 2011, 1:22 a.m.,
in response to message #1 by Tim Wessman

Tim,

Quote:
Does anyone have any helpful pointers to any algorithms for this type of problem?

How critical is speed?

Perhaps you've already been down this road, but using a brute-force approach I took the equivalent equation:

y = L/(1+A*exp(-K*x))

and applied least-squares principles:

Let E = sum i=1 to n of (L/(1+A*EXP(-K*X_i)) - Y_i)^2

then minimized E by taking the partial derivatives of E with respect to L, A, and K and setting them to zero.

E 'L' DERIV 0 =
E 'A' DERIV 0 =
E 'K' DERIV 0 =

This gives three non-linear equations, which can then be solved numerically for L, A, and K.
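Written out with the shorthand u_i, D_i and r_i (introduced here for readability), the three conditions E 'L' DERIV 0 =, E 'A' DERIV 0 = and E 'K' DERIV 0 = become:

```latex
\begin{aligned}
u_i &= e^{-K X_i}, \qquad D_i = 1 + A\,u_i, \qquad r_i = \frac{L}{D_i} - Y_i,\\[4pt]
\frac{\partial E}{\partial L} &= 2\sum_{i=1}^{n} \frac{r_i}{D_i} = 0,\\
\frac{\partial E}{\partial A} &= -2\sum_{i=1}^{n} \frac{L\,u_i\,r_i}{D_i^{2}} = 0,\\
\frac{\partial E}{\partial K} &= 2\sum_{i=1}^{n} \frac{L\,A\,X_i\,u_i\,r_i}{D_i^{2}} = 0.
\end{aligned}
```

The constant factors 2 and -2 can be dropped before solving.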

I tried this with a few sample data points on the 50g (using the SOLVESYS lib to solve) and in Maxima (using MNEWTON to solve) and got matching results which also matched the FitLogistic command in the computer software GeoGebra. I don't know if you're allowed to use GPL code for your project, but GeoGebra is a GPL program with source code available from http://www.geogebra.org/source/program/. Perhaps you could look and see how they handle it.

You don't need a CAS, since the derivatives can be hard-coded, but numerically solving the equations is of course the bottleneck. It might even be faster to use the linearized results as the initial values in the iterative solving process.

~wes
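The suggestion of seeding the iteration with the linearized results can be sketched in Python under stated assumptions: the starting point comes from the linearized fit at a saturation estimate L0 (which must exceed every y), and the refinement uses Gauss-Newton iterations with step halving on the sum of squares, rather than a general equation solver such as SOLVESYS or MNEWTON applied to the three gradient equations. Function names are hypothetical.

```python
import math


def model(x, L, A, K):
    """Logistic model y = L/(1 + A*e^(-K*x))."""
    return L / (1.0 + A * math.exp(-K * x))


def linearized_start(xs, ys, L):
    """Starting (L, A, K) from regressing ln(L/y - 1) = ln(A) - K*x.

    Requires 0 < y < L for every sample."""
    ts = [math.log(L / y - 1.0) for y in ys]
    n = len(xs)
    mx, mt = sum(xs) / n, sum(ts) / n
    slope = (sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
             / sum((x - mx) ** 2 for x in xs))
    return [L, math.exp(mt - slope * mx), -slope]


def sse(p, xs, ys):
    """Sum of squared residuals; invalid trial points count as infinitely bad."""
    try:
        return sum((model(x, *p) - y) ** 2 for x, y in zip(xs, ys))
    except (OverflowError, ZeroDivisionError):
        return math.inf


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Cramer's rule; adequate for a 3x3 sketch, not for ill-conditioned data."""
    d = det3(m)
    sol = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = v[i]
        sol.append(det3(mj) / d)
    return sol


def fit_logistic(xs, ys, L0, max_iter=100):
    """Gauss-Newton with step halving, started from the linearized fit."""
    p = linearized_start(xs, ys, L0)
    cost = sse(p, xs, ys)
    for _ in range(max_iter):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            u = math.exp(-p[2] * x)
            d = 1.0 + p[1] * u
            g = [1.0 / d, -p[0] * u / d ** 2, p[0] * p[1] * x * u / d ** 2]
            r = y - p[0] / d
            for i in range(3):
                jtr[i] += g[i] * r
                for j in range(3):
                    jtj[i][j] += g[i] * g[j]
        step = solve3(jtj, jtr)          # Gauss-Newton direction
        t = 1.0
        while t > 1e-10:                 # halve the step until the SSE decreases
            trial = [pi + t * si for pi, si in zip(p, step)]
            trial_cost = sse(trial, xs, ys)
            if trial_cost < cost:
                break
            t *= 0.5
        else:
            break                        # no decrease found: treat as converged
        p, cost = trial, trial_cost
        if t * max(abs(s) for s in step) < 1e-12 * (1.0 + max(abs(q) for q in p)):
            break
    return p


xs = [float(i) for i in range(10)]
ys = [model(x, 10.0, 4.0, 0.8) for x in xs]
L, A, K = fit_logistic(xs, ys, L0=10.2)
```

Here the linearized start lands near (10.2, 2.8, 0.6), and the iterations then refine it to the least-squares parameters.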

