Namir · 05-31-2018, 12:48 PM

The root-seeking Bisection method is the simplest and slowest, but also a very reliable, algorithm for finding the root of a function. The method is known for choosing a refined guess value in the middle of the root-bracketing interval (A, B). The refined root is calculated using:

C = (A + B) / 2

The value of C replaces either A or B, whichever has a function value with the same sign as f(C), so the root stays bracketed.
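
For readers who want to try this, here is a minimal Python sketch of the classic method described above. The function name `bisect`, the tolerance, and the test function x^2 - 2 are illustrative assumptions, not something taken from the paper.

```python
def bisect(f, a, b, tol=1e-10, max_iter=200):
    """Classic bisection sketch: returns (root_estimate, iterations_used)."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a, 0
    if fb == 0:
        return b, 0
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for i in range(1, max_iter + 1):
        c = (a + b) / 2          # C = (A + B) / 2
        fc = f(c)
        if fc == 0 or abs(b - a) / 2 < tol:
            return c, i
        # C replaces the endpoint whose function value has the same sign as f(C).
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return (a + b) / 2, max_iter


# Illustrative use: the positive root of x^2 - 2 bracketed by (1, 2).
root, steps = bisect(lambda x: x * x - 2, 1.0, 2.0)
```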

I wrote a paper (click here) that looks at the above equation in more general terms:

C = (w1 * A + w2 * B) / (w1 + w2)

The paper explores using different combinations of w1 and w2 and how most of these combinations can reduce the number of iterations needed to reach a refined guess for the root at a specific tolerance value.
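
As a small illustration of the general formula (the helper name `weighted_point` and the sample numbers are assumptions made for this sketch), equal weights reproduce the ordinary midpoint, while a heavier w2 pulls C toward B:

```python
def weighted_point(a, b, w1, w2):
    """C = (w1 * A + w2 * B) / (w1 + w2); assumes w1 + w2 is not zero."""
    return (w1 * a + w2 * b) / (w1 + w2)


mid = weighted_point(1.0, 2.0, 1.0, 1.0)     # equal weights: plain midpoint
near_b = weighted_point(1.0, 2.0, 1.0, 3.0)  # heavier w2: C moves toward B
```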

Some people just won't leave "well-enough" alone :-)

Namir

Claudio L. · 05-31-2018, 07:25 PM

(05-31-2018 12:48 PM)Namir Wrote: C = (w1 * A + w2 * B) / (w1 + w2)

The paper explores using different combinations of w1 and w2 and how most of these combinations can reduce the number of iterations needed to reach a refined guess for the root at a specific tolerance value.

Why did you choose w1+w2 as the parameter instead of w1/w2?

The equation transforms easily:

w2* ( w1/w2 * A + B ) / ( w2 * (1+w1/w2)) = ((w1/w2)*A+B) / (1+w1/w2)

Now this last expression reflects what you are doing much better: you fix the weight on one point at 1 (B in this case) and choose a ratio to determine the other weight (in your paper, you chose p = 1 + w1/w2 as the parameter, so w1/w2 = p - 1). I think the w1/w2 ratio gives a better sense of calibration. When it is set to 1, you get traditional bisection and the new point is at the center of A and B. When it is set to zero, the "chosen" new point moves all the way to B, and values in between move it progressively from the center toward B. Values > 1 move it the opposite way, with +Inf moving it all the way to A.
The results would depend on the function you are studying. The "ideal" choice, assuming the function is close to a straight line, would be to use the magnitudes of the function at A and B as the parameters, effectively turning bisection into linear interpolation.
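
A rough Python sketch of that last idea follows. It assumes the weights are cross-assigned, w1 = |f(B)| and w2 = |f(A)|, which is what the weighted formula needs in order to reproduce the linear-interpolation (false position) point; the post does not spell this out, and the name `interpolation_point` is made up for the example.

```python
def interpolation_point(f, a, b):
    """Weighted point with w1 = |f(B)| and w2 = |f(A)| (assumed assignment).

    Assumes f(a) and f(b) have opposite signs, so w1 + w2 > 0.
    """
    fa, fb = f(a), f(b)
    w1, w2 = abs(fb), abs(fa)  # the endpoint with the smaller |f| gets the larger weight
    return (w1 * a + w2 * b) / (w1 + w2)


# For a straight line, the weighted point lands on the root in one step.
c = interpolation_point(lambda x: 2 * x - 3, 1.0, 3.0)
```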

A final observation: it took me a while (it's not so clear in the paper) to visualize why it would be an improvement at all. Your paper appears to use a "fixed" ratio, but it really doesn't: you apply the weight of 1 to the point where the function has the smallest absolute value, so you shift the chosen point closer to that point every time. Effectively you are using a pair of ratios, not just one, and choosing the better of the two at each step. The improvement to the method comes more from that decision than from the actual ratio you use.
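
To make that observation concrete, here is a hedged sketch of a weighted bisection that gives weight 1 to the endpoint with the smaller |f| and a fixed `ratio` to the other endpoint. This is one reading of the description above, not the paper's actual code; `weighted_bisect`, `ratio`, and the stopping rule are all assumptions.

```python
def weighted_bisect(f, a, b, ratio=0.5, tol=1e-10, max_iter=500):
    """Weighted bisection sketch; ratio in (0, 1], ratio=1 is plain bisection."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a, 0
    if fb == 0:
        return b, 0
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for i in range(1, max_iter + 1):
        # Weight 1 goes to the endpoint with the smaller |f|, `ratio` to the other,
        # so the new point is shifted toward the endpoint that looks closer to the root.
        if abs(fa) <= abs(fb):
            w1, w2 = 1.0, ratio
        else:
            w1, w2 = ratio, 1.0
        c = (w1 * a + w2 * b) / (w1 + w2)
        fc = f(c)
        if fc == 0 or abs(b - a) < tol:
            return c, i
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c, max_iter


f = lambda x: x * x - 2
root_w, steps_w = weighted_bisect(f, 1.0, 2.0, ratio=0.5)
root_p, steps_p = weighted_bisect(f, 1.0, 2.0, ratio=1.0)  # plain bisection
```

Comparing `steps_w` and `steps_p` for a few functions and ratios is an easy way to see how much the result depends on f(x) and on the tolerance.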

ttw · 05-31-2018, 08:35 PM

There are several demonstrations of the bisection method's optimality (over a suitable set of functions).

https://cs.stackexchange.com/questions/7...ion-method

Two intuitive observations also indicate this: first, if the function f(x) and -f(x) are equally likely, then going either way is equivalent. Second, when dividing an interval into two parts, the longer part is "more likely" (in some sense) to contain the root or any other part of interest.

If you know something about the function, it's possible to do better. It's not obvious if there is a learning procedure that works often enough to be useful. The "no free lunch" theorem points out that methods good for one particular subset of functions will always have a set that they fail badly on.

Namir · 05-31-2018, 11:40 PM

The paper shows different cases for w1+w2 (using manual selection):

1) Sum is under 2. I just chose to assign 1 to one weight and calculate the other weight as sum - 1. I could have used different numbers.
2) Sum is equal to 2, but w1 <> 1 and w2 <> 1.
3) Sum > 2 and w1 and w2 are selected to enhance the advantageous root-bracketing end.

Another approach is to use the absolute function values to calculate w1 and w2. So the paper shows two approaches.

You have total freedom to select w1 and w2 as long as it obeys:

C = (w1 * A + w2 * B) / (w1 + w2)

The relative values of w1 and w2 that you select will influence convergence rate, which of course greatly depends on the f(x) and the tolerance value for the refined guess for the root.

Namir

ttw · (This post was last modified: 06-01-2018 09:10 AM by ttw.)

An interesting Bayesian-Frequentist method.

https://www.informs-sim.org/wsc11papers/359.pdf

Some more stuff using statistics to improve bisection and others.

http://www.mat.ufmg.br/intranet-atual/pg...iss220.pdf

Namir · (This post was last modified: 06-01-2018 07:11 PM by Namir.)

Thanks for the links!!!

:-)

Namir

Claudio L. · 06-01-2018, 06:38 PM

(05-31-2018 11:40 PM)Namir Wrote: The paper shows different cases for w1+w2 (using manual selection):

1) Sum is under 2. I just chose to assign 1 to one weight and calculate the other weight as sum - 1. I could have used different numbers.
2) Sum is equal to 2, but w1 <> 1 and w2 <> 1.
3) Sum > 2 and w1 and w2 are selected to enhance the advantageous root-bracketing end.

Not trying to be a pain, just observing that those 3 cases are one and the same: simply by multiplying numerator and denominator by the same value, you'll always arrive at:

((w1/w2)*A+B) / (1+w1/w2)

because there is one single independent parameter. For example, case 2) above is the same as case 1) if you multiply numerator and denominator of a generic case 1) by (2/Sum). You get a w1' and w2' such that their sum is exactly 2 (and it represents the exact same function). Just multiplying by a larger value would change it into case 3). They are all numerically equivalent, one and the same.
The magnitude of w1+w2 doesn't really define the problem as much as the relationship between w1 and w2, hence my suggestion to use that as a parameter.

(05-31-2018 11:40 PM)Namir Wrote: You have total freedom to select w1 and w2 as long as it obeys:

C = (w1 * A + w2 * B) / (w1 + w2)

The relative values of w1 and w2 that you select will influence convergence rate, which of course greatly depends on the f(x) and the tolerance value for the refined guess for the root.

Namir

I agree with your last sentence, it's the relative values of w1 and w2 that matter. I disagree with the sentence above the formula, as you don't have total freedom to choose w1 and w2. If you choose a second set w1' and w2' such that:

w1' = k * w1
w2' = k * w2

They will produce the exact same value of C for every nonzero k. You can simply substitute them into the formula to see how the k cancels out.
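
A quick numerical check of the cancellation, as a sketch (the helper `weighted_point` and the sample values are assumptions chosen for illustration):

```python
import math


def weighted_point(a, b, w1, w2):
    """C = (w1 * A + w2 * B) / (w1 + w2); assumes w1 + w2 is not zero."""
    return (w1 * a + w2 * b) / (w1 + w2)


a, b, w1, w2 = 1.0, 2.0, 0.4, 1.0
base = weighted_point(a, b, w1, w2)
# Scale both weights by several nonzero k values; only the ratio w1/w2 matters.
scaled = [weighted_point(a, b, k * w1, k * w2) for k in (0.5, 2.0, 10.0, -3.0)]
same = all(math.isclose(c, base) for c in scaled)
```

`same` comes out true up to floating-point rounding, which is why `math.isclose` is used instead of `==`.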

I think you have a good point in trying to improve it by getting it closer to a linear interpolation, but without adding the extra computing time required by a true interpolation (it's like doing a "guessed" interpolation without actually interpolating anything). I just found the choice of parameter a little confusing.
Other than that, it's a great paper, an eye opener.

Namir · 06-03-2018, 12:15 PM

Yes, your approach works very well too.

Namir

ttw · 06-05-2018, 12:56 AM

Even weirder stuff. These are mostly for many dimensional problems.

https://arxiv.org/pdf/0902.4562.pdf

https://www2.isye.gatech.edu/~brani/isyestat/04-13.pdf

Namir · 06-05-2018, 05:30 AM

Thanks ttw. Twenty years ago I was able to design a version of the Bisection algorithm to solve for the roots of a system of two nonlinear equations (F(x,y)=0 and G(x,y)=0). The difficulty in any such algorithm is selecting the initial root-bracketing intervals (Ax, Ay) and (Bx, By) where the functions F and G have opposite signs. You can find that algorithm on my web site.

Perhaps I need to design another algorithm that helps find the proper interval for the above version of the Bisection method.

Namir