HP Forums

Full Version: On the skeptical combination of experimental results
I wanted to share with the community a method for combining experimental results, described in chapter 11 (pp. 252-259) of the book "Bayesian Reasoning in Data Analysis" by Giulio D'Agostini.

The function SKEPTIC is here:

// The arguments 'd' and 's' must both
// be lists of the same size. The list
// 'd' holds the measurements and the
// list 's' holds the reported errors
// in those measurements.
// 'delta' and 'lambda' are the parameters of
// the prior used when combining measurements.

  LOCAL delta, lambda;



Put the experimental values you wish to combine in a list, say L1, and the reported errors in another list, say L2. Then, in the Symbol view of the Function app, set F1(X) = SKEPTIC(X,L1,L2).

Then switch to the Plot view to visualize the probability distribution for the combined value. You will likely need to zoom to get a useful view of the curve. The trace cursor can be used to find the most probable value (the peak) as well as a measure of the curve's "width" (say, the half width at half maximum).
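For anyone who would rather read the peak and width off numerically than with the trace cursor, here is a rough Python sketch of the same procedure. The function name peak_and_hwhm and the grid-scan approach are my own assumptions for illustration and are not part of the SKEPTIC program; the Lorentzian at the end is only a stand-in curve to show the call.

```python
def peak_and_hwhm(f, lo, hi, n=2001):
    """Scan f on an evenly spaced grid over [lo, hi].

    Returns (peak_x, left_hwhm, right_hwhm). The two half widths are
    reported separately because the curve may be asymmetric.
    """
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    ys = [f(x) for x in xs]

    # Index of the largest sampled value (the "peak" the trace cursor finds).
    i_peak = max(range(n), key=ys.__getitem__)
    half = ys[i_peak] / 2.0

    # Walk outward from the peak until the curve is no longer above half maximum.
    i_left = i_peak
    while i_left > 0 and ys[i_left] > half:
        i_left -= 1
    i_right = i_peak
    while i_right < n - 1 and ys[i_right] > half:
        i_right += 1

    return xs[i_peak], xs[i_peak] - xs[i_left], xs[i_right] - xs[i_peak]


# Stand-in curve (hypothetical): a Lorentzian with peak at 0 and HWHM of 1.
peak, left, right = peak_and_hwhm(lambda x: 1.0 / (1.0 + x * x), -5.0, 5.0)
print(peak, left, right)
```

The answer is only as fine as the grid spacing, so narrow the interval or raise n once the peak has been located, much like zooming in the Plot view.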

The idea is this: if we "trust" the values and error bars reported by the various experimenters, the usual recipe for computing a weighted average and overall error bar is fine. But if the error bars have been inflated or deflated (intentionally or unintentionally), the Bayesian approach allows for some "slop" in the error bars and finds the best agreement between the experiments. For instance, if one experiment reports a value with small error bars that is quite different from the other experiments, the usual weighted average will be dominated by that "rogue" experiment. This does not happen with the Bayesian (SKEPTIC) recipe.
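The idea can also be sketched outside the calculator. What follows is a rough Python sketch, not the SKEPTIC listing itself. It assumes the posterior for the combined value mu is proportional to the product over experiments of [lambda + (d_i - mu)^2 / (2 s_i^2)]^-(delta + 1/2), which is the form I understand D'Agostini's treatment to give when each error bar is allowed an unknown rescaling with a gamma prior. The name skeptic_density and the defaults delta = 1.3 and lam = 0.6 are assumptions for illustration, and the sample numbers are made up apart from the 9.600 +/- 0.005 "rogue" value.

```python
import math


def skeptic_density(mu, d, s, delta=1.3, lam=0.6):
    """Unnormalized posterior density for the combined value mu.

    d: list of measurements, s: list of their reported errors.
    delta, lam: prior parameters (assumed defaults, see the text above).
    """
    if len(d) != len(s):
        raise ValueError("d and s must have the same length")
    log_p = 0.0
    for d_i, s_i in zip(d, s):
        # Each experiment contributes a heavy-tailed factor instead of a
        # Gaussian, so a single discrepant point cannot dominate.
        log_p -= (delta + 0.5) * math.log(lam + (d_i - mu) ** 2 / (2.0 * s_i ** 2))
    return math.exp(log_p)


# Hypothetical inputs: three made-up values near 9.80 plus the rogue value.
values = [9.80, 9.81, 9.79, 9.600]
errors = [0.02, 0.02, 0.02, 0.005]

near_cluster = skeptic_density(9.80, values, errors)
near_rogue = skeptic_density(9.60, values, errors)
print(near_cluster > near_rogue)
```

The argument order (mu first, then the two lists) mirrors F1(X) = SKEPTIC(X,L1,L2). The result is not normalized, which is enough for locating the peak and judging the width. With these made-up inputs the density near 9.80 comes out far larger than near 9.60, even though the rogue value carries the smallest error bar.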

By way of example, suppose we have the following measurements of the acceleration of gravity from a series of experiments (the last experiment is "wrong" but has a small error bar):


A weighted average gives 9.704 +/- 0.004 m/s^2.
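For reference, the usual weighted average is the inverse-variance recipe: each value is weighted by 1/s_i^2 and the overall error is 1/sqrt(sum of weights). Below is a minimal Python sketch of it. The function name weighted_average is my own, and the inputs are hypothetical apart from the 9.600 +/- 0.005 value, so the output will not reproduce the 9.704 +/- 0.004 figure.

```python
import math


def weighted_average(d, s):
    """Inverse-variance weighted mean of d and its standard error."""
    if len(d) != len(s):
        raise ValueError("d and s must have the same length")
    weights = [1.0 / s_i ** 2 for s_i in s]
    total = sum(weights)
    mean = sum(w * d_i for w, d_i in zip(weights, d)) / total
    return mean, 1.0 / math.sqrt(total)


# Hypothetical inputs: three made-up values near 9.80 plus the rogue value.
values = [9.80, 9.81, 9.79, 9.600]
errors = [0.02, 0.02, 0.02, 0.005]

mean, err = weighted_average(values, errors)
print(f"{mean:.3f} +/- {err:.3f}")
```

Because the weights go as 1/s_i^2, the single value with the 0.005 error bar outweighs the other three combined and drags the mean toward itself, which is the behaviour described above.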

The SKEPTIC function gives:


Using the trace tool, the combination is 9.80 +/- 0.01 m/s^2. The width (0.01) is not as small as for the weighted average: the recipe has decided that the experiments scatter more than their quoted errors allow, so the reported errors are probably underestimated. We end up with a more robust estimate of the mean and a more realistic overall error for the combination. The distribution has a bit of a tail on the left side because the 9.600 +/- 0.005 experiment is still taken into account, but the remaining experiments determine the peak.

So this is a good example of a case where the usual weighted average can mislead. In practice, one would either discard the "outlier" or inflate its error bar before combining, but the advantage of the Bayesian approach is that none of the data needs to be discarded. After all, it could be that the outlier is correct and better experiments might confirm it.