'New' Statistics algorithm Message #29 Posted by Eamonn on 28 July 2004, 4:12 a.m., in response to message #28 by Norris
Hi Norris,
I had a closer look at how this 'new' algorithm can be modified to calculate the mean of x, the mean of y, the slope and y-intercept of the best-fit line, and the correlation coefficient. In particular, I wanted to see what the register requirements are for a calculator-based implementation that can calculate these statistics.
Some observations:
- The t register used in the new algorithm is a temporary register that does not need to be maintained from one iteration to the next. It can be re-used for both the x and y statistics calculations.
- The m register contains the latest estimate of the mean of the data set.
- The n register contains the total number of data elements so far.
- The s register contains (std_dev)^2 * (n-1)
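As a rough sketch of how these four registers interact, here is one iteration written out in Python. It assumes the 'new' algorithm is the usual running-mean update (often attributed to Welford); the exact step order discussed earlier in the thread may differ, and the function name update is only for illustration.

```python
def update(n, m, s, x):
    """Fold one data point x into the running statistics (n, m, s)."""
    n += 1
    t = x - m         # t: scratch value, deviation from the old mean
    m += t / n        # m: latest estimate of the mean
    s += t * (x - m)  # s: accumulates (std_dev)^2 * (n-1)
    return n, m, s    # t is discarded, so it can be re-used for y


n, m, s = 0, 0.0, 0.0
for x in [1.0, 2.0, 3.0, 4.0]:
    n, m, s = update(n, m, s, x)

variance = s / (n - 1)  # sample variance; std_dev is its square root
```

Only n, m and s survive from one data point to the next, which is why t does not count against the permanent register total.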
When using the new algorithm to calculate the mean and standard deviation for two-variable statistics, it is necessary to store n, mx, sx, my and sy (5 registers). It is also necessary to have a scratchpad register, t, for the temporary calculations, but this can be re-used for both the x-data set and the y-data set.
Traditionally, calculators with two-variable statistics store the following 6 values, which are then used to calculate the mean, standard deviation, slope, intercept, etc.
- n
- Sigma_x
- Sigma_x^2
- Sigma_y
- Sigma_y^2
- Sigma_xy
n, the number of elements, is the same in both algorithms.
Sigma_x, Sigma_x^2, Sigma_y and Sigma_y^2 can be calculated from n, mx, my, sx and sy as follows:
Sigma_x = n * mx
Sigma_y = n * my
Sigma_x^2 = sx + n * mx^2
Sigma_y^2 = sy + n * my^2
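These conversions follow from the mean being the sum divided by n, and from s being the sum of squares minus n times the squared mean. A minimal Python sketch, with a function name and sample data chosen only for illustration:

```python
def sums_from_running(n, mx, sx, my, sy):
    """Recover the traditional sums from the new algorithm's registers."""
    sum_x = n * mx
    sum_y = n * my
    sum_x2 = sx + n * mx ** 2
    sum_y2 = sy + n * my ** 2
    return sum_x, sum_y, sum_x2, sum_y2


# Registers for x = 1, 2, 3, 4 and y = 2, 4, 6, 8
sum_x, sum_y, sum_x2, sum_y2 = sums_from_running(4, 2.5, 5.0, 5.0, 20.0)
```

For this data the recovered values match the directly accumulated sums: 10, 20, 30 and 120.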
It is not possible to calculate Sigma_xy from n, mx, my, sx and sy, so a calculator using the new algorithm would need to store this sum, just as a calculator using the traditional algorithm does. This requires the use of one extra register.
Since we can calculate Sigma_x, Sigma_y, etc. from the new algorithm, we can also calculate the slope and y-intercept of the best-fit line, and the correlation coefficient. So, in order to calculate the aforementioned statistics, the new algorithm requires a total of six storage registers, the same as in the traditional algorithm, plus a temporary storage register.
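To illustrate, the line-fit statistics can be computed directly from the six stored values. This is a sketch using the standard least-squares formulas, not necessarily the sequence a calculator ROM would use; the function name and sample data are assumptions for illustration.

```python
import math


def line_fit(n, mx, sx, my, sy, sum_xy):
    """Slope, y-intercept and correlation coefficient from six registers."""
    sxy = sum_xy - n * mx * my    # sum of (x - mx) * (y - my)
    slope = sxy / sx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sx * sy)  # correlation coefficient
    return slope, intercept, r


# x = 1, 2, 3, 4 and y = 2, 4, 6, 8, so Sigma_xy = 60
slope, intercept, r = line_fit(4, 2.5, 5.0, 5.0, 20.0, 60.0)
```

With this data the points lie exactly on y = 2x, so the slope is 2, the intercept is 0 and the correlation coefficient is 1.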
It appears that the new algorithm requires more calculations than the traditional algorithm. If ROM space or calculator speed is an issue then this could give the nod to the traditional algorithm. However, the number of storage registers is the same for both algorithms (assuming that the temporary register can be re-used for other things also) and the new algorithm gives better estimates of the standard deviation (although not always as good as those obtained by the two-pass method). It appears that this 'new' algorithm could certainly have been used on all but perhaps the very earliest calculators, which had ROM and speed limitations.
Regards,
Eamonn.