The Museum of HP Calculators

HP Forum Archive 14

 HP 10 Stat problemMessage #1 Posted by Mona on 11 Mar 2004, 1:05 p.m. I have an HP 10B and can't figure out how to compute the standard deviation once you have probabilities and rates of return. For example: probabilities .3, .4, and .3, with associated rates of return 60%, 20%, and -20%. I have the owner's manual, but I can't find instructions for computing the standard deviation for this particular example. Does anyone have the same model calculator and know how to do this? I appreciate your help. Mona Edited: 11 Mar 2004, 1:09 p.m.

 Re: HP 10 Stat problemMessage #2 Posted by Vieira, Luiz C. (Brazil) on 11 Mar 2004, 2:58 p.m.,in response to message #1 by Mona Hello, Mona; which HP10 do you have? HP10B, 10BII, HP10C? I cannot remember another model... The original HP10 (printing adding machine) has no statistical resources, so I guess it's one of the others. If you have an HP10C, the oldest of the three, then you can find the standard deviation for the X-values with the sequence [f][S], where [S] is the orange (yellow, gold...) inscription over the [.] key. To see the standard deviation for the Y-values, use [X<>Y] (X exchange Y, beside the [CLx] key). If yours is an HP10B (brown case), then the standard deviation for the X-values is computed with the sequence [SHIFT][Sx,Sy] ([SHIFT] is the orange key, and [Sx,Sy] is over the [8] key). You need to use [SHIFT][SWAP] to see the standard deviation for the Y-values. If yours is a newer HP10BII, the procedure is the same as on an HP10B, except for the location of the [SWAP] key. BTW, both the HP10B and HP10BII offer the standard deviation for both a population AND a sample; the HP10C offers only the sample standard deviation. Hope this helps. Cheers. Luiz (Brazil)

 Re: HP 10 Stat problemMessage #3 Posted by hugh steers on 11 Mar 2004, 8:40 p.m.,in response to message #2 by Vieira, Luiz C. (Brazil) hi luiz, radically, i would like to suggest that the 10b, along with the vast majority of calculators, is incompetent at calculating standard deviations. try this: ```
cl SUM, 1000000, SUM+, 999999, SUM+, 1000001, SUM+, Sx
``` which yields zero. the answer is 1. for a long time, i thought that accurate computation would necessitate the storage of all values (this is how the 48 does it). but no! a simple algorithm known in the 70's was this: ```
n=0, s=0, m=0.
loop: get x.
      n + 1 -> n.
      x - m -> t.
      m + t/n -> m.
      s + t*(x - m) -> s.
      goto loop.
``` the standard deviation is then sqrt(s/(n-1)). what is interesting is that this way is actually simpler than the textbook sum-of-squares and squared-sum formula. the incompetence of calculators at sd was pointed out to me by a colleague who works in a laboratory where experimental samples are always biased by a significant mean offset, and the raw sample numbers never come out correctly when input to calculators (nor excel, so it seems). best wishes,
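For anyone who wants to experiment, hugh's one-pass recurrence (usually credited to Welford) can be sketched in a few lines of Python; the function name is mine, not from the thread:

```python
import math

def stdev_welford(data):
    """Sample standard deviation via the one-pass running-mean
    recurrence described above (Welford's algorithm)."""
    n = 0
    m = 0.0  # running mean of the values seen so far
    s = 0.0  # running sum of squared deviations from the mean
    for x in data:
        n += 1
        t = x - m          # deviation from the old mean
        m += t / n         # update the mean
        s += t * (x - m)   # t uses the old mean, (x - m) the new one
    return math.sqrt(s / (n - 1))

# the problem case from the post above: the answer is 1, not 0
print(stdev_welford([1000000, 999999, 1000001]))  # → 1.0
```

Because each update works with deviations from the running mean rather than raw squares, no large intermediate sums ever form.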

 Re: HP 10 Stat problemMessage #4 Posted by Iuri Wickert on 12 Mar 2004, 12:26 a.m.,in response to message #3 by hugh steers Quote: cl SUM, 1000000, SUM+, 999999, SUM+, 1000001, SUM+, Sx Very interesting example! Thanks! My Casio FX-82MS shows this behavior, but my spectra ssc200 clone doesn't: Sx=1, σx=0.8164... Quote: the incompetence of calculators at sd was pointed out to me by a colleague who works in a laboratory where experimental samples are always biased by a significant mean factor and the raw sample numbers never work correctly when input to calculators (nor excel, so it seems). Maybe your colleague should take a look at OpenOffice.org 1.1! Best regards, Iuri Wickert

 Good CallMessage #5 Posted by Namir on 12 Mar 2004, 9:51 a.m.,in response to message #3 by hugh steers Thanks Hugh for the valuable insight. Like many, I thought the std-dev stats offered by the calculators were always on the money!!

 Re: HP 10 Stat problemMessage #6 Posted by Namir on 12 Mar 2004, 2:15 p.m.,in response to message #3 by hugh steers Some statisticians advocate transforming the data by subtracting the average value from the observations. This improves accuracy in the calculations for the standard deviation as well as for linear regression slopes and intercepts. Namir

 Re: HP 10 Stat problemMessage #7 Posted by Eric Smith on 12 Mar 2004, 2:30 p.m.,in response to message #3 by hugh steers It's been too many years since I took statistics class, but if I recall correctly, this depends on whether you're computing the standard deviation of a sample or of a population. For small values of n (three in your example) there is a substantial difference.

 Re: HP 10 Stat problemMessage #8 Posted by Norris on 12 Mar 2004, 7:36 p.m.,in response to message #7 by Eric Smith This issue is explicitly discussed in at least some HP calculator manuals. For example, it is addressed on page 11-11 of the 32SII manual, under "Normalizing Close, Large Numbers." The issue affects linear regression calculations as well as the standard deviation. The problem is not due to population vs. sample standard deviation; it's actually due to roundoff error. Most calculators obtain the standard deviation by summing the squares of the entered values. But if large values are entered, the square may have more significant digits than the calculator can keep track of. For example, 999,999 squared should be 999,998,000,001, but many calculators will round this off to 0.999998 E12, or 999,998,000,000. Another digit will get lost when 1,000,001 is squared. So the sum of x^2 will be incorrect, and since this sum is used to calculate the standard deviation, the s.d. will be wrong too. The 32SII manual shows how to work around this problem by normalizing the data. The HP48GX calculates s.d. differently: it sums the square of the difference between each entered value and the mean (as per p. 3-301 of the AURM). The 48GX solves the stated problem correctly without normalizing.
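Norris's roundoff explanation is easy to simulate: round each square to 10 significant digits, as a 10-digit calculator would, and the textbook formula collapses to zero on hugh's data. This is an illustrative sketch of the mechanism, not any particular calculator's internals:

```python
def round10(x):
    """Round to 10 significant digits, mimicking a 10-digit calculator."""
    return float(f"{x:.9e}")

data = [1000000, 999999, 1000001]
n = len(data)
sum_x = sum(data)                           # 3,000,000 -- still exact
sum_x2 = sum(round10(x * x) for x in data)  # each square loses its last digits

# textbook sum-of-squares formula for the sample variance
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
print(variance)  # → 0.0 (the true sample variance is 1)
```

With 999999^2 rounded to 999,998,000,000 and 1000001^2 to 1,000,002,000,000, the roundoff errors cancel the real scatter exactly.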

 Re: HP 10 Stat problemMessage #9 Posted by Norris on 13 Mar 2004, 12:28 a.m.,in response to message #8 by Norris The HP11C manual (pp. 56-57) includes a similar discussion of possible problems when using statistical functions on large, close numbers. The HP20S is also subject to this problem, but its manual (which is relatively thin) does not acknowledge it.

 Re: Rounding errors on calculators' statistical functionsMessage #11 Posted by Norris on 14 Mar 2004, 12:40 p.m.,in response to message #10 by James M. Prange Your procedure is, as you surmise, similar to the "normalizing" procedure outlined in the HP-11C and HP-32SII manuals (possibly other HP manuals as well). The HP-recommended procedure is to simply subtract the same "central value" (i.e., the mean, or an estimate of the mean) from each data value as it is entered into the calculator. The standard deviation obtained by the calculator should then be correct. Multiplying by powers of 10 to make all data integers is not necessary, but could make the subtraction easier if you are doing it in your head. Alternatively, the 11C or 32SII could be programmed to automatically do the subtraction and enter the normalized results.
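The normalization workaround described here can be mimicked numerically: subtract a central value (below, simply the first data point) before feeding the sums-of-squares method, and the simulated 10-digit roundoff problem from the earlier example disappears. A sketch under that same rounding assumption:

```python
import math

def round10(x):
    """Round to 10 significant digits, mimicking a 10-digit calculator."""
    return float(f"{x:.9e}")

data = [1000000, 999999, 1000001]
c = data[0]                       # "central value" subtracted from each entry
shifted = [x - c for x in data]   # [0, -1, 1]: tiny values, squares stay exact

n = len(shifted)
sum_x = round10(sum(shifted))
sum_x2 = sum(round10(x * x) for x in shifted)
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
print(s)  # → 1.0; the shift leaves the standard deviation unchanged
```

Only the mean needs the central value added back; the standard deviation needs no correction at all.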

 Thank you, Norris...Message #12 Posted by Karl Schneider on 14 Mar 2004, 10:45 p.m.,in response to message #11 by Norris ... for bringing much clarity and insight to this discussion.

 Re: Rounding errors on calculators' statistical functionsMessage #13 Posted by James M. Prange on 15 Mar 2004, 2:05 a.m.,in response to message #11 by Norris Quote:Multiplying by powers of 10 to make all data integers is not necessary, but could make the subtraction easier if you are doing it in your head. Yes, I agree that the multiplying/dividing by powers of 10 is entirely unnecessary to avoid rounding errors in any calculator/computer that uses a floating point representation of numbers. Well, unless there's a chance of overflow of the exponent of 10. The "central value" to be subtracted doesn't have to be one that's "easy" to subtract in your head, just one that reduces the number of significant digits enough to avoid rounding of Sum(xi^2). To do it in a program, I'd simply use the first value entered to subtract from all values. In the RPL calculators that keep the entire data set in memory and don't keep a running total of Sum(xi^2), the subtraction is entirely unnecessary (as long as there's no chance that rounding will occur at the Sum(xi) step). Why bother with these steps then? Well, yes, I suppose that multiplying by a power of 10 does make the subtraction step marginally easier to do in your head. More importantly, whether doing it on paper or using a calculator, it avoids the need to write down or key in the decimal point and/or leading/trailing zeros; a trivial consideration in an example problem of only a few numbers, but in a real-world problem of tens, or more likely hundreds, of data points, a very important consideration for me. When doing it on paper (my original use of "coded data"), those decimal points and zeros would need to be accounted for in subsequent steps, including, depending on which method you choose, the squares of |xi-xbar| or the squares of xi. Using my method substantially reduces the amount of writing needed, and, in my opinion, also reduces the likelihood of errors.
Quote:Alternatively, the 11C or 32SII could be programmed to automatically do the subtraction and enter the normalized results. Quite true, but then you have to key in all of the digits and decimal points (extra key-presses and more opportunities for typing errors). Perhaps best would be to write the program to do this automatically; I wonder why the developers didn't include this, saving unwary users from some incorrect results. Then, if the user chose to use my "coded data" or some similar method, it would still work perfectly and save a lot of key-presses. Regards, James Edited: 15 Mar 2004, 2:35 a.m.

 Thank you, HughMessage #14 Posted by Vieira, Luiz C. (Brazil) on 13 Mar 2004, 9:37 p.m.,in response to message #3 by hugh steers Hello, Hugh; I want to thank you for your excellent example AND analysis. I was not aware of this fact, and such brainy gems must always be taken as precious gifts. Thank you. I also remembered that the HP42S accepts [r × 2] matrices as input data for [SIGMA+], but it still uses summation data to compute the standard deviation. I thought the HP48/49 series also had this "behavior", but you have now called my attention to the fact. Cheers and thank you again. Luiz (Brazil)

 Re: HP 10 Stat problemMessage #15 Posted by Tom Sherman on 22 Mar 2004, 4:26 p.m.,in response to message #3 by hugh steers I enjoyed the recent posts by Hugh Steers, Norris, James Prange, and others concerning the methods for doing standard deviations on calculators. The beautiful algorithm presented by Hugh (I will call it "Hugh's algorithm" if he will forgive me) had my mind boggled for several days -- hence my delay in this response. A calculator can, it seems, be programmed in three main ways (with many variations) to do standard deviations. We recall the definition of the standard deviation: that it is the square root of the variance, where the variance is the sum of the squares of displacement of data points from their mean, normalized by dividing by the number of points, n, if the mean is known beforehand (population variance), or by n-1 if the mean has to be established from the data (sample variance). Let the sum of the squares of the displacements (deviations) be denoted by s, and consider the ways that s can be calculated. The first method directly sums the squares of the deviations from the mean to find s. After all, that is how s is defined, so why would we think of doing it any other way? The problem is that the mean has to be found first, by summing the data points and dividing by n, before the squares of deviations from the mean can be found. So this first method requires that the data be handled twice by the program -- that the program have two loops in series, through which the data pass. If the calculator has so much memory that it can assign a memory register to each data point, this method is fine, but if it has not, then the calculator would require that the operator feed it the data twice. No one would want to do that, and so no calculator that has only a few memory registers would be built using this method. 
The second method makes use of the fact that the original definition of s can be easily transformed by expansion of (x-m)^2, and substituting for m its equivalent of Sum(x)/n. The expression for s then becomes: Sum(x^2)-((Sum(x))^2)/n. The calculation of s can then be done with only one loop, with one pass of the data, as Sum(x^2) and Sum(x) can each be incremented as the new data points are introduced. A calculator using this method does not have to remember the original data. It only needs to have three storage registers, for Sum(x^2), Sum(x), and n. Hence this method is available to calculators having limited memory, and seems to have been adopted by most of them that have a standard deviation program. The problem with this second method, as Norris has lucidly explained, is that the squaring of raw data can overrun the number of digits available to the calculator, if the data have many digits. Small differences in data will therefore be lost by the calculator, and incorrect results will be returned. As Norris and James and the HP manuals explain, this problem can be avoided by subtracting an assumed mean from the data, a process which is called "normalizing" the data (a somewhat inappropriate term, since it involves subtraction rather than division). The best assumed mean would be the mean itself, but if we knew that, we would be using the first method rather than the second. It is fairly easy to see, as James has shown us, that the real mean of data that have been normalized can be recovered by adding back the assumed mean to the mean of the normalized data. At first blush of intuition, it is less obvious (at least to me) that an s calculated from normalized data does not have to be corrected in some way to give back the real s for the original data. But no correction is necessary. 
s is a measure only of scatter or dispersion of points along a number line, and so long as the scale of that line is not changed (by multiplication or division), it does not matter where the zero point of that line is taken. The process of normalizing can be pictured as one of sliding the number line under the data points until its zero is close to the mean of the points. The points remain in the same positions relative to one another, and their total scatter is not affected. A little algebra clinches it: the normalized data yield the correct value for s without any adjustment such as is needed for the mean. As James pointed out, it would be nice if a calculator using this second method had a program for normalizing the data, and James further suggested that the program could start by taking the first point as the assumed mean. That is exactly what the algorithm cited by Hugh does. Hugh's algorithm gives us a third method for calculating s, and one that strikes me as novel, beautiful, and at first, mind-boggling. I had to stare at it for a long time -- unraveling it in my mind, iterating it on paper, confirming it with a BASIC program, and finally going through all its algebra, before I fully understood it. Hugh's algorithm, as he well recognizes, combines the high accuracy of the first method with the second method's economy in storage registers. The data is fed through a single loop and the results are given without any need to retain the data. The only storage registers required are for n (the number of data points), m (the mean of the entered data), s (the same s as used above), and t. What is t? Maybe it stands for temporary, or transition, or tension. It is in fact the deviation of a new data point from the mean of the previously entered data. As it turns out, t is not essential. It can be eliminated altogether if desired, and a variation of the algorithm can be written which requires only three variables: n, m, and s. 
The third method then achieves the high accuracy of the first while using only three storage registers, the same number used by the less accurate second method. The novelty of the third method (or so it seems to me) is that it makes us view s in a somewhat different way. In the original definition of s, used by the first method, we are inclined to view s as a summation of independent contributions from the various data points. But now we are reminded that s results from all the inter-relations of the points, and that the s that a new point brings into a previously-existing set of points is not usually the same s that we would calculate for it after it has joined the set -- because the entrance of the new point usually changes the mean of the set. The s value contributed by an earlier point has, in effect, to be recalculated as a new point is added, because the mean has changed. The first data point entered, for example, initially contributes a zero value for s, because the point is its own mean. But once a second point is added, assuming it is different from the first, the mean shifts to a position halfway between the two points, and the increment of s brought in by the second point has to account not only for its own s for the now two-point system, but for the s of the first point as well. At each stage, as new points are brought in, the algorithm determines the mean and s for that number of points. At each step, the new point brings in an increment of s that correctly increases the s for the previous points to allow for the change in mean that the new point has caused. The mean of a group of points represents not only their average value (in the sense of Sum(x)/n), but the point at which the sum of squares of their deviations (their s value) is at a minimum. Hence a change in mean caused by a new point also causes an increase in s for the previous points. 
The strategy of the third method is to continually recalculate the mean as the new points are fed in, and then to use the new mean to calculate the increment in s. It is, in effect, a program to continually "re-normalize" the data as the new points are declared. At each step, the program has the true mean and s value for the declared set, even though it has "forgotten" what the previous points were. And at each step the multiplications are of relatively small numbers -- multiplications of normalized data rather than of the original data. In the variation of the algorithm that eliminates t and works only with n, m, and s, the values of m and s are incremented (as n increases) by the following relations: ```
m(new) = ((n-1)*m(old) + x)/n
s(new) = s(old) + (n*(x-m(new))^2)/(n-1)
``` Ah, no -- I am not the first to work out these relations. Michael Zeltkevic (1995) has them at: http://web.mit.edu/10.001/Web/Course_Notes/Statistics_Notes/Visualization/node4.html A short BASIC program to run this algorithm, using only storage variables n, m, and s, could be something like this: ```
10 S=0
20 PRINT "ENTER 999.111 TO END DATA INPUT"
30 INPUT "X =";X
40 M=X
50 N=1
60 INPUT "X =";X
70 IF X=999.111 THEN 120
80 N=N+1
90 M=((N-1)*M+X)/N
100 S=S+(N*(X-M)^2)/(N-1)
110 GOTO 60
120 PRINT "MEAN =";M
130 PRINT "VARIANCE =";S/(N-1)
140 PRINT "STD.DEV =";SQR(S/(N-1))
150 PRINT "COEF. OF VAR =";(SQR(S/(N-1)))/M
160 END
``` (Since it has no t variable with which to work, this variation of Hugh's algorithm keeps the first x value outside the main loop in order to prevent a division by zero in line 100 -- hence the inelegance of the two input commands. Perhaps you can find a nicer way of doing it.) I want to thank again Hugh, Norris, James, and all the others for such an interesting and illuminating series of posts about the calculation of the standard deviation. 
I think Hugh was right that the makers of many calculators have failed to find the most accurate way to do the calculation when memory is limited. My apologies for being so long-winded in saying this. Cheers, Tom
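Tom's three-register variant translates almost line for line from his BASIC listing; here is a Python rendering (the function name is mine), with the first point seeding the mean outside the loop exactly as in his program:

```python
import math

def three_register_stats(data):
    """Mean and sample standard deviation using only n, m, s,
    via m_new = ((n-1)*m_old + x)/n and
        s_new = s_old + n*(x - m_new)^2/(n-1)."""
    values = iter(data)
    m = next(values)  # first point is its own mean (avoids n-1 = 0 below)
    n = 1
    s = 0.0
    for x in values:
        n += 1
        m = ((n - 1) * m + x) / n
        s += n * (x - m) ** 2 / (n - 1)
    return m, math.sqrt(s / (n - 1))

mean, sd = three_register_stats([1000000, 999999, 1000001])
print(mean, sd)  # → 1000000.0 1.0
```

Like Hugh's version, it handles the large-mean data exactly, because every multiplication involves deviations rather than raw values.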

 Your finance book should show how to do thisMessage #16 Posted by Gene on 11 Mar 2004, 8:56 p.m.,in response to message #1 by Mona This is a classic example in most principles-of-finance books. The HP10BII (and HP10B, for that matter) do not solve standard deviations this way. They expect a list of numbers. You'll have to do this by hand using the formula in your book. Sorry! Gene

 Re: HP 10 Stat problemMessage #17 Posted by Tizedes Csaba on 12 Mar 2004, 6:01 p.m.,in response to message #1 by Mona Hello, if I had to solve this, I would do it like this: ```
On my 32SII:

CLSum
60 Sum+   60 Sum+   60 Sum+
20 Sum+   20 Sum+   20 Sum+   20 Sum+
-20 Sum+  -20 Sum+  -20 Sum+

x_Aver    Result is: 20.0000
s_x       Result is: 32.6599
sigma_x   Result is: 30.9839
``` I hope I didn't make a mistake... Csaba
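Csaba's trick of repeating each value in proportion to its probability (3, 4, and 3 copies for weights .3, .4, .3) is easy to cross-check with Python's standard statistics module:

```python
import statistics

# probabilities .3, .4, .3 over a common denominator of 10 -> 3, 4, 3 copies
data = [60] * 3 + [20] * 4 + [-20] * 3

print(statistics.mean(data))    # → 20
print(statistics.stdev(data))   # ≈ 32.6599 (sample SD, divides by n-1)
print(statistics.pstdev(data))  # ≈ 30.9839 (population SD, divides by n)
```

Note that the sample figure treats the ten copies as ten independent observations, which is why only the population value matches the probability-weighted answer worked out later in the thread.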

 Re: HP 10B Stat problemMessage #18 Posted by Karl Schneider on 13 Mar 2004, 5:59 p.m.,in response to message #1 by Mona Mona -- I also have a 10B, so I should be able to help. I'm afraid that our group's discussions haven't really addressed your question. With rates of return and probabilities given, it would seem that what you want is a weighted sum -- ```
(0.3 * 60) + (0.4 * 20) + (0.3 * -20) = 20.0
``` This can be done by the following procedure ("E" = capital Sigma): ```
CL E
.3 INPUT 60 E+
.4 INPUT 20 E+
.3 INPUT 20 +/- E+
RCL 9
``` (Register 9 holds the E_xy summation.) Or, using the built-in function Xw (shifted "6" key): ```
CL E
60 INPUT .3 E+
20 INPUT .4 E+
20 +/- INPUT .3 E+
Xw
``` To calculate sample standard deviations, do ```
Sx,Sy (shifted "8" key)   (read sample SD of probabilities)
SWAP                      (read sample SD of percentage rates of return)
``` Calculation of population standard deviations is similar, using "σx,σy" (shifted "9" key). -- Karl S.

 It's not exactly correct...Message #19 Posted by Tizedes Csaba on 13 Mar 2004, 6:15 p.m.,in response to message #18 by Karl Schneider Hi Karl, the calculation of the mean is correct, but this is ONE sample, not two. We want to calculate the SD of 60 with probability 0.3, and so on... The calculator's summation doesn't know that! Csaba Ps.: I think I will write a little program for the correct solution...

 ...and neither was your statement...Message #20 Posted by Karl Schneider on 13 Mar 2004, 6:51 p.m.,in response to message #19 by Tizedes Csaba Tizedes -- I admit that I probably didn't understand the problem as Mona stated it, but certainly the weighted mean calculation seemed relevant, and I showed how to calculate standard deviations. Quote: the calculation of the mean is correct, but this is ONE sample, not two. We want to calculate the SD of 60 with probability 0.3, and so on... The calculator's summation doesn't know that! Well, I do! The population SD of any single sample (including "60") is zero, with absolute certainty. The sample SD of a single sample cannot be calculated. Or, did you mean that this was a one-variable calculation? There are many ways to interpret the problem as it was stated, and I question how well the author understood the problem. Maybe the standard deviation of the three expected values could be computed: ```
CL E
.3 * 60 =      E+
.4 * 20 =      E+
.3 * 20 +/- =  E+
RCL 5          (gives weighted mean of 20.0)
(mean, Sample SD, pop SD can then be calculated)
``` -- Karl

 To Karl (and Mona) - The correct wayMessage #21 Posted by Tizedes Csaba on 13 Mar 2004, 8:43 p.m.,in response to message #20 by Karl Schneider Hello, the correct solution is as follows: ```
Given the following data with their probabilities:
--------------
 i    pi    xi
 1   0.3    60
 2   0.4    20
 3   0.3   -20
--------------

The average is:
  xAver = SUMMA(pi*xi) = 20.0000

The SD is:
  xSD = SQRT(SUMMA((xi-xAver)^2*pi)) = 30.9839
``` Best wishes! Csaba Ps.: Dear Karl, I'm so sorry, but I'm not a genius in English, so I don't understand everything in your letter... I'm so sorry, again...! Ps2.: "Or, did you mean that this was a one-variable calculation?" This is a one-variable discrete distribution.
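Csaba's two formulas drop straight into Python; for Mona's numbers they reproduce the figures above:

```python
import math

p = [0.3, 0.4, 0.3]   # probabilities
x = [60, 20, -20]     # rates of return, in percent

# xAver = SUMMA(pi*xi)
mean = sum(pi * xi for pi, xi in zip(p, x))

# xSD = SQRT(SUMMA((xi - xAver)^2 * pi))
sd = math.sqrt(sum(pi * (xi - mean) ** 2 for pi, xi in zip(p, x)))

print(mean)  # ≈ 20.0
print(sd)    # ≈ 30.9839
```

(The formulas assume the probabilities sum to 1, as they should; otherwise each sum must be divided by SUMMA(pi).)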

 Tizedes is right...Message #24 Posted by Karl Schneider on 13 Mar 2004, 11:36 p.m.,in response to message #21 by Tizedes Csaba Tizedes -- Quote: Ps.: Dear Karl, I'm so sorry, but I'm not a genius in English, so I don't understand everything in your letter... I'm so sorry, again...! No need for contrition -- I admit to being a bit "snarky", but when things aren't stated correctly, I'm not bashful about pointing it out. I do acknowledge that you are not a native speaker of English. All along, I believed that the problem was a bit more complicated than a garden-variety SD calculation, but I didn't take the trouble to figure out what exactly was to be calculated. I think you provided the formula for what Mona actually wanted. Your equations/answers of: ```
The average is: xAver = SUMMA(pi*xi) = 20.0000
The SD is:      xSD = SQRT(SUMMA((xi-xAver)^2*pi)) = 30.9839
``` are correct assuming that the sum of the weighting factors is unity (1.00), as they ought to be (and indeed were in this example). Otherwise, each summation must be divided by the sum of the weighting factors SUMMA(pi). I found a similar problem on p. 110 of Schaum's Outline of Statistics (c. 1961). -- Karl