The Museum of HP Calculators

HP Forum Archive 14

[ Return to Index | Top of Index ]

HP 10 Stat problem
Message #1 Posted by Mona on 11 Mar 2004, 1:05 p.m.

I got HP 10 B and can't figure out how to compute standard deviation, once you have probabilities and rates of return: For example:

Probabilities: .3, .4, and .3 and associated rate of return : 60%, 20% and -20%

I have owner's manual, but I can't find instructions how to compute standard deviation for this particular example. Does anyone have the same model calculator and knows how to do this?? I appreciate your help.

Mona

Edited: 11 Mar 2004, 1:09 p.m.

      
Re: HP 10 Stat problem
Message #2 Posted by Vieira, Luiz C. (Brazil) on 11 Mar 2004, 2:58 p.m.,
in response to message #1 by Mona

Hello, Mona;

which HP10 you have? HP10B, 10BII, HP10C? I cannot remember other model...

The original HP10 (printing adding machine) has no statistical functions, so I guess it's one of the others.

If you have an HP10C, the oldest of the three, then you can find the standard deviation for the X-values with the sequence [f][S], where [S] is the orange (yellow, gold...) inscription over the [.] key. To see the standard deviation for the Y-values, use [X<>Y] (X exchange Y, beside the [CLx] key).

If yours is an HP10B (brown case), then the standard deviation for the X-values is computed with the sequence [SHIFT][Sx,Sy] ([SHIFT] is the orange key, and [Sx,Sy] is over the [8] key). You need to use [SHIFT][SWAP] to see the standard deviation for the Y-values.

If yours is a newer HP10BII, the procedure is the same as if you are using an HP10B, except for the location of the [SWAP] key.

BTW, both the HP10B and HP10BII offer the standard deviation for both a population AND a sample. The HP10C offers only the sample standard deviation.

Hope this helps.

Cheers.

Luiz (Brazil)

            
Re: HP 10 Stat problem
Message #3 Posted by hugh steers on 11 Mar 2004, 8:40 p.m.,
in response to message #2 by Vieira, Luiz C. (Brazil)

hi luiz,

radically, i would like to suggest that the 10b, along with the vast majority of calculators, is incompetent at calculating standard deviations. try this:

cl SUM, 1000000, SUM+, 999999, SUM+, 1000001, SUM+, Sx

which yields zero. the answer is 1. for a long time, i thought that accurate computation would necessitate the storage of all values (this is how the 48 does it). but no! a simple algorithm known in the 70's was this:

n=0, s=0, m=0.
loop: get x.
  n + 1 -> n.
  x - m -> t.
  m + t/n -> m.
  s + t*(x - m) -> s.
  goto loop.

the standard deviation is then sqrt(s/(n-1)).
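in python, the same loop looks like this (a sketch of the update rule above, not any calculator's actual firmware):

```python
def stddev(data):
    # one-pass mean/sd, following the update rule quoted above
    n, m, s = 0, 0.0, 0.0
    for x in data:
        n += 1
        t = x - m          # deviation from the old mean
        m += t / n         # update the running mean
        s += t * (x - m)   # note: uses both the old (t) and new (x - m) deviations
    return (s / (n - 1)) ** 0.5

print(stddev([1000000, 999999, 1000001]))  # 1.0
```

run on the three values from the example, it returns 1 exactly, where the sum-of-squares method returns 0.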

what is interesting is that this way is actually simpler than the textbook sum-of-squares formula. the incompetence of calculators at sd was pointed out to me by a colleague who works in a laboratory where experimental samples are always offset by a large mean, and the raw sample numbers never come out correctly when fed into calculators (nor excel, so it seems).

best wishes,

                  
Re: HP 10 Stat problem
Message #4 Posted by Iuri Wickert on 12 Mar 2004, 12:26 a.m.,
in response to message #3 by hugh steers

Quote:
cl SUM, 1000000, SUM+, 999999, SUM+, 1000001, SUM+, Sx

Very interesting example! Thanks! My Casio FX-82MS shows this behavior, but my spectra ssc200 clone doesn't: Sx=1, Ox=0.8164...

Quote:
the incompetence of calculators at sd was pointed out to me by a colleague who works in a laboratory where experimental samples are always biased by a significant mean factor and the raw sample numbers never work correctly when input to calculators (nor excel, so it seems).

Maybe your colleague should take a look at OpenOffice.org 1.1 !

Best regards,
Iuri Wickert

                  
Good Call
Message #5 Posted by Namir on 12 Mar 2004, 9:51 a.m.,
in response to message #3 by hugh steers

Thanks, Hugh, for the valuable insight. Like many, I thought the std-dev stats offered by calculators were always on the money!!

                  
Re: HP 10 Stat problem
Message #6 Posted by Namir on 12 Mar 2004, 2:15 p.m.,
in response to message #3 by hugh steers

Some statisticians advocate transforming the data by subtracting the average value from the observations. This improves accuracy in the calculations for standard deviation as well as for linear regression slopes and intercepts.

Namir

                  
Re: HP 10 Stat problem
Message #7 Posted by Eric Smith on 12 Mar 2004, 2:30 p.m.,
in response to message #3 by hugh steers

It's been too many years since I took statistics class, but if I recall correctly, this depends on whether you're computing the standard deviation of a sample or of a population. For small values of n (three in your example) there is a substantial difference.

                        
Re: HP 10 Stat problem
Message #8 Posted by Norris on 12 Mar 2004, 7:36 p.m.,
in response to message #7 by Eric Smith

This issue is explicitly discussed in at least some HP calculator manuals. For example, it is addressed on page 11-11 of the 32SII manual, under "Normalizing Close, Large Numbers." The issue affects linear regression calculations, as well as the standard deviation.

The problem is not due to population vs. sample standard deviation. It's actually due to roundoff error. Most calculators obtain standard deviation by summing the squares of the entered values. But if large values are entered, the square may have more significant digits than the calculator can keep track of.

For example, 999,999 squared should be 999,998,000,001, but many calculators will round this off to 0.999998 E12, or 999,998,000,000. Another digit will get lost when 1,000,001 is squared. So the sum of x^2 will be incorrect, and since this sum is used to calculate the standard deviation, the s.d. will be wrong too.
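The digit loss Norris describes can be reproduced by emulating a 10-significant-digit machine with Python's decimal module (a sketch; the three data values are hugh's example from earlier in the thread):

```python
from decimal import Decimal, getcontext

getcontext().prec = 10  # emulate a 10-significant-digit calculator

data = [Decimal(1000000), Decimal(999999), Decimal(1000001)]
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)  # 999999^2 and 1000001^2 each lose a digit here

# Textbook formula: s^2 = (Sum(x^2) - (Sum(x))^2 / n) / (n - 1)
variance = (sum_x2 - sum_x * sum_x / n) / (n - 1)
print(variance == 0)  # True -- the rounded sums cancel exactly; the true variance is 1
```

With full precision (the default prec of 28), the same formula gives the correct variance of 1, confirming that rounding of the squares, not the formula itself, is at fault.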

The 32SII manual shows how to work around this problem by normalizing the data.

The HP48GX calculates s.d. differently; it sums the square of the difference between each entered value and the mean (as per p. 3-301 of the AURM). The 48GX solves the stated problem correctly without normalizing.

                              
Re: HP 10 Stat problem
Message #9 Posted by Norris on 13 Mar 2004, 12:28 a.m.,
in response to message #8 by Norris

The HP11C manual (pp. 56-57) includes a similar discussion of possible problems when using statistical functions on large, close numbers.

The HP20S is also subject to this problem, but the manual (which is relatively thin) does not acknowledge it.

                              
Rounding errors on calculators' statistical functions
Message #10 Posted by James M. Prange on 14 Mar 2004, 2:55 a.m.,
in response to message #8 by Norris

At work, the actual deviations from the mean are usually small compared to the mean. Back when we worked out the mean and standard deviation on paper (well, ok, using a calculator for basic arithmetic), I came up with a way to simplify the process. Ok, I've seen it in books later, so I wasn't the first. I think of it as "coded data", and I expect that it's essentially the same as "normalizing".

First, if the data aren't all integers, at least some of which don't end with zero, multiply them (mentally) by some power of 10 (let's call this a); that is, move the decimal points so that all of the data become integers, at least some of which don't end with zero. Now choose some number (let's call this b) that's easy to subtract from these values and results in relatively small integers (preferably mostly positive, but some negatives won't hurt). Do the subtraction (again, in your head) and write down the results in a new column. I think of this as "encoding". Now work out the mean, variance, and standard deviation of the coded data just as you normally would. For the mean of the original data, add the mean of the coded data to b, and then divide by a. For the variance, divide the coded variance by a^2; that is, move the decimal twice as many places in the opposite direction. For the standard deviation, just divide the coded standard deviation by a. It's still tedious, but much better than working with 5- or 6-digit values.
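The encode/decode rules can be sketched in Python (the sample values and the choices of a and b below are mine, purely for illustration):

```python
import statistics

# Hypothetical raw measurements: 7 digits each, clustered near 1.000000
raw = [1.000000, 0.999999, 1.000001, 1.000002]

a = 10 ** 6                              # power of 10 that makes the data integers
b = 1000000                              # easy value to subtract from the coded integers
coded = [round(x * a) - b for x in raw]  # "encoding": [0, -1, 1, 2]

# Decode: mean = (coded mean + b) / a, std. dev. = coded std. dev. / a
mean = (statistics.mean(coded) + b) / a
sd = statistics.stdev(coded) / a

print(coded)  # [0, -1, 1, 2]
print(mean)   # 1.0000005
print(sd)
```

Only the 1- or 2-digit coded values ever get squared, so the statistics are computed without any loss of digits.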

The first time that I saw a calculator with statistical functions, it was a "must-have". I tried it out on working out the standard deviation of a small (50-piece) capability study, keying in all 6 digits of each value. Wrong answer! Did I make a mistake keying it in? I tried again, but got the same wrong answer. I was almost certain that I'd gotten it right working it out on paper. But what was the Sum(x^2) key on the calculator for? So I decided to read the fine manual a little closer. It turned out that the calculator kept track of n, Sum(x), and Sum(x^2), and used them in a method that was mathematically correct, except that with so many digits, the Sum(x^2) was being rounded off. Ok, that makes sense; it would take considerable memory to keep track of all of the data, and keeping a running total of just 3 values allows it to do the statistics with much less memory. Just be careful of the rounding-off problem.

Trying again on the calculator, but using the "coded data" method to avoid the rounding off, the calculator returned the correct answers, and there was a lot less key-pushing involved.

The RPL calculators (28, 48, and 49 series) don't use the method with Sum(x^2), so the rounding-off problem isn't nearly so bad, although I suppose that they must still do some rounding off (in many cases). But for my purposes, they're easily "close enough"; after all, we can expect some measurement error in the original data anyway. I suppose that if the original data had, say, 12 significant digits, it would be a real problem, but I very much doubt that I'll ever see that at work.

Since the RPL calculators keep all of the data in memory, it's also easy to review it for errors; something that the simpler calculators don't allow, short of keying it all in again.

Even with the RPL calculators, I still use my "coded data" method (but keying it into the calculator instead of writing it down), mostly because I don't have to key in so many digits.

I suppose that for maximum accuracy, the b value would be as close to the true mean as can be represented on the calculator, but with my method, I usually only have to key in 1 or 2 digits, and occasionally 3, for each value, and for practical purposes, the result is always correct.

And be careful of spreadsheets; I've seen some real nonsense when people tried to use them instead of specialized statistical applications on a computer. I'm not sure how much of the problem is operator error and how much is the algorithm the spreadsheets use and rounding off error. At least some spreadsheets will give you the population standard deviation instead of the sample standard deviation; something to watch out for.

Regards,
James

Edited: 14 Mar 2004, 4:51 a.m.

                                    
Re: Rounding errors on calculators' statistical functions
Message #11 Posted by Norris on 14 Mar 2004, 12:40 p.m.,
in response to message #10 by James M. Prange

Your procedure is, as you surmise, similar to the "normalizing" procedure outlined in the HP-11C and HP-32SII manuals (possibly other HP manuals as well).

The HP-recommended procedure is to simply subtract the same "central value" (i.e., the mean, or an estimate of the mean) from each data value as it is entered into the calculator. The standard deviation obtained by the calculator should then be correct.

Multiplying by powers of 10 to make all data integers is not necessary, but could make the subtraction easier if you are doing it in your head. Alternatively, the 11C or 32SII could be programmed to automatically do the subtraction and enter the normalized results.

                                          
Thank you, Norris...
Message #12 Posted by Karl Schneider on 14 Mar 2004, 10:45 p.m.,
in response to message #11 by Norris

... for bringing much clarity and insight to this discussion.

                                          
Re: Rounding errors on calculators' statistical functions
Message #13 Posted by James M. Prange on 15 Mar 2004, 2:05 a.m.,
in response to message #11 by Norris

Quote:
Multiplying by powers of 10 to make all data integers is not necessary, but could make the subtraction easier if you are doing it in your head.

Yes, I agree that the multiplying/dividing by powers of 10 is entirely unnecessary to avoid rounding errors in any calculator/computer that uses a floating point representation of numbers. Well, unless there's a chance of overflow of the exponent of 10.

The "central value" to be subtracted doesn't have to be one that's "easy" to subtract in your head, just one that reduces the number of significant digits enough to avoid rounding of Sum(xi2). To do it in a program, I'd simply use the first value entered to subtract from all values.

In the RPL calculators, which keep the entire data set in memory and don't keep a running total of Sum(xi^2), the subtraction is entirely unnecessary (as long as there's no chance that rounding will occur at the Sum(xi) step).

Why bother with these steps then? Well, yes, I suppose that multiplying by a power of 10 does make the subtraction step marginally easier to do in your head. More importantly, whether doing it on paper or using a calculator, it avoids the need to write down or key in the decimal point and/or leading/trailing zeros; a trivial consideration in an example problem of only a few numbers, but in a real-world problem of tens, or more likely hundreds, of data points, a very important consideration for me.

When doing it on paper (my original use of "coded data"), those decimal points and zeros would need to be accounted for in subsequent steps, including, depending on which method you choose, the squares of |xi-xbar|, or the squares of xi. Using my method substantially reduces the amount of writing needed, and, in my opinion, also reduces the likelihood of errors.

Quote:
Alternatively, the 11C or 32SII could be programmed to automatically do the subtraction and enter the normalized results.

Quite true, but then you have to key in all of the digits and decimal points (extra key-presses and more opportunities for typing errors). Perhaps best would be to write the program to do this automatically; I wonder why the developers didn't include this, saving unwary users from some incorrect results. Then, if the user chose to use my "coded data" or some similar method, it would still work perfectly and save a lot of key-presses.

Regards,
James

Edited: 15 Mar 2004, 2:35 a.m.

                  
Thank you , Hugh
Message #14 Posted by Vieira, Luiz C. (Brazil) on 13 Mar 2004, 9:37 p.m.,
in response to message #3 by hugh steers

Hello, Hugh;

I want to thank you for your excellent example AND analysis. I was not aware of this fact, and brainy gems like these must always be taken as precious gifts. Thank you.

I also remembered that the HP42S accepts two-column matrices as input data for [SIGMA+], but it still uses summation data to compute the standard deviation. I thought the HP48/49 series also worked this way, but you have now called my attention to the fact that they don't.

Cheers and thank you again.

Luiz (Brazil)

                  
Re: HP 10 Stat problem
Message #15 Posted by Tom Sherman on 22 Mar 2004, 4:26 p.m.,
in response to message #3 by hugh steers

I enjoyed the recent posts by Hugh Steers, Norris, James Prange, and others concerning the methods for doing standard deviations on calculators. The beautiful algorithm presented by Hugh (I will call it "Hugh's algorithm" if he will forgive me) had my mind boggled for several days -- hence my delay in this response.

A calculator can, it seems, be programmed in three main ways (with many variations) to do standard deviations. We recall the definition of the standard deviation: that it is the square root of the variance, where the variance is the sum of the squares of displacement of data points from their mean, normalized by dividing by the number of points, n, if the mean is known beforehand (population variance), or by n-1 if the mean has to be established from the data (sample variance). Let the sum of the squares of the displacements (deviations) be denoted by s, and consider the ways that s can be calculated.

The first method directly sums the squares of the deviations from the mean to find s. After all, that is how s is defined, so why would we think of doing it any other way? The problem is that the mean has to be found first, by summing the data points and dividing by n, before the squares of deviations from the mean can be found. So this first method requires that the data be handled twice by the program -- that the program have two loops in series, through which the data pass. If the calculator has so much memory that it can assign a memory register to each data point, this method is fine, but if it has not, then the calculator would require that the operator feed it the data twice. No one would want to do that, and so no calculator that has only a few memory registers would be built using this method.

The second method makes use of the fact that the original definition of s can be easily transformed by expansion of (x-m)^2, and substituting for m its equivalent of Sum(x)/n. The expression for s then becomes: Sum(x^2)-((Sum(x))^2)/n. The calculation of s can then be done with only one loop, with one pass of the data, as Sum(x^2) and Sum(x) can each be incremented as the new data points are introduced. A calculator using this method does not have to remember the original data. It only needs to have three storage registers, for Sum(x^2), Sum(x), and n. Hence this method is available to calculators having limited memory, and seems to have been adopted by most of them that have a standard deviation program.

The problem with this second method, as Norris has lucidly explained, is that the squaring of raw data can overrun the number of digits available to the calculator, if the data have many digits. Small differences in data will therefore be lost by the calculator, and incorrect results will be returned. As Norris and James and the HP manuals explain, this problem can be avoided by subtracting an assumed mean from the data, a process which is called "normalizing" the data (a somewhat inappropriate term, since it involves subtraction rather than division). The best assumed mean would be the mean itself, but if we knew that, we would be using the first method rather than the second.

It is fairly easy to see, as James has shown us, that the real mean of data that have been normalized can be recovered by adding back the assumed mean to the mean of the normalized data. At first blush of intuition, it is less obvious (at least to me) that an s calculated from normalized data does not have to be corrected in some way to give back the real s for the original data. But no correction is necessary. s is a measure only of scatter or dispersion of points along a number line, and so long as the scale of that line is not changed (by multiplication or division), it does not matter where the zero point of that line is taken. The process of normalizing can be pictured as one of sliding the number line under the data points until its zero is close to the mean of the points. The points remain in the same positions relative to one another, and their total scatter is not affected. A little algebra clinches it: the normalized data yield the correct value for s without any adjustment such as is needed for the mean.

As James pointed out, it would be nice if a calculator using this second method had a program for normalizing the data, and James further suggested that the program could start by taking the first point as the assumed mean. That is exactly what the algorithm cited by Hugh does. Hugh's algorithm gives us a third method for calculating s, and one that strikes me as novel, beautiful, and at first, mind-boggling. I had to stare at it for a long time -- unraveling it in my mind, iterating it on paper, confirming it with a BASIC program, and finally going through all its algebra, before I fully understood it.

Hugh's algorithm, as he well recognizes, combines the high accuracy of the first method with the second method's economy in storage registers. The data is fed through a single loop and the results are given without any need to retain the data. The only storage registers required are for n (the number of data points), m (the mean of the entered data), s (the same s as used above), and t. What is t? Maybe it stands for temporary, or transition, or tension. It is in fact the deviation of a new data point from the mean of the previously entered data. As it turns out, t is not essential. It can be eliminated altogether if desired, and a variation of the algorithm can be written which requires only three variables: n, m, and s. The third method then achieves the high accuracy of the first while using only three storage registers, the same number used by the less accurate second method.

The novelty of the third method (or so it seems to me) is that it makes us view s in a somewhat different way. In the original definition of s, used by the first method, we are inclined to view s as a summation of independent contributions from the various data points. But now we are reminded that s results from all the inter-relations of the points, and that the s that a new point brings into a previously-existing set of points is not usually the same s that we would calculate for it after it has joined the set -- because the entrance of the new point usually changes the mean of the set. The s value contributed by an earlier point has, in effect, to be recalculated as a new point is added, because the mean has changed. The first data point entered, for example, initially contributes a zero value for s, because the point is its own mean. But once a second point is added, assuming it is different from the first, the mean shifts to a position halfway between the two points, and the increment of s brought in by the second point has to account not only for its own s for the now two-point system, but for the s of the first point as well. At each stage, as new points are brought in, the algorithm determines the mean and s for that number of points. At each step, the new point brings in an increment of s that correctly increases the s for the previous points to allow for the change in mean that the new point has caused. The mean of a group of points represents not only their average value (in the sense of Sum(x)/n), but the point at which the sum of squares of their deviations (their s value) is at a minimum. Hence a change in mean caused by a new point also causes an increase in s for the previous points.

The strategy of the third method is to continually recalculate the mean as the new points are fed in, and then to use the new mean to calculate the increment in s. It is, in effect, a program to continually "re-normalize" the data as the new points are declared. At each step, the program has the true mean and s value for the declared set, even though it has "forgotten" what the previous points were. And at each step the multiplications are of relatively small numbers -- multiplications of normalized data rather than of the original data.

In the variation of the algorithm that eliminates t and works only with n, m, and s, the values of m and s are incremented (as n increases) by the following relations:

m(new) = ((n-1)*m(old) + x)/n

s(new) = s(old) + (n*(x-m(new))^2)/(n-1)

Ah, no -- I am not the first to work out these relations. Michael Zeltkevic (1995) has them at:

http://web.mit.edu/10.001/Web/Course_Notes/Statistics_Notes/Visualization/node4.html

A short BASIC program to run this algorithm, using only storage variables n, m, and s, could be something like this:

10 S=0
20 PRINT "ENTER 999.111 TO END DATA INPUT"
30 INPUT "X =";X
40 M=X
50 N=1
60 INPUT "X =";X
70 IF X=999.111 THEN 120
80 N=N+1
90 M=((N-1)*M+X)/N
100 S=S+(N*(X-M)^2)/(N-1)
110 GOTO 60
120 PRINT "MEAN =";M
130 PRINT "VARIANCE =";S/(N-1)
140 PRINT "STD.DEV =";SQR(S/(N-1))
150 PRINT "COEF. OF VAR =";(SQR(S/(N-1)))/M
160 END

(Since it has no t variable with which to work, this variation of Hugh's algorithm keeps the first x value outside the main loop in order to prevent a division by zero in line 100 -- hence the inelegance of the two input commands. Perhaps you can find a nicer way of doing it.)
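One way to tidy that up, sketched here as a Python translation of the BASIC above, is to fold the first point into the loop itself, so no duplicated input step (and no end-of-data sentinel) is needed:

```python
def stats(data):
    # Python version of the BASIC program above (the t-free variant),
    # with the n=1 case handled inside the loop so there is no
    # division by zero and no duplicated input step.
    n, m, s = 0, 0.0, 0.0
    for x in data:
        n += 1
        if n == 1:
            m = x                       # first point is its own mean; s stays 0
        else:
            m = ((n - 1) * m + x) / n   # m(new) = ((n-1)*m(old) + x)/n
            s += n * (x - m) ** 2 / (n - 1)
    var = s / (n - 1)
    return m, var, var ** 0.5           # mean, variance, std. dev.

m, var, sd = stats([1000000, 999999, 1000001])
print(m, var, sd)  # 1000000.0 1.0 1.0
```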

I want to thank again Hugh, Norris, James, and all the others for such an interesting and illuminating series of posts about the calculation of the standard deviation. I think Hugh was right that the makers of many calculators have failed to find the most accurate way to do the calculation when memory is limited. My apologies for being so long-winded in saying this.

Cheers, Tom

      
Your finance book should show how to do this
Message #16 Posted by Gene on 11 Mar 2004, 8:56 p.m.,
in response to message #1 by Mona

This is a classic example in most principles-of-finance books.

The HP10BII (and HP10B for that matter) do not solve standard deviations this way. They expect a list of numbers.

You'll have to do this by hand using the formula in your book.

Sorry! Gene

      
Re: HP 10 Stat problem
Message #17 Posted by Tizedes Csaba on 12 Mar 2004, 6:01 p.m.,
in response to message #1 by Mona

Hello,

if I had to solve this, here is how I would do it:

On my 32SII:

CLSum

60 Sum+ 60 Sum+ 60 Sum+ 20 Sum+ 20 Sum+ 20 Sum+ 20 Sum+ -20 Sum+ -20 Sum+ -20 Sum+

x_Aver  result is: 20.0000
s_x     result is: 32.6599
sigma_x result is: 30.9839

I hope I didn't make a mistake...

Csaba

      
Re: HP 10B Stat problem
Message #18 Posted by Karl Schneider on 13 Mar 2004, 5:59 p.m.,
in response to message #1 by Mona

Mona --

I also have a 10B, so I should be able to help. I'm afraid that our group's discussions haven't really addressed your question.

With rates of return and probabilities given, it would seem that what you want is a weighted sum --

(0.3 * 60) + (0.4 * 20) + (0.3 * -20) = 20.0

This can be done by the following procedure ("E" = capital Sigma):

CL E
.3 INPUT 60 E+
.4 INPUT 20 E+
.3 INPUT 20 +/-  E+
RCL 9  

(register 9 holds the E_xy summation.)

Or, using the built-in function Xw (shifted "6" key):

CL E
60 INPUT .3  E+
20 INPUT .4  E+
20 +/- INPUT .3 E+
Xw

To calculate sample standard deviation, do

Sx,Sy (shifted "8" key)
(read sample SD of probabilities)
SWAP
(read sample SD of percentage rates of return)

Calculation of population standard deviations is similar, using "0x,0y" (shifted "9" key)

-- Karl S.

            
It's not exactly correct...
Message #19 Posted by Tizedes Csaba on 13 Mar 2004, 6:15 p.m.,
in response to message #18 by Karl Schneider

Hi Karl,

the calculation of the mean is correct, but this is ONE sample, not two. We want to calculate the SD of 60 with probability 0.3, and so on...

The calculator's summation doesn't know that!

Csaba

Ps.: I think I will write a little program for the correct solution...

                  
...and neither was your statement...
Message #20 Posted by Karl Schneider on 13 Mar 2004, 6:51 p.m.,
in response to message #19 by Tizedes Csaba

Tizedes --

I admit that I probably didn't understand the problem as Mona stated it, but certainly the weighted mean calculation seemed relevant, and I showed how to calculate standard deviations.

Quote:
the calculation of the mean is correct, but this is ONE sample, not two. We want to calculate the SD of 60 with probability 0.3, and so on...

The calculator's summation doesn't know that!


Well, I do!

  • The population SD of any single sample (including "60") is zero, with absolute certainty.
  • The sample SD of a single sample cannot be calculated.

Or, did you mean that this was a one-variable calculation?

There are many ways to interpret the problem as it was stated, and I question how well the author understood the problem.

Maybe the standard deviation of the three expected values could be computed:

CL E
.3 * 60 = E+
.4 * 20 = E+
.3 * 20 +/- = E+
RCL 5   (gives weighted mean of 20.0)
(mean, Sample SD, pop SD can then be calculated)

-- Karl

                        
To Karl (and Mona) - The correct way
Message #21 Posted by Tizedes Csaba on 13 Mar 2004, 8:43 p.m.,
in response to message #20 by Karl Schneider

Hello,

the correct solution is as follows:

Given the following data with their probabilities:

--------------
 i    pi    xi
--------------
 1   0.3    60
 2   0.4    20
 3   0.3   -20
--------------

The average is: xAver = SUMMA(pi*xi)                 = 20.0000
The SD is:      xSD   = SQRT(SUMMA((xi-xAver)^2*pi)) = 30.9839
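The two formulas above can be checked directly in Python (a straight transcription; nothing is assumed beyond the values in the table):

```python
import math

# Discrete distribution from the table above
p = [0.3, 0.4, 0.3]   # probabilities pi
x = [60, 20, -20]     # rates of return xi (%)

# Probability-weighted mean and standard deviation
x_aver = sum(pi * xi for pi, xi in zip(p, x))
x_sd = math.sqrt(sum(pi * (xi - x_aver) ** 2 for pi, xi in zip(p, x)))

print(round(x_aver, 4))  # 20.0
print(round(x_sd, 4))    # 30.9839
```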

Best wishes!

Csaba

Ps.: Dear Karl, I'm so sorry, but I'm not a genius in English, so I don't understand everything in your letter... I'm so sorry, again...!

Ps2.: "Or, did you mean that this was a one-variable calculation?"

This is a one-variable discrete distribution.

                              
What about... (was:To Karl (and Mona) - The correct way )
Message #22 Posted by Vieira, Luiz C. (Brazil) on 13 Mar 2004, 9:53 p.m.,
in response to message #21 by Tizedes Csaba

Hi Tizedes, Karl, Mona (who's probably aware of what's going on here...:)

First of all, Tizedes, I've been reading your posts since some of the first ones (the calculator design), and I must confess that, from what you wrote, I had no idea you were such a brilliant young guy. Only when you mentioned your age did I realize it. Maybe it's not too late to congratulate you on your achievements (and prize) and to mention that I admire young minds like yours, so... best regards and keep going! Success!

Karl, you called my attention to the fact that I (along with some others) did not answer Mona. I remember that I opened the post, read some manuals, gathered some extra info, and answered it; but later, when I read Mona's post again, I saw that the text had been changed while I was answering (even the "B" reference was added; it was not there before). So I answered what I thought the original question was asking... Some extra data was added, and then I saw that my answer was way off the actual need! Thank you!

The following text is just a reasoning, and may have no numeric "foundation"... Sorry if you read and find reasoning gaps!

About Mona's issue: when I saw the percentages, I thought the problem should use a weighted mean (average) to compute final values, but Mona explicitly asks for the standard deviation. Even so, I don't see a way to apply -20(%) as valid data for a weighted mean.

Now: what about using .6, .2 and -.2 instead of 60(%), 20(%) and -20(%)? I know the pi-related values won't change, but xi and some summation terms will vary (be 100 times smaller). Wouldn't that express related values closer to whatever is under observation? (And what does -20% refer to in a sample?)

Just some thoughts...

Luiz (Brazil)

Edited: 13 Mar 2004, 10:21 p.m.

                                    
Re: What about...
Message #23 Posted by Tizedes Csaba on 14 Mar 2004, 4:36 a.m.,
in response to message #22 by Vieira, Luiz C. (Brazil)

Dear Luiz,

thank you very much for your words! They pleased me very much! I don't want to create misunderstandings with my poor English, so I won't write more now.

Thank you, Luiz!

Csaba

                              
Tizedes is right...
Message #24 Posted by Karl Schneider on 13 Mar 2004, 11:36 p.m.,
in response to message #21 by Tizedes Csaba

Tizedes --

Quote:
Ps.: Dear Karl, I'm so sorry, but I'm not a genius in English, so I don't understand everything in your letter... I'm so sorry, again...!

No need for contrition -- I admit to being a bit "snarky", but when things aren't stated correctly, I'm not bashful about pointing it out. I do acknowledge that you are not a native speaker of English.

All along, I believed that the problem was a bit more complicated than a garden-variety SD calculation, but I didn't take the trouble to figure out exactly what was to be calculated. I think you provided the formula for what Mona actually wanted.

Your equations/answers of:

The average is: xAver = SUMMA(pi*xi)                 = 20.0000
The SD is:      xSD   = SQRT(SUMMA((xi-xAver)^2*pi)) = 30.9839

are correct, assuming that the sum of the weighting factors is unity (1.00), as it ought to be (and indeed was in this example). Otherwise, each summation must be divided by the sum of the weighting factors, SUMMA(pi).

I found a similar problem on p. 110 of Schaum's Outline for Statistics (c. 1961).

-- Karl

