mark4flies · 01-26-2015, 01:42 PM

I think that the values for the sample and population standard deviation estimates are switched when using the Stats 1Var app. For example, I open the app and store data 88, 90, 89, 65, 70, and 89 in D1. I press "Stats" and see that sX (sample standard deviation) is 11.2323936 and sigmaX (population standard deviation) is 10.2537256. This answer makes no sense because the bias correction always produces an estimate for the population that is larger than for the sample.

If I use stddev(D1) the estimate is 10.2537256102. If I use stddevp(D1) the estimate is 11.2323936303. I verified these values by hand calculation. So it isn't that the HP Prime calculation is wrong, but that the report in the app is wrong.

Also, if I use Vars > App > Statistics 1Var > Results > sX or sigmaX, I get the wrong value returned. The values for these two estimates are switched in the app.

mbeddo · 01-26-2015, 03:46 PM

I confirmed this is so on mine.

2014.12.3 (6975) A

Tim Wessman · 01-26-2015, 03:58 PM (last modified 01-26-2015 03:59 PM)

I think the app values and calculations are correct and the CAS calculation is the swapped one here - unless I am remembering my stats wrong and all other units I've compared with are wrong too. Anyone better at stats disagree?

mbeddo · 01-26-2015, 04:12 PM

(01-26-2015 03:58 PM)Tim Wessman Wrote: I think the app values and calculations are correct and the CAS calculation is the swapped one here - unless I am remembering my stats wrong and all other units I've compared with are wrong too. Anyone better at stats disagree?

Tim, you are correct. I should have known better - I teach statistics, after all. When you ask for the population standard deviation, you assume that you have captured the entire population (in D1, for instance) and therefore know the mean precisely. When you ask for the sample standard deviation, you acknowledge that you only have a sample of the population (in D1, for instance) and therefore do not know the population mean (but you estimate it), so the standard deviation has to be a bit larger than its population value.

Apologies for stirring this up.
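
For reference, the two quantities under discussion can be written out as below. The symbols n (number of data points), x_i (the individual values) and \bar{x} (the mean of the data) are notation assumed here for illustration; \sigma_X and s_X follow the app's sigmaX and sX labels.

```latex
\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
s_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
    = \sigma_X\,\sqrt{\frac{n}{n-1}}
```

Because n/(n-1) > 1 for any n of 2 or more, s_X comes out larger than \sigma_X whenever the data are not all identical.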

Tim Wessman · 01-26-2015, 04:29 PM

(01-26-2015 04:12 PM)mbeddo Wrote: Apologies for stirring this up.

Well I think a real problem was found here (the two stat functions in the CAS being swapped). Unless I am reading the xcas documentation wrong I think the commands are doing exactly the opposite of what they should be.

John P · 01-26-2015, 05:40 PM

(01-26-2015 01:42 PM)mark4flies Wrote: I think that the values for the sample and population standard deviation estimates are switched when using the Stats 1Var app. For example, I open the app and store data 88, 90, 89, 65, 70, and 89 in D1. I press "Stats" and see that sX (sample standard deviation) is 11.2323936 and sigmaX (population standard deviation) is 10.2537256. This answer makes no sense because the bias correction always produces an estimate for the population that is larger than for the sample.

Hello,

The sample std. dev. has (n-1) in the denominator, but the population std. dev. has only (n), so after division the sample std. dev. will have the greater value, because of the smaller number in its denominator.

Cheers
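
The hand calculation mentioned in the first post can be reproduced off the calculator. The following is a minimal sketch in Python, not HP Prime code; the helper `std_dev` and its `ddof` parameter are names made up for this illustration. It uses the data from the first post and cross-checks against the standard library's `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n-1).

```python
import math
import statistics

# Data stored in D1 in the first post of this thread.
data = [88, 90, 89, 65, 70, 89]


def std_dev(values, ddof):
    """Square root of the summed squared deviations divided by (n - ddof).

    ddof=0 divides by n (population form, sigmaX in the app);
    ddof=1 divides by n-1 (sample form, sX in the app).
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - ddof))


population_sd = std_dev(data, 0)  # about 10.2537, matches sigmaX
sample_sd = std_dev(data, 1)      # about 11.2324, matches sX

print("divide by n:  ", population_sd)
print("divide by n-1:", sample_sd)

# Cross-check with the standard library's own implementations.
print("pstdev:", statistics.pstdev(data))
print("stdev: ", statistics.stdev(data))
```

The n-1 result agrees with the app's sX and the n result with sigmaX, which is consistent with the app labels being correct.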

mark4flies · 01-26-2015, 06:16 PM

I misunderstood the purpose of these two functions. I thought that the one with n for the divisor was the (biased) sample estimate and the one with n-1 was the (corrected, unbiased) population estimate. (I never saw any documentation stating their interpretation.) Now I understand that the population standard deviation is not an estimate and does not require any correction (simply divide by n).

So, the results in the app are correct and it is actually the case that the functions stddev() and stddevp() are switched?

JimS · 01-26-2015, 10:51 PM

Hi,

The STAT APP, as already mentioned, is correct.
The XCAS documentation for stddev() says that it calculates the std dev if the argument supplied IS the population.
stddevp() is the std dev if the argument supplied is a SAMPLE.

The nomenclature is confusing when looking at the HELP for these commands on the calculator (the heading block).

For me, I would prefer stddev() be used for a sample and stddevp used for calculating the population std dev.

Jim

parisse · 01-27-2015, 06:50 AM

I disagree, here is why.
The natural way to introduce standard deviation is to take the mean of the squared differences from the mean, then the square root. It's only after that that you establish that, if you have a sample, the unbiased estimate of the standard deviation is different, i.e. you must divide by n-1 instead of n: the proof is not difficult but not trivial, and I'm not sure all my maths students could do it. Moreover, unless the sample is really small, the difference is small and it will not change inferential statistics results such as confidence intervals much (for example, if your sample has more than n=30, where normal approximations become reasonable, the difference is less than 2%). In other words, the standard deviation of the population deduced from a sample is a refinement; we could live perfectly well with the biased estimate.
Therefore it's more natural to have the shortest command name, that is stddev, when dividing by n, and add a p when dividing by n-1 (p means population deduced from sample). Perhaps removing stddevp and renaming it stddevs would be less confusing.
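
The "less than 2%" figure can be checked directly, since the two results computed from the same data differ only by the factor sqrt(n/(n-1)). A small Python sketch follows; the function name `sample_to_population_ratio` is invented for this illustration.

```python
import math


def sample_to_population_ratio(n):
    """Ratio of the n-1 form to the n form for the same data set of size n."""
    return math.sqrt(n / (n - 1))


# Show how quickly the gap between the two forms shrinks as n grows.
for n in (2, 6, 10, 30, 100):
    excess_percent = (sample_to_population_ratio(n) - 1) * 100
    print(f"n={n:>3}: n-1 form is {excess_percent:.2f}% larger")
```

For the six-value data set from the first post the gap is still close to 10%, which is why the two reported values there are easy to tell apart.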

John P · 01-27-2015, 03:23 PM

Hello,

Parisse wrote:
"Therefore it's more natural to have the shortest command name, that is stddev, when dividing by n, and add a p when dividing by n-1 (p means population deduced from sample). Perhaps removing stddevp and renaming it stddevs would be less confusing."

That is weird. IMHO a command name should, among other things, be informative. Why not have stddevs for sample and stddevp for population? Simple and not confusing.

Cheers