stdevp( ) appears to be mislabeled
05-27-2016, 11:50 PM
Post: #1
stdevp( ) appears to be mislabeled
It appears that the stdevp( ) function is mislabeled.

Instead of being the population standard deviation, it is the sample standard deviation. The sample standard deviation has to be greater than the population standard deviation by a factor of sqrt(N/(N-1)).

Here is a check: approx(sddev({1,2,3})) = 0.816496580928

stdevp({1,2,3}) = 1

stdevp({1,2,3})*sqrt(2/3) = 0.816496580928, which is the population standard deviation.
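A quick way to reproduce this check off the calculator is Python's standard `statistics` module, where `pstdev` divides by N (population) and `stdev` divides by N-1 (sample). This is only a cross-check sketch; it says nothing about how the Prime implements its own functions.

```python
import math
import statistics

data = [1, 2, 3]

pop_sd = statistics.pstdev(data)   # divides the sum of squared deviations by N
samp_sd = statistics.stdev(data)   # divides the sum of squared deviations by N - 1

print(f"population sd: {pop_sd:.12f}")   # 0.816496580928
print(f"sample sd:     {samp_sd:.12f}")  # 1.000000000000

# Rescaling the sample value by sqrt((N-1)/N) recovers the population value.
n = len(data)
print(f"sample sd * sqrt({n - 1}/{n}): {samp_sd * math.sqrt((n - 1) / n):.12f}")  # 0.816496580928
```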

The quickest fix might be to relabel stdevp( ) as stdevs( ) and change help to reflect that this is the sample standard deviation rather than the population standard deviation.
05-28-2016, 01:00 AM
Post: #2
RE: stdevp( ) appears to be mislabeled
I made some typos in my post (vision issues; sorry).

The relevant functions are stddev( ) and stddevp( ).
05-28-2016, 02:11 AM
Post: #3
RE: stdevp( ) appears to be mislabeled
That is the case.

The Stats 1 Var Numeric View "Stats" displays the sample and population standard deviation correctly; however, calls to stddev and stddevp are mixed up.
05-28-2016, 04:44 AM
Post: #4
RE: stdevp( ) appears to be mislabeled
If I remember correctly, Bernard has it that way on purpose and does not think it should change. I'm not sure of the reasoning myself.

TW

Although I work for HP, the views and opinions I post here are my own.
05-28-2016, 07:11 AM
Post: #5
RE: stdevp( ) appears to be mislabeled
stddevp = standard deviation of the population based on a sample
stddev = standard deviation
If this is too confusing, I have nothing against renaming stddevp to stddevs.
05-28-2016, 07:12 AM (This post was last modified: 05-28-2016 07:14 AM by salvomic.)
Post: #6
RE: stdevp( ) appears to be mislabeled
(05-28-2016 04:44 AM)Tim Wessman Wrote:  If I remember correctly, Bernard has it that way on purpose and does not think it should change. I'm not sure of the reasoning myself.

last year there was this thread about the question, where Bernard explained his reasoning.

IMHO stddev() should represent the sX variable of Statistics 1Var and stddevp() the corresponding σX; otherwise, another proposal is to rename stddevp() to stddevs() ("of the sample") and in any case to align them with sX and σX.
However, I also understand Bernard's reasoning, since stddev() and stddevp() also apply to matrices and so on...

Salvo

∫aL√0mic (IT9CLU) :: HP Prime 50g 41CX 71b 42s 39s 35s 12C 15C - DM42, DM41X - WP34s Prime Soft. Lib
05-28-2016, 01:57 PM
Post: #7
RE: stdevp( ) appears to be mislabeled
The current notational standards in statistics are that σX is the population standard deviation and sX is the sample standard deviation. The Stats 1Var app is consistent with the notational standards; and it also agrees with the STATS app on the HP 50G, the TI 89 and TI 83 and 84, and various other statistics packages.

The reason for dividing by N-1 in the sample standard deviation is that we have neither the actual population mean nor the actual population standard deviation when we take a sample. By taking a sample, we hope we have values that are representative of the population so that we can use the sample mean and standard deviation as estimates of the corresponding population parameters. These estimates of population parameters that we take from our sample are called statistics.

However, we lose a degree of freedom in calculating the mean of the sample, because we then turn right around and use that sample mean in calculating the sample standard deviation. This makes the sample variance (the square of the sample standard deviation) a biased estimate of the population variance unless we divide by N-1 instead of N.
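The bias can be seen exactly, without random numbers, by treating the list {1, 2, 3} as the whole population and enumerating every ordered sample of size 2 drawn with replacement. Averaging the divide-by-N variance over all those samples falls short of the population variance, while the divide-by-(N-1) version matches it exactly. This is a small illustrative sketch; sampling with replacement is an assumption chosen so that the draws are independent, and the helper name `variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
n = 2  # sample size

def variance(xs, ddof):
    """Sum of squared deviations from the mean of xs, divided by len(xs) - ddof."""
    m = Fraction(sum(xs), len(xs))
    ss = sum((Fraction(x) - m) ** 2 for x in xs)
    return ss / (len(xs) - ddof)

pop_var = variance(population, ddof=0)  # true population variance

# All 9 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))
mean_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)
mean_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)

print("population variance:          ", pop_var)        # 2/3
print("average of divide-by-N values:", mean_biased)    # 1/3
print("average of divide-by-(N-1):   ", mean_unbiased)  # 2/3
```

The divide-by-N average comes out at (N-1)/N times the population variance, here 1/2 of 2/3. Note that the N-1 divisor makes the variance unbiased; its square root, the sample standard deviation, is still very slightly biased low.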

When you calculate the population mean and standard deviation, you have every individual in the population, so there is no conflict in subtracting individual measurements from the population mean. So in calculating the population standard deviation it is proper to divide by N.

However, if you apply the calculation to get the sample standard deviation to the population itself, you get a value that is sqrt(N/(N-1)) times larger than it is supposed to be. Remember that the sample is not the population; it is hopefully a representative sample of the population from which we hope to estimate the population parameters when we can't count every member of the population.

Also, when we look at the behavior of the sample means and sample standard deviations as a function of sample size, we find that the standard deviation of the sample means decreases with sample size as σ/sqrt(N), whereas the standard deviation of the sample standard deviations decreases approximately as σ/sqrt(2N) when sampling a population that has a normal distribution.

I discovered the inconsistency in the notion of stddevp when I was writing an app to demonstrate the behaviors of the sample mean and sample standard deviation as a function of sample size. :-)

So, to be consistent with standard statistical notation, I would think that stddevp should be the same as σX and stddev (or, better, stddevs) should be the same as sX. At the moment, they are reversed.
05-28-2016, 02:04 PM
Post: #8
RE: stdevp( ) appears to be mislabeled
I'm new to this site.

How does one get Greek and other mathematical symbols? HTML doesn't appear to work in my last post.
05-28-2016, 02:45 PM
Post: #9
RE: stdevp( ) appears to be mislabeled
(05-28-2016 01:57 PM)Mike Elzinga Wrote:  So, to be consistent with standard statistical notation, I would think that stddevp should be the same as σX and stddev (or, better, stddevs) should be the same as sX. At the moment, they are reversed.
I don't see why stddev should correspond to sigmaX and stddevp to sX and not conversely. I find it more natural to have the shortest name for the simplest definition and the longest name for the modification required to have an unbiased estimation. That's why I want to keep stddev for division by sqrt(N). After thinking a bit about the name, stddevs might also be confusing; perhaps stddevpfroms would be better.
05-28-2016, 03:15 PM
Post: #10
RE: stdevp( ) appears to be mislabeled
(05-28-2016 02:45 PM)parisse Wrote:  
(05-28-2016 01:57 PM)Mike Elzinga Wrote:  So, to be consistent with standard statistical notation, I would think that stddevp should be the same as σX and stddev (or, better, stddevs) should be the same as sX. At the moment, they are reversed.
I don't see why stddev should correspond to sigmaX and stddevp to sX and not conversely. I find it more natural to have the shortest name for the simplest definition and the longest name for the modification required to have an unbiased estimation. That's why I want to keep stddev for division by sqrt(N). After thinking a bit about the name, stddevs might also be confusing; perhaps stddevpfroms would be better.

Because this is also a pedagogical issue, the names should reflect what calculation is being done.

On other statistics packages and on the HP50G, population standard deviation has division by N and sample standard deviation has division by N-1.

Standard notation in statistics these days has sigma for the population standard deviation and s for the sample standard deviation.

The help given for stddevp on the HP Prime says it is the population standard deviation. However, it actually calculates the sample standard deviation. On the other hand, stddev calculates the population standard deviation. This is reversed from standard notation and conflicts with the sigmaX and the sX of the Statistics 1 Var app.

Some calculators and statistical packages use PopStdDev and SampStdDev to make it perfectly clear which calculation is being done. P goes with population (division by N) and S goes with sample (division by N-1).

Because I am aware of the frequent confusion about the sample standard deviation as compared with the population standard deviation, I always do a quick check with a small list of values that can be taken as either the population or a sample whenever I start using a new statistical package or app.

The sample standard deviation should always be greater than the population standard deviation by a factor of sqrt(N/(N-1)) when both are calculated from the same small list.
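This quick check can be automated. The sketch below (the function name `check_sd_pair` and its default test list are hypothetical) takes any two candidate functions, for example a package's "population" and "sample" routines, and confirms that the second exceeds the first by sqrt(N/(N-1)) on a small list. Here it is exercised with Python's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

def check_sd_pair(pop_fn, samp_fn, data=(1.0, 2.0, 3.0, 4.0), tol=1e-12):
    """Return True if samp_fn(data) == pop_fn(data) * sqrt(N/(N-1)) within tol.

    A False result suggests the two functions are swapped or mislabeled.
    """
    n = len(data)
    expected = pop_fn(data) * math.sqrt(n / (n - 1))
    return math.isclose(samp_fn(data), expected, rel_tol=tol)

print(check_sd_pair(statistics.pstdev, statistics.stdev))  # True: labels match the math
print(check_sd_pair(statistics.stdev, statistics.pstdev))  # False: roles reversed
```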
05-28-2016, 04:31 PM (This post was last modified: 05-28-2016 04:55 PM by salvomic.)
Post: #11
RE: stdevp( ) appears to be mislabeled
(05-28-2016 02:04 PM)Mike Elzinga Wrote:  I'm new to this site.

How does one get Greek and other mathematical symbols? HTML doesn't appear to work in my last post.

In Mac OS X I simply copied and pasted sigma (σ) from the "Show Symbols" panel (top right of the Finder menu bar)...

Otherwise you could use LaTeX: start with \ [ and end with \ ] (without the spaces), and use \sigma for the Greek small sigma...
\[ \sigma \]

If you want to use it inline within a paragraph (\( \sigma \)), use \ ( and \ ) instead...


05-28-2016, 04:54 PM
Post: #12
RE: stdevp( ) appears to be mislabeled
(05-28-2016 04:31 PM)salvomic Wrote:  
(05-28-2016 02:04 PM)Mike Elzinga Wrote:  I'm new to this site.

How does one get Greek and other mathematical symbols? HTML doesn't appear to work in my last post.

In Mac OS X I simply copied and pasted sigma (σ) from the "Show Symbols" panel (top right of the Finder menu bar)...

Otherwise you could use LaTeX: start with \ [ and end with \ ] (without the spaces), and use \sigma for the Greek small sigma...
\[ \sigma \]

If you want to use it inline within a paragraph (\( \sigma \)), use \ ( and \ ) instead...

Thanks. I didn't know LaTeX worked on this site.
05-28-2016, 05:53 PM
Post: #13
RE: stdevp( ) appears to be mislabeled
(05-28-2016 03:15 PM)Mike Elzinga Wrote:  
(05-28-2016 02:45 PM)parisse Wrote:  I don't see why stddev should correspond to sigmaX and stddevp to sX and not conversely. I find it more natural to have the shortest name for the simplest definition and the longest name for the modification required to have an unbiased estimation. That's why I want to keep stddev for division by sqrt(N). After thinking a bit about the name, stddevs might also be confusing; perhaps stddevpfroms would be better.

Because this is also a pedagogical issue, the names should reflect what calculation is being done.
I see, in fact we disagree because HP made a mistake in the documentation. Xcas documentation says
# stddev Returns the standard deviation of the elements of its argument, with an optional second argument of weights, or the list of standard deviations of the columns of a matrix.
# stddevp Returns an unbiased estimate of the population standard deviation from the sample (first argument), with an optional list of weights as second argument.
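For readers who want to mirror these two documented behaviors outside Xcas, here is a Python sketch. The names `weighted_sd` and `column_sds` are hypothetical, not Xcas functions, and treating the optional weights as integer frequency counts (so the "unbiased" version divides by the total weight minus one) is an assumption; the Xcas source would need to be checked for how it actually handles weights.

```python
import math

def weighted_sd(values, weights=None, unbiased=False):
    """Standard deviation of values, optionally with frequency weights.

    unbiased=False divides the weighted sum of squares by sum(weights),
    mirroring the documented Xcas stddev; unbiased=True divides by
    sum(weights) - 1, mirroring the documented Xcas stddevp.
    Interpreting weights as frequency counts is an assumption of this sketch.
    """
    if weights is None:
        weights = [1] * len(values)
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    ss = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return math.sqrt(ss / (total - 1 if unbiased else total))

def column_sds(matrix, unbiased=False):
    """Apply weighted_sd to each column of a matrix given as a list of rows."""
    return [weighted_sd(list(col), unbiased=unbiased) for col in zip(*matrix)]

# Weights [1, 2, 1] on [1, 2, 3] behave like the expanded list [1, 2, 2, 3].
print(weighted_sd([1, 2, 3], [1, 2, 1]))
print(weighted_sd([1, 2, 2, 3]))
print(column_sds([[1, 10], [2, 20], [3, 30]]))
```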
05-28-2016, 06:08 PM
Post: #14
RE: stdevp( ) appears to be mislabeled
By the way, it should not be that essential. I mean sqrt(n/(n-1)) is 1.017... for n=30; in other words, your confidence interval will be almost the same for a reasonable sample size (the interval length will increase by less than 2%). If you make a poll of size n=1000, we are talking about a change of roughly 0.05%. Insisting too much on the difference between the unbiased and biased stddev estimates might obscure more important points.
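The size of the correction is easy to tabulate. The sketch below computes sqrt(n/(n-1)) for a few sample sizes, including the n = 30 and n = 1000 cases mentioned above; the helper name `sd_ratio` and the particular list of sizes are arbitrary choices for illustration.

```python
import math

def sd_ratio(n):
    """Factor by which the divide-by-(n-1) sd exceeds the divide-by-n sd."""
    return math.sqrt(n / (n - 1))

for n in (5, 30, 100, 1000):
    r = sd_ratio(n)
    print(f"n = {n:5d}: ratio = {r:.5f} ({100 * (r - 1):.3f}% larger)")
```

Because the ratio behaves like 1 + 1/(2n) for large n, the difference shrinks roughly in proportion to 1/n.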
05-28-2016, 07:16 PM
Post: #15
RE: stdevp( ) appears to be mislabeled
Why not just rename stddevp to stddevs - - and everyone should be happy. stddev can stay the way it is.
05-28-2016, 08:12 PM
Post: #16
RE: stdevp( ) appears to be mislabeled
(05-28-2016 07:16 PM)Helge Gabert Wrote:  Why not just rename stddevp to stddevs - - and everyone should be happy. stddev can stay the way it is.

I agree; this would be the easiest fix.

Also change the help for this command to state that it calculates the sample standard deviation. In the help, just change the word "population" to sample.
05-28-2016, 08:58 PM
Post: #17
RE: stdevp( ) appears to be mislabeled
(05-28-2016 06:08 PM)parisse Wrote:  By the way, it should not be that essential. I mean sqrt(n/(n-1)) is 1.017... for n=30; in other words, your confidence interval will be almost the same for a reasonable sample size (the interval length will increase by less than 2%). If you make a poll of size n=1000, we are talking about a change of roughly 0.05%. Insisting too much on the difference between the unbiased and biased stddev estimates might obscure more important points.

This is precisely what my little app is demonstrating. It is a pedagogical tool used to show where confidence intervals come from and why we use the language we do in inferential statistics.

1. First the app generates a big population of, say, 1000, with a specified normal distribution. It then checks the actual population mean and population standard deviation and plots the histogram.

2. It allows setting a sample size and takes many (on the order of 100) samples of this size and calculates the standard deviations of the sample means and the standard deviation of the sample standard deviations.

3. If desired, it can then plot a histogram of these and tell us what the standard deviation of all the sample means and standard deviation of all the sample standard deviations are. Several checks with different sample sizes show these histograms get narrower and taller as sample sizes get larger and larger, and the histograms have normal distributions.

4. It also allows incrementing the sample size by a specified amount and repeating this all the way from a small sample size up to some specified large sample size.

5. The sample sizes, the standard deviations of the sample means, and the standard deviations of the sample standard deviations are stored in lists.

6. It then plots the standard deviations of sample means versus sample size, and also the standard deviations of sample standard deviations versus sample size.

7. These plots fit sigma/sqrt(N) very nicely for the sample means, and sigma/sqrt(2N) for the sample standard deviations, as expected.
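The experiment described in these steps can be sketched in Python (not HP PPL) as below. The function name `simulate`, the population parameters (mean 50, sigma 10), sampling with replacement, the fixed seed, and the use of 1000 repetitions instead of roughly 100 are all assumptions made for illustration; the sketch prints the comparison rather than plotting it. Note that sigma/sqrt(2N) is a large-sample approximation for normal data, so the agreement for the sample standard deviations is looser at small N.

```python
import math
import random
import statistics

def simulate(sample_sizes, pop_size=1000, mu=50.0, sigma=10.0, reps=1000, seed=1):
    """Sketch of the sampling-distribution experiment described above.

    Builds a normal population, then, for each sample size n, draws `reps`
    samples (with replacement, an assumption of this sketch) and records the
    spread of the sample means and of the sample standard deviations.
    """
    rng = random.Random(seed)
    population = [rng.gauss(mu, sigma) for _ in range(pop_size)]
    pop_sd = statistics.pstdev(population)  # actual population sigma

    rows = []
    for n in sample_sizes:
        means, sds = [], []
        for _ in range(reps):
            sample = rng.choices(population, k=n)
            means.append(statistics.fmean(sample))
            sds.append(statistics.stdev(sample))
        rows.append((n, statistics.stdev(means), statistics.stdev(sds),
                     pop_sd / math.sqrt(n), pop_sd / math.sqrt(2 * n)))
    return pop_sd, rows

pop_sd, rows = simulate([10, 40, 160])
print(f"population sigma = {pop_sd:.3f}")
for n, sd_means, sd_sds, pred_means, pred_sds in rows:
    print(f"n={n:4d}  sd(means)={sd_means:.3f} vs sigma/sqrt(N)={pred_means:.3f}"
          f"  sd(sds)={sd_sds:.3f} vs sigma/sqrt(2N)={pred_sds:.3f}")
```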

The nice thing about the HP Prime is that it is so fast that it takes very little time to generate all these data and make these plots.
05-29-2016, 05:57 AM
Post: #18
RE: stdevp( ) appears to be mislabeled
(05-28-2016 08:12 PM)Mike Elzinga Wrote:  
(05-28-2016 07:16 PM)Helge Gabert Wrote:  Why not just rename stddevp to stddevs - - and everyone should be happy. stddev can stay the way it is.

I agree; this would be the easiest fix.

Also change the help for this command to state that it calculates the sample standard deviation. In the help, just change the word "population" to sample.

I'm afraid stddevs is also confusing, stddevpfroms would be better.
05-29-2016, 04:56 PM
Post: #19
RE: stdevp( ) appears to be mislabeled
(05-29-2016 05:57 AM)parisse Wrote:  
(05-28-2016 08:12 PM)Mike Elzinga Wrote:  I agree; this would be the easiest fix.

Also change the help for this command to state that it calculates the sample standard deviation. In the help, just change the word "population" to sample.

I'm afraid stddevs is also confusing, stddevpfroms would be better.

How about sample_estimate_of_the_population_standard_deviation ? ;-)
05-29-2016, 05:08 PM (This post was last modified: 05-29-2016 05:09 PM by toml_12953.)
Post: #20
RE: stdevp( ) appears to be mislabeled
(05-29-2016 04:56 PM)Mike Elzinga Wrote:  
(05-29-2016 05:57 AM)parisse Wrote:  I'm afraid stddevs is also confusing, stddevpfroms would be better.

How about sample_estimate_of_the_population_standard_deviation ? ;-)

or even

standard_deviation_using_just_a_sample_of_the_total_population_not_the_whole_thing

:-)

Tom L
Cui bono?