parisse · 05-29-2016, 06:00 PM

Well, I seriously think stddevs could also be confusing, perhaps not for you but for someone else, while stddevpfroms is not, it's not that large (on the Prime, you will probably get it from the catalog or with help completion and on Xcas you have tab completion). Moreover stddevp will remain a natural short synonym, at least in Xcas.

Mike Elzinga · 05-29-2016, 07:52 PM

(05-29-2016 06:00 PM)parisse Wrote: Well, I seriously think stddevs could also be confusing, perhaps not for you but for someone else, while stddevpfroms is not, it's not that large (on the Prime, you will probably get it from the catalog or with help completion and on Xcas you have tab completion). Moreover stddevp will remain a natural short synonym, at least in Xcas.

I am not as concerned about the name; only about what the function or command actually does.

In that regard, I have always advised students to check and be sure they understand what a calculation is actually doing despite what the function name or its help documentation says. Once they understand that, then the name is not as important.

I am well aware that naming standards are hard to maintain in a fast-changing technological world, especially with technology that is introduced to an international community. Documentation for nearly everything seems to lag these days. Too much to do and too little time.

mark4flies · 06-01-2016, 02:49 PM

I started the original thread cited above. I was likewise confused but as soon as it was pointed out what each function was for, I was fine. My confusion came from the correct but vague explanations in the documentation.

I hate to make unnecessary and radical changes when a simpler, more direct solution exists. Perhaps with the given names the User Guide and online Help clarify their intended use and computation.

If you start changing names, you start breaking programs and other goodies.

I vote for clarification.

Wes Loewer · 06-04-2016, 04:05 PM

(05-28-2016 07:11 AM)parisse Wrote: stddevp = standard deviation of the population based on a sample
stddev = standard deviation
If this is too confusing, I have nothing against renaming stddevp to stddevs.

If I understand correctly, you are saying:
stddevp = standard deviation of the population based on a sample
stddev = standard deviation of the population based on the population

Since spreadsheets like Excel, Google Sheet, & OpenOffice use STDEV (from sample) and STDEVP (from population), the use of stddevp and stddev respectively is understandably confusing. (I know I had them backwards.) A few versions ago, Excel switched to STDEV.S and STDEV.P to avoid any ambiguity but kept the old names for compatibility.

Using stddevpfroms (and stddevpfromp?) seems like overkill since the "pfrom" is implied for both cases. Using stddevs and stddevp would have been nice, but as stddevp already has the reverse meaning, that's not feasible.

How about stddevsamp() and stddevpop() ? They're unambiguous and not too long.

(For what it's worth, I just checked the Nspire. It uses stDevSamp(), stDevPop().)

Wes Loewer · 06-04-2016, 04:55 PM

(06-04-2016 04:05 PM)Wes Loewer Wrote: How about stddevsamp() and stddevpop()

Or stddev_samp and stddev_pop might be even better.

parisse · 06-05-2016, 06:35 AM

I don't understand why someone would say standard deviation of population from population. Dividing by N is the normal definition of standard deviation, the one that should obviously be learned/taught first. Dividing by N-1 is a refinement; justifying it requires more advanced students (and it does not have much impact if samples are of reasonable size). That justifies a shorter name for the straightforward definition and a longer one for the more advanced.
Excel and other spreadsheets may have different views, they are not math softwares, people using them are not expected to understand what standard deviation is.
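
To put a number on the remark that dividing by N-1 "does not have much impact if samples are of reasonable size", here is a minimal Python sketch (Python is an assumption of this illustration; it is not used anywhere in the thread). Both formulas share the same sum of squared deviations, so the N-1 result is always the N result multiplied by sqrt(N/(N-1)). The helper name `correction_factor` is hypothetical.

```python
import math

def correction_factor(n):
    """Ratio of the divide-by-(N-1) result to the divide-by-N result."""
    if n < 2:
        raise ValueError("need at least two data points")
    return math.sqrt(n / (n - 1))

# Print the factor for a few sample sizes to see how quickly it approaches 1.
for n in (2, 5, 10, 30, 100, 1000):
    print(f"N = {n:4d}  factor = {correction_factor(n):.4f}")
```

The factor is large for very small samples and shrinks toward 1 as N grows, which is why the two versions mostly matter for small data sets.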

Wes Loewer · 06-05-2016, 05:34 PM

(06-05-2016 06:35 AM)parisse Wrote: I don't understand why someone would say standard deviation of population from population.

Agreed. I just wanted to make sure I correctly understood. I guess I've never seen the expression "standard deviation of the population based on a sample." The "of the population" is usually implied, so I wanted to make sure I wasn't misunderstanding anything. In American texts, it's usually referred to as the "standard deviation of a sample" or just "sample standard deviation".

Quote:Excel and other spreadsheets may have different views, they are not math softwares, people using them are not expected to understand what standard deviation is.

I suspect that other products like the 50g and Excel chose the shorter name for the sample version since that is the one that is more commonly needed. In fact, on the TI-84+ and 89, there is not even a stand alone function for the population standard deviation, only the sample standard deviation, stdDev().

In any case, removing ambiguity is always a good idea. No one could possibly be confused by names like stddev_samp and stddev_pop.

parisse · 06-05-2016, 07:30 PM

(06-05-2016 05:34 PM)Wes Loewer Wrote: In fact, on the TI-84+ and 89, there is not even a stand alone function for the population standard deviation, only the sample standard deviation, stdDev().

Really unbelievable. For me it means that the teachers who advise TI do not want to teach math understanding.

Mike Elzinga · 06-05-2016, 08:59 PM

The sample estimate of the population standard deviation - i.e., the one we are discussing that has division by N-1 - is only one way to estimate the population standard deviation with a statistic taken from a sample of that population.

https://en.wikipedia.org/wiki/Standard_d..._deviation is a Wikipedia article that has a pretty good summary of the various ways one estimates the population standard deviation using a sample of that population.

https://en.wikipedia.org/wiki/Notation_i...statistics is a reference for some of the standard notation in statistics.

I have taught statistics for quite a number of years; and the notation has become increasingly standardized. For the introductory inferential statistics courses, the sample standard deviation has division by N-1 and the population standard deviation has division by N. There is no ambiguity here. One doesn't always have the option of taking large samples and has to make the best of the data one can gather.

Most statistics programs are moving toward having both a sample standard deviation, which has division by N-1, and a population standard deviation, which has division by N. The HP Prime Statistics 1 Var application has the notation correct; it uses sigma for the population standard deviation and sX for the sample standard deviation. If you check, you will find that sigma*sqrt(N/(N-1)) = sX.
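
The relation sigma*sqrt(N/(N-1)) = sX can be checked away from the calculator as well. The sketch below uses Python's standard `statistics` module as a stand-in (an assumption of this illustration, not the Prime's own code): `pstdev` divides by N and `stdev` divides by N-1. The data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, not from the thread
n = len(data)

sigma = statistics.pstdev(data)   # population form: divides by N
s_x = statistics.stdev(data)      # sample form: divides by N-1

print("sigma =", sigma)
print("sX    =", s_x)
# The two differ only by the factor sqrt(N/(N-1)).
print("identity holds:", math.isclose(sigma * math.sqrt(n / (n - 1)), s_x))
```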

Different platforms and statistical packages have slightly different names for each of these; but the differences are made clear in the help menus. The sample estimate of the population standard deviation - whatever you want to call it - has division by N-1. The population standard deviation - the one you get by actually counting every member of the population - has division by N.

The sample estimate is just that: an estimate of the population standard deviation that makes use of the data in your sample. If you need a better estimate, take larger samples if you can. But if you cannot, you have to divide by N-1 or use another correction to your sample data.

Statistics courses are also emphasizing the difference between the descriptive PARAMETERS of a population and the corresponding sample STATISTICS that are attempting to estimate those population parameters.

Sample statistics are estimates of population parameters. Confidence intervals and the probability calculations for the various null hypotheses are concepts that emerge from the behaviors of sample statistics as sample sizes become larger and larger.

For example, if samples are being taken from a population with a normal distribution, then the standard deviation of the sample means decreases as sigma/sqrt(N) and the standard deviation of the sample standard deviations decreases as sigma/sqrt(2N).
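
A small simulation can make these two rates concrete. The following Python sketch is only an illustration under stated assumptions: the population is normal with mean 0 and sigma = 1, and the sample size, number of trials, and seed are arbitrary choices, not values from this thread.

```python
import math
import random
import statistics

random.seed(12345)   # fixed seed so the run is repeatable
SIGMA = 1.0          # population standard deviation (assumed)
N = 50               # size of each sample
TRIALS = 4000        # number of repeated samples

means, sds = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    sds.append(statistics.stdev(sample))   # sample form: divides by N-1

# Spread of the sample means and of the sample standard deviations.
sd_of_means = statistics.stdev(means)
sd_of_sds = statistics.stdev(sds)

print(f"SD of sample means: {sd_of_means:.4f}  vs sigma/sqrt(N)  = {SIGMA / math.sqrt(N):.4f}")
print(f"SD of sample SDs:   {sd_of_sds:.4f}  vs sigma/sqrt(2N) = {SIGMA / math.sqrt(2 * N):.4f}")
```

The simulated values should land close to the two formulas but not exactly on them, partly because of random variation and partly because sigma/sqrt(2N) is a large-N approximation.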

Most good statistics courses these days have access to really nice videos that show these properties of increasing sample sizes; and these videos are crucial to teaching the concepts that lie behind the process of good sampling and using samples to calculate the probabilities that one has captured the population parameters with the sample statistics.

So the bottom line is that statistical packages should make these distinctions between population parameters and sample statistics very clear. It is an important pedagogical issue that stresses the importance of the process of sampling that keeps the focus on trying to get samples that are truly representative of the population.

Mike Elzinga · 06-05-2016, 09:57 PM

(06-05-2016 05:34 PM)Wes Loewer Wrote:
(06-05-2016 06:35 AM)parisse Wrote: I don't understand why someone would say standard deviation of population from population.

Agreed. I just wanted to make sure I correctly understood. I guess I've never seen the expression "standard deviation of the population based on a sample." The "of the population" is usually implied, so I wanted to make sure I wasn't misunderstanding anything. In American texts, it's usually referred to as the "standard deviation of a sample" or just "sample standard deviation".

Quote:Excel and other spreadsheets may have different views, they are not math softwares, people using them are not expected to understand what standard deviation is.

I suspect that other products like the 50g and Excel chose the shorter name for the sample version since that is the one that is more commonly needed. In fact, on the TI-84+ and 89, there is not even a stand alone function for the population standard deviation, only the sample standard deviation, stdDev().

In any case, removing ambiguity is always a good idea. No one could possibly be confused by names like stddev_samp and stddev_pop.

I have a TI-89 Titanium; and I just checked.

The TI-89 has stdDev (division by N-1) and stDevPop (division by N).

Excel 2016 has several forms of standard deviation; including STDEV.S and STDEV.P

The statistics package on Excel is fairly good.

Wes Loewer · 06-05-2016, 10:34 PM

(06-05-2016 09:57 PM)Mike Elzinga Wrote: I have a TI-89 Titanium; and I just checked.
The TI-89 has stdDev (division by N-1) and stDevPop (division by N).

Interesting. Apparently stDevPop() was added to the 89 in OS 3.10, but was not present in 2.09. (The function could be added by installing the Stats App.)

And just to clarify in case anybody's wondering, on the 84+, you can calculate the population standard deviation using 1-Var Stats, but there's not a population standard deviation "function" that corresponds to the sample standard deviation function, stdDev().

Mike Elzinga · 06-06-2016, 02:18 AM

(06-05-2016 10:34 PM)Wes Loewer Wrote:
(06-05-2016 09:57 PM)Mike Elzinga Wrote: I have a TI-89 Titanium; and I just checked.
The TI-89 has stdDev (division by N-1) and stDevPop (division by N).

Interesting. Apparently stDevPop() was added to the 89 in OS 3.10, but was not present in 2.09. (The function could be added by installing the Stats App.)

And just to clarify in case anybody's wondering, on the 84+, you can calculate the population standard deviation using 1-Var Stats, but there's not a population standard deviation "function" that corresponds to the sample standard deviation function, stdDev().

I have suggested to students to think of these mathematical functions as divided into two major categories;

(1) a category of functions that are used to describe and characterize a population, when we can do so, by calculating various population parameters such as the mean, standard deviation, median, quartiles, minimum, maximum, kurtosis, and skewness, etc.;

and

(2) another set of similar functions used to calculate sample statistics that are estimates of those parameters and also produce the probabilities that they are wrong in their estimates.

The behaviors of these latter functions are that they deviate from their estimate of population parameters less and less as the sample sizes get larger and larger. The distributions followed by repeated samples of a given size allow us to come up with those probabilities that we are wrong.

We typically start with the z test and move to the Student's t-test; and from there the pattern of estimation remains pretty much based on the behaviors with sample size that have been observed with these simpler of the statistical inferences from the sample to the population.

This is where the naming of functions - or at the very least, the descriptions of them in the help menus - becomes a matter of good pedagogical practice in getting students to understand what they are doing when using a statistic to get at a population parameter; or even further in why the sample sizes can quantitatively set the boundaries of certainty or uncertainty in a particular measurement.

parisse · 06-06-2016, 06:23 AM

This is inferential statistics, which is advanced statistics. Before teaching that, I'm strongly convinced that one should teach descriptive statistics, including of course mean and standard deviation of a statistical series (like for example the grades of students or the height...). And here it is of course divided by N and it is straightforward to explain why, while it is not for the unbiased estimator from a sample (and in general inferential statistics is much more complicated to understand than descriptive statistics and I don't think you can understand inferential statistics if you have not already mastered descriptive statistics). That's why I'm convinced that the good name in a math package (especially in a calc) should be the shortest if one divides by N (i.e. stddev) and I won't change that.
Of course I agree that the documentation or even the menu items should make the distinction the clearest possible, I think it's clear in Xcas, but this does not depend on me on the Prime.

Mike Elzinga · 06-06-2016, 07:35 AM

(06-06-2016 06:23 AM)parisse Wrote: This is inferential statistics, which is advanced statistics. Before teaching that, I'm strongly convinced that one should teach descriptive statistics, including of course mean and standard deviation of a statistical series (like for example the grades of students or the height...). And here it is of course divided by N and it is straightforward to explain why, while it is not for the unbiased estimator from a sample (and in general inferential statistics is much more complicated to understand than descriptive statistics and I don't think you can understand inferential statistics if you have not already mastered descriptive statistics). That's why I'm convinced that the good name in a math package (especially in a calc) should be the shortest if one divides by N (i.e. stddev) and I won't change that.
Of course I agree that the documentation or even the menu items should make the distinction the clearest possible, I think it's clear in Xcas, but this does not depend on me on the Prime.

The documentation for stddevp is very explicit; it says, "Returns the population standard deviation of the elements of a list or vector, ..."

However it uses the calculation for the sample standard deviation; it divides by N-1.

The stddev help says, "Returns the standard deviation of the elements of a list or vector, ..." It divides by N, so it is what we normally call the population standard deviation; but the help documentation doesn't clarify that fact.
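
To make the two behaviors described above explicit, here is a minimal Python sketch. The functions `std_dev`, `stddev_like` and `stddevp_like` are hypothetical stand-ins written for this illustration; they mirror the divisors reported in this thread and are not the actual Prime or Xcas implementations.

```python
import math

def std_dev(values, ddof):
    """Standard deviation with divisor N - ddof (ddof=0 gives N, ddof=1 gives N-1)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough data points")
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_dev / (n - ddof))

# Behavior as reported in this thread (stand-ins, not the real commands):
def stddev_like(values):
    """Short name: divides by N (population form)."""
    return std_dev(values, ddof=0)

def stddevp_like(values):
    """Name ending in "p": divides by N-1 (sample form)."""
    return std_dev(values, ddof=1)

grades = [1, 2, 3, 4, 5]           # made-up data
print(stddev_like(grades))         # sqrt(10/5)
print(stddevp_like(grades))        # sqrt(10/4)
```

Running both on the same small list is a quick way to tell which divisor a given command uses, whatever its name or help text says.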

If you kept the names but changed what they did - namely, divide by N-1 in stddev and by N in stddevp - then the only other change you would need to make is to add the word "sample" to the help documentation for stddev and have it say explicitly, "Returns the sample standard deviation of the elements of a list or vector, ..."

This would at least make the notation consistent with the help documentation and with the standards that others are using.

Descriptive statistics and inferential statistics are typically taught as a two-semester sequence at universities in the US. In fact, the Advanced Placement Statistics taught in high schools in the US is just such a two-semester, university-level sequence. We actually get into these issues very early in these introductory courses; so they are not considered so "advanced" as you seem to suggest. The distinctions between descriptive parameter calculations and sample statistics used to infer population parameters are addressed very early on in these courses.

If it is the speed of parsing instructions that you are worried about, then one should make names that differ up front, like popstddev and sampstdev; or pstddev and sstddev. I didn't discover that stddevp calculated with N-1 until I started using it and noticed something was not right. The help documentation is currently misleading. Better to make the calculation do what the help says it does.

Obviously it is not my call in deciding what HP does with its documentation and its names for functions. All I can do is turn to tools that make the teaching of subject matter easier.

parisse · 06-06-2016, 12:54 PM

I'm not responsible for HP documentation. Xcas documentation is correct and accurate.

parisse · 06-06-2016, 02:14 PM

(06-06-2016 01:31 PM)compsystems Wrote: The usual excuse, I am not responsible for X thing, we how to improve the product

If you had thought a little bit before you typed, you would have come to the conclusion that I'm *really* not responsible for HP documentation.

Tim Wessman · (This post was last modified: 06-06-2016 04:08 PM by Tim Wessman.)

(06-06-2016 07:35 AM)Mike Elzinga Wrote: This would at least make the notation consistent with the help documentation and with the standards that others are using.

Well, as this thread has shown there is a ton of confusion around which command did what - specifically because xcas uses "p" to mean sample (unlike *every other* stat package in existence apparently). When the documentation was being made I suspect the person making it incorrectly assumed that the xcas commands followed the standard convention, or else it was correct and then later during review of the documentation someone else switched it assuming it was a mistake due to the "p" and knowledge of that convention.

So yes, the HP documentation is incorrect as written. That can be changed.

Were I to be completely in charge here though, the better solution would be:

1. Fix the documentation for the commands stddevp/stddev.
2. Disable those two commands from the catalog (they would still function if typed, but not appear in the UI anywhere except if you specifically pulled up help on them)
3. Replace them with stddevpop (does the current stddev) and stddevsamp (does the current stddevp) in order to match the convention used everywhere except xcas.
4. Those two commands would appear in the catalog and people would be directed towards using them instead in the future in a natural way.
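
The four steps above amount to adding unambiguous names while keeping the old ones callable. A rough Python sketch of that idea follows; it is only an analogy, since the Prime's catalog mechanism is not shown in this thread, and the `CATALOG` list is a hypothetical placeholder for whatever the UI exposes.

```python
import statistics

def stddevpop(values):
    """Population standard deviation (divides by N)."""
    return statistics.pstdev(values)

def stddevsamp(values):
    """Sample standard deviation (divides by N-1)."""
    return statistics.stdev(values)

# Legacy names keep their current behavior so existing programs still run.
stddev = stddevpop     # current stddev divides by N
stddevp = stddevsamp   # current stddevp divides by N-1

# Hypothetical catalog: only the unambiguous names are offered to users.
CATALOG = ["stddevpop", "stddevsamp"]

data = [3, 5, 7, 9]    # made-up data
print(stddev(data) == stddevpop(data), stddevp(data) == stddevsamp(data))
```

Because the legacy names are plain aliases, a program written against stddev or stddevp keeps producing the same numbers, which addresses the earlier concern about breaking existing programs.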

d b · 06-06-2016, 04:50 PM

Great solution set Tim.

debrouxl · 06-07-2016, 05:28 AM

Yup, Mike Elzinga beat me to writing that stDevPop was one of the CAS additions of AMS 3.10 for the 89T and V200. Other additions can be seen at https://debrouxl.github.io/gcc4ti/estack.html#ExtTags .

parisse · 06-07-2016, 06:50 AM

I'm ok with 1, 2 and 4 but still find the names stddevpop and stddevsamp confusing. Not for someone used to statistics but for someone who is learning, because he could as well think that stddevpop is an abbreviated name for standard deviation of population deduced from a sample. I have to think myself when I see the names before I conclude divide by N or N-1. That's why I proposed stddevpfroms. And I'm afraid the confusion will persist if stddevsamp becomes a synonym of stddevp because the first letter after stddev is not the same. The only thing I'm sure of is that the doc should be updated as soon as possible, but I'm afraid that at this point there is no good solution: whatever we choose will remain confusing for some users.