HP Forums

Full Version: (12C) Binary Outcome
Suppose you want to test whether more people respond to one drug versus another, or whether one advertising campaign is more effective than another. In either case, you have a binary outcome. Someone either responds to the drug or they don't. They either buy the product or they don't.

In either case you have a probability of something happening, p1 for one group and p2 for the other, and you would like to test whether the two probabilities are far enough apart that their difference is statistically significant.

If you are designing an experiment, how many people should you use in each group?

This program estimates n, the number of subjects you need to assign to each group, based on your initial guesses at p1 and p2.

Program:
Code:

01 STO 1
02 Rv
03 STO 2
04 RCL 1
05 RCL 2
06  +
07  2
08  /
09 STO 3
10  1
11 ENTER
12 RCL 3
13  -
14 RCL 3
15  x
16  1
17  6
18  x
19 RCL 1
20 RCL 2
21  -
22 ENTER
23  x
24  /
25 FIX 0

Example:
Suppose p1 = 0.1 and p2 = 0.3

.1 ENTER .3 R/S > 64

You need 64 subjects per group.
n is the number in each group, so the total for both groups is 2n = 2 x 64 = 128.

Note that this is only a rough estimate good for quick estimation.
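
For readers without a 12C at hand, the same arithmetic can be sketched in Python. This is only an illustration of the keystroke program above: the function name `subjects_per_group` is made up for this sketch, and the constant 16 is taken straight from the program.

```python
def subjects_per_group(p1, p2, constant=16.0):
    """Rough per-group sample size, mirroring the 12C keystrokes above."""
    p_mean = (p1 + p2) / 2                          # steps 04-09: mean of the two guesses
    numerator = constant * p_mean * (1 - p_mean)    # steps 10-18: 16 * P * (1 - P)
    return numerator / (p1 - p2) ** 2               # steps 19-24: divide by the squared difference


n = subjects_per_group(0.1, 0.3)
print(round(n), 2 * round(n))  # subjects per group, then the total for both groups
```

The rounding at the end plays the role of FIX 0, which only changes the display on the calculator.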

Gamo
(04-07-2018 08:49 AM)Gamo Wrote: [ -> ]Suppose you want to test whether more people respond to one drug versus another, or whether one advertising campaign is more effective than another. In either case, you have a binary outcome. Someone either responds to the drug or they don't. They either buy the product or they don't.

Gamo, you do not say anything about the statistics behind this program. But as far as I can tell this calculates the (equal) sample size for a significance test that tests if p1 and p2 are equal. Your formula includes the constant 16. This value is equal to 2 z² where z is the quantile for which the Normal distribution CDF equals the desired significance level 1–α. A value of 16 here means a z of 2,828 which is equivalent to an α-error of merely 0,47%. For a one-sided test (p1<p2 or p1>p2) it's even 0,23%. This is extremely restrictive. Common values are α=5% or maybe 1%. That's why I suggest replacing the 16 with another value:

For 1–α = 90%:    5,411
For 1–α = 95%:    7,683
For 1–α = 98%:  10,824
For 1–α = 99%:  13,270

I'd suggest a value like 10. This means 1–α = 97,5% for a two-sided test and 1–α = 98,7% for a single-sided test.
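
The constants above can be reproduced with Python's standard library, as a cross-check rather than as part of the 12C program. The helper names `constant_for_confidence` and `two_sided_alpha` are invented for this sketch; it assumes the two-sided reading, i.e. z is the quantile at 1 - α/2.

```python
from statistics import NormalDist


def constant_for_confidence(confidence):
    """Return 2*z^2 for a two-sided test at confidence level 1 - alpha."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    return 2 * z * z


def two_sided_alpha(constant):
    """Inverse direction: the two-sided alpha implied by a constant 2*z^2."""
    z = (constant / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(z))


for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(constant_for_confidence(level), 3))

# Alpha implied by the original constant 16 and by the suggested 10, in percent
print(round(100 * two_sided_alpha(16), 2), round(100 * two_sided_alpha(10), 2))
```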

Regarding the program: the formula is so simple that one certainly does not need three data registers and 25 steps. Take a look at the following version and you will find one or two small tricks that make the program more effective.

Code:
01 STO 1
02 X<>Y
03 STO-1
04 +
05 2
06 /
07 1
08 X<>Y
09 -
10 LstX
11 x
12 1
13 0
14 x
15 RCL 1
16 ENTER
17 x
18 /
19 FIX 0

BTW, as far as I remember the underlying Normal distribution can only be used because of the central limit theorem, which requires a sufficiently large sample size. A common rule of thumb here is n · p · (1–p) > 9. Which does not apply for your example. Please correct me if I'm wrong – the last time I manually calculated such things is about 30 years ago. ;-)
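
A quick way to try the rule of thumb on the example is sketched below. Applying it to each group separately with n = 64 is an assumption of this sketch (the post does not say which p to use), and `normal_approx_ok` is a made-up helper name.

```python
def normal_approx_ok(n, p, threshold=9.0):
    """Rule of thumb from the post: n * p * (1 - p) should exceed 9."""
    return n * p * (1 - p) > threshold


n = 64  # per-group size from the first post's example
for p in (0.1, 0.3):
    print(p, round(n * p * (1 - p), 2), normal_approx_ok(n, p))
```

With these inputs the group with p = 0.1 falls short of the threshold, while the group with p = 0.3 passes.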

Dieter
Here is more information about the formula used.
Note that this is only a rough estimate.

The rule of thumb is based on the assumption of significance α = 0.05 and type II error β = 0.20, i.e. 80% power.

formula:

n = 16P(1-P) / (p1 - p2)^2

where P = p1 + p2 / 2

Thank you, Dieter, for the more in-depth information and the better program alternative.

Gamo
(04-08-2018 02:58 AM)Gamo Wrote: [ -> ]The rule of thumb is based on the assumption of significance α = 0.05 and type II error β = 0.20, i.e. 80% power.

formula:

n = 16P(1-P) / (p1 - p2)^2

where P = p1 + p2 / 2

Where P = (p1 + p2) / 2

Yes, that's the formula in the program. But where do you get the 16 from? For α = 0,05 I think it should be 2 · 1,96² = 7,68. And how do you factor in the β error? Maybe you can derive the formula step by step – it's been a long time since I handled such things, and I am willing to learn. :-)

So, where does the 16 come from?

Dieter
OK. I'll answer my own question. ;-)

(04-08-2018 07:39 AM)Dieter Wrote: [ -> ]Yes, that's the formula in the program. But where do you get the 16 from? For α = 0,05 I think it should be 2 · 1,96² = 7,68. And how do you factor in the β error?
...
So, where does the 16 come from?

I guess I found something that explains what is going on. Take a look at this website.

The formula given there is said to consider both α and β error. In the numerator it adds the z-values for both, i.e. z = zα+zβ. For α=0,05 and β=0,2 this leads to a constant of 2 · (1,96+0,84)² = 15,7 which almost agrees with the 16 given in the first post.

But there is another difference in the formulas: the website calculates p1(1–p1) + p2(1–p2) while the formula in the above programs effectively evaluates p(1–p) + p(1–p) where p is the mean of p1 and p2. As long as p1 is not too different from p2 (or 1–p2) the results are comparable. In other cases they disagree. I guess (!) that the formula on the website is the (more) correct one, and so I think it should also be used in the program:

Code:
01 STO 0
02 1
03 X<>Y
04 -
05 LstX
06 x
07 X<>Y
08 STO-0
09 1
10 X<>Y
11 -
12 LstX
13 x
14 +
15 RCL 0
16 ENTER
17 x
18 /
19 8
20 x
21 FIX 0
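
To see how far the two variance terms drift apart, here is a small Python comparison. It is a sketch, not a transcription of either program: `n_pooled` and `n_unpooled` are hypothetical names, and the test pairs other than (0.1, 0.3) are arbitrary picks for illustration.

```python
def n_pooled(p1, p2, constant=16.0):
    """Earlier programs: constant * P * (1 - P), with P the mean of p1 and p2."""
    p = (p1 + p2) / 2
    return constant * p * (1 - p) / (p1 - p2) ** 2


def n_unpooled(p1, p2, constant=8.0):
    """Program above: constant * [p1(1 - p1) + p2(1 - p2)]."""
    return constant * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2


for p1, p2 in ((0.1, 0.3), (0.4, 0.6), (0.02, 0.30)):
    print(p1, p2, round(n_pooled(p1, p2), 1), round(n_unpooled(p1, p2), 1))
```

For the example p1 = 0.1, p2 = 0.3 the pooled version gives 64 and the unpooled one 60. With the constants 16 and 8 the unpooled result can never exceed the pooled one, because p(1–p) is concave.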

Note that the constant here is half that of the original program, i.e. 8 instead of 16. It doesn't make much of a difference, but for the exact solution replace the last three lines with...

Code:
19 7
20 ,
21 8
22 4
23 9
24 x
25 INTG
26 1
27 +

The constant is [Φ⁻¹(1–0,05/2) + Φ⁻¹(1–0,20)]² = 7,848879734349...
So the returned sample size is valid for α=0,05 and β=0,20.
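
The constant and the rounding step can be checked in Python as follows. This is a sketch under the stated α and β; the function names are invented here, and `floor(raw) + 1` simply mirrors the INTG, 1, + keystrokes.

```python
from math import floor
from statistics import NormalDist


def exact_constant(alpha=0.05, beta=0.20):
    """(z_alpha + z_beta)^2 with a two-sided alpha, as described above."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return (z_alpha + z_beta) ** 2


def subjects_per_group(p1, p2, alpha=0.05, beta=0.20):
    """Per-group sample size using p1(1 - p1) + p2(1 - p2) in the numerator."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    raw = exact_constant(alpha, beta) * variance_sum / (p1 - p2) ** 2
    return floor(raw) + 1   # same as INTG, 1, + on the 12C


print(exact_constant())
print(subjects_per_group(0.1, 0.3))
```

For p1 = 0.1 and p2 = 0.3 this yields 59 subjects per group, compared with 60 from the rounded constant 8.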

Final Caveat: The Null hypothesis still is H0: p1=p2. So this does not (!) test whether one proportion is smaller or larger than the other, it just tests if they are different. This is an important detail as the initial post says:

Quote:Suppose you want to test whether more people respond to one drug versus another, or whether one advertising campaign is more effective than another.

Testing this would require a different constant based on zα of a single sided Normal CDF. Replace the constant with 6,183 if this is what you want to calculate.
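
The one-sided constant follows from the same expression with zα taken at 1–α instead of 1–α/2. A minimal check in Python, assuming α=0,05 and β=0,20 as before (`one_sided_constant` is a hypothetical name):

```python
from statistics import NormalDist


def one_sided_constant(alpha=0.05, beta=0.20):
    """(z_alpha + z_beta)^2 with z_alpha from a one-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # no alpha/2 for a one-sided test
    z_beta = NormalDist().inv_cdf(1 - beta)
    return (z_alpha + z_beta) ** 2


print(round(one_sided_constant(), 3))
```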

BTW, in cases like these I really love the WP34s and 31s for their convenient and accurate cdf and quantile functions. ;-)

Dieter
Hello Dieter
Thanks for the extended information. I got the formula from a magazine article and then implemented it as a program on the 12C. The article stated that this formula is only a rough solution, though.
This gives me much more in-depth information on the topic.

Gamo
(04-09-2018 02:49 PM)Gamo Wrote: [ -> ]… get the formula from an article I read on magazine … this formula is a rough solution …
Do you recall the Magazine …
a) title
b) year/month of publication
c) volume #
d) etc …?

BEST!
SlideRule