HP Forums
Maths/Stats challenge - 1 of 2 - polls - Printable Version

+- HP Forums (https://www.hpmuseum.org/forum)
+-- Forum: Not HP Calculators (/forum-7.html)
+--- Forum: Not remotely HP Calculators (/forum-9.html)
+--- Thread: Maths/Stats challenge - 1 of 2 - polls (/thread-18401.html)



Maths/Stats challenge - 1 of 2 - polls - EdS2 - 05-23-2022 11:01 AM

Here's a challenge. I think it's a bit like finding fractional approximations to arbitrary numbers, but different.

Given a poll result, come up with a series of ever-better approximations of how many votes might have been cast.

For example, a poll with
A: 10%
B: 20%
C: 40%
D: 30%
could plausibly have had 10 voters. In this case there's no point going further, but in any case there's a sensible limit of 100 voters for a given result.

Here's a real-world poll result:
A: 23%
B: 41%
C: 36%

And here's another real-world result:
A: 12%
B: 18%
C: 45%
D: 25%

Two more examples from the wild:
16, 2, 51, 31
45, 17, 25, 13, 0

(All of those have answers which are below 100)

(We should perhaps assume that percentages have been rounded, so we might get an approximate and not exact match.)


RE: Maths/Stats challenge - 1 of 2 - polls - robve - 05-23-2022 02:23 PM

(05-23-2022 11:01 AM)EdS2 Wrote:  Here's a challenge. I think it's a bit like finding fractional approximations to arbitrary numbers, but different.
Given a poll result, come up with a series of ever-better approximations of how many votes might have been cast.

My guess is that Euclidean method applied to 10^n upscaled percentages (to remove fractions) and sum over normalized is one possible step to get an answer, but for ever-better approximations?

- Rob


RE: Maths/Stats challenge - 1 of 2 - polls - Joe Horn - 05-23-2022 02:49 PM

This class of question was addressed in a presentation at HHC 2012:
https://www.hpmuseum.org/forum/thread-7984.html
See "Chapter III: Fun application: Opinion Polls" in the PDF linked at the bottom of that posting.

Using the 50g's XQ function to answer this sort of problem was addressed here:
https://www.hpmuseum.org/forum/thread-7991.html

Here are some possibly non-obvious results for your suggested inputs (listed here in your posting's order):

Rounded to 1 significant digit:
10% --> 1/7
30% --> 1/3

Rounded to the number of significant digits indicated:
23% --> 3/13
23.0% --> 14/61 (*)
41% --> 7/17
41.0% --> 16/39
36% --> 4/11

12% --> 2/17
18% --> 2/11
45% --> 5/11 (*)

16% --> 3/19
02% --> 1/41 (*)
51% --> 18/35 (*)
51.0% --> 25/49
31% --> 4/13
31.0% --> 9/29
17% --> 1/6
17.0% --> 8/47
13% --> 1/8
13.0% --> 3/23

(*) -- these results are missed by the usual "continued fraction" method, but are found by brute search (slow) and by the PDQ algorithm (fast) as described by the above-linked article.


RE: Maths/Stats challenge - 1 of 2 - polls - EdS2 - 05-24-2022 07:31 AM

Thanks Joe, that's some interesting material there, and thanks for linking it.

I think my case, where there are three or more poll responses, is a bit more involved than the case of just two responses. In effect, we have multiple constraints all of which need to be satisfied. I think this means we have more information and so we can get more confidence in the answers.

So, for example, when I found this example in the wild
A: 23%
B: 41%
C: 36%
the actual total number of votes was a single number, less than 50, and as it happens, none of the individual best approximations from each percentage (of 13, 17 and 11) are the right number. If we evaluate all three percentages for those three postulated number of votes, we get different results, perhaps for example:
A: 23% (3/13)
B: 38% (5/13)
C: 38% (5/13)
or
A: 24% (4/17)
B: 41% (7/17)
C: 35% (6/17)

So, I think there's more to it, to getting a best fit, or indeed the exact answer, in the case of polls with more than two candidates.

(Edit: I notice that the differences between results are a constraint too: in the case of 36% and 41%, the true difference must be at most 6%, and so we must have at least 16 votes.)


RE: Maths/Stats challenge - 1 of 2 - polls - EdS2 - 05-24-2022 07:32 AM

Rob, could you possibly do a worked example of this Euclidian method idea? I'm intrigued!


RE: Maths/Stats challenge - 1 of 2 - polls - Joe Horn - 05-24-2022 10:21 AM

(05-24-2022 07:31 AM)EdS2 Wrote:  I think my case, where there are three or more poll responses, is a bit more involved than the case of just two responses. In effect, we have multiple constraints all of which need to be satisfied.

Aha! I understand now.

For what it's worth, George Chrystal's "Algebra" (first published in 1889, available through Amazon from Dover Press, AMS Chelsea, and some other publishers) seems to cover this problem. Chapter X is called "Continued Fractions", with sub-section 10.15 called "Approximation by convergents". In Chapter XI ("Approximation of Irrationals by Rationals"), subsection 11.12 ("Simultaneous Approximation") begins:

Chrystal Wrote:So far we have been concerned with approximations to a single irrational number. ... [We now look at] the simultaneous approximation of k numbers (X1, X2 ... Xk) by fractions P1/Q, P2/Q, ... Pk/Q, with the same denominator Q (but not necessarily irreducible).

This is followed by Chrystal's typical dense mixture of English and math. I'll try to make sense out of it. Although this section is specifically about approximating irrational numbers, that's not essential to his process which uses continued fractions. If you have trouble finding this book, I can scan my copy (which is itself a printed scan) and post it here for you. In any case, it's nice to know that this problem was covered by an algebra "textbook" in 1889. Smile


RE: Maths/Stats challenge - 1 of 2 - polls - Albert Chan - 05-24-2022 01:35 PM

(05-24-2022 10:21 AM)Joe Horn Wrote:  Aha! I understand now.

For what it's worth, George Chrystal's "Algebra" (first published in 1889, available through Amazon from Dover Press, AMS Chelsea, and some other publishers) seems to cover this problem. Chapter X is called "Continued Fractions", with sub-section 10.15 called "Approximation by convergents".

Unfortunately, it is possible solutions not from the convergents (even semi-convergents)
With small search range (<100), brute force (for "roundtrip") may be the simplest solution.

Code:
function badn(n,P)
    for i=1,#P do
        if round(round(P[i]*n/100)*100/n) ~= P[i] then return true end
    end
end
Note: to reduce binary->decimal conversion errors, we "wasted" some cycles.
Note: round(), does half-way-away-from-zero. For half-way-to-even, use rint() instead.

lua> P = {16,2,51,31} -- one of EdS2's example
lua> n = ceil(100/(16-2+1)) -- = ceil(100/15) = 7
lua> while badn(n,P) do n=n+1 end
lua> n -- smallest solution
45

Had we do convergents/semi-convergents of, say, 31%, we get this:
Code:
31% 0   3   4   2   3
1   0   1   4   9  31
0   1   3  13  29 100

Checked convergents/semi-convergents denominator (<100): 13, 29, 42, 71, none will work.
We would have missed all these:

lua> for n=45,99 do
:      if not badn(n,P) then io.write(n,' ') end
:      end

45 49 51 55 61 81 83 85 86 87 88 89 90 91 93 94 95 96 97 98 99


RE: Maths/Stats challenge - 1 of 2 - polls - EdS2 - 05-24-2022 03:34 PM

Sounds like a very interesting book, Joe, so yes please, if you can spare the time to scan and share (some of) it, that would be most excellent.
Edit: I notice the two-volume 5th edition is online here and here, but it seems much expanded, with different chapter numbers, and I have not located any mention of simultaneous approximation.

Albert, it seems you are right, that convergents might miss the mark. But you've mentioned semi-convergents before and I couldn't find any reference for them - please explain, if you would!

As for the {16,2,51,31} example, it's interesting that 45 is a solution. As it happens, 97 was the number of votes in this case. But not too surprising that there could be an alias. My other examples have smaller answers, so perhaps we'll be luckier. In any case, the smallest number which survives a round-trip must be a winner.


RE: Maths/Stats challenge - 1 of 2 - polls - Albert Chan - 05-24-2022 04:55 PM

(05-24-2022 01:35 PM)Albert Chan Wrote:  Checked convergents/semi-convergents denominator (<100): 13, 29, 42, 71, none will work.

Semi-convergents generated from previous 2 convergents, ratio is "locked" between 2 edges.

4/13 ≈ 0.3077 (< 31%)
9/29 ≈ 0.3103 (> 31%)

Previous 2 convergents both rounded to 31%, so does all the semi-convergents.

(4+9)/(13+29) = 13/42 ≈ 0.3095
(13+9)/(42+29) = 22/71 ≈ 0.3099

Not all best rational approximations are the convergents of the continued fraction!


RE: Maths/Stats challenge - 1 of 2 - polls - Dan C - 05-24-2022 05:36 PM

This seems very interesting, but for a dummie like me, can you explain the basic mathematical theory behind this with a simple real world example?


RE: Maths/Stats challenge - 1 of 2 - polls - Joe Horn - 05-25-2022 03:22 AM

(05-24-2022 03:34 PM)EdS2 Wrote:  Sounds like a very interesting book, Joe, so yes please, if you can spare the time to scan and share (some of) it, that would be most excellent.

Immediately after the following, Chrystal launches into a proof that e and pi are irrational, which is not pertinent to this topic. So I'm afraid that he doesn't actually present a method for solving the desired system (unless I'm staring at his text the way that the apes stared at the monolith in 2001: A Space Odyssey).

[Image: Chrystal169.jpg]


RE: Maths/Stats challenge - 1 of 2 - polls - Thomas Klemm - 05-25-2022 06:54 AM

This Python program uses brute force to loop through all numbers from 1 to 100:
Code:
import numpy as np

def approximate(p):
    m = 10
    
    for k in range(1, 100):
        r = np.round(k * p)
        q = r / k
        d = np.linalg.norm(p - q)
        if d < m:
            n, m, u, v = k, d, r, q

    print(f"{u} / {n} ~ {np.round(v, decimals=2)}")

We can use it with the given examples:
Code:
for p in [
    [.23, .41, .36],
    [.12, .18, .45, .25],
    [.16, .02, .51, .31],
    [.45, .17, .25, .13, .0],
]:
    approximate(np.array(p))

The result is:

[14. 25. 22.] / 61 ~ [0.23 0.41 0.36]
[10. 15. 38. 21.] / 84 ~ [0.12 0.18 0.45 0.25]
[15. 2. 48. 29.] / 94 ~ [0.16 0.02 0.51 0.31]
[34. 13. 19. 10. 0.] / 76 ~ [0.45 0.17 0.25 0.13 0. ]



RE: Maths/Stats challenge - 1 of 2 - polls - Joe Horn - 05-25-2022 10:39 AM

(05-25-2022 06:54 AM)Thomas Klemm Wrote:  The result is:

[14. 25. 22.] / 61 ~ [0.23 0.41 0.36]
[10. 15. 38. 21.] / 84 ~ [0.12 0.18 0.45 0.25]
[15. 2. 48. 29.] / 94 ~ [0.16 0.02 0.51 0.31]
[34. 13. 19. 10. 0.] / 76 ~ [0.45 0.17 0.25 0.13 0. ]

My 50g program finds these smaller denominator solutions:

[5. 9. 8.] / 22 ~ [0.23 0.41 0.36]
[6. 9. 23. 13.] / 51 ~ [0.12 0.18 0.45 0.25]
[7. 1. 23. 14.] / 45 ~ [0.16 0.02 0.51 0.31]
[24. 9. 13. 7. 0.] / 53 ~ [0.45 0.17 0.25 0.13 0. ]

Program:
<< 0 DO 1 + UNTIL DUP2 * 0 RND OVER / 2 RND PICK3 SAME END DUP2 * 0 RND SWAP >>

Input: List of percentages as decimals, e.g. { .23 .41 .36 }
Output:
3: Input list
2: List of numerators
1: Denominator


RE: Maths/Stats challenge - 1 of 2 - polls - EdS2 - 05-25-2022 11:01 AM

Well done Joe - I think the lowest answer which survives the round-trip is the way to go, and it's rather looking like an exhaustive search is the way to do that.

My real-world examples, as it happens, only agree with one out of four of your answers, but that would indicate a failing in the design of the challenge!

The one you got "right" was 22. The other "correct" answers would be 67, 97, 64. I'm wondering if the aliasing problem would only afflict polls with more than 50 votes, a restriction which didn't occur to me.

Hopefully the "correct" answers will also survive the round trip, and would be discovered by other approaches as good candidates, even if not selected. (I think Thomas' approach is to select the closest match, not the lowest round-trip survivor.)

Edit: Oh, and thanks for the scan! I need to study that.


RE: Maths/Stats challenge - 1 of 2 - polls - Albert Chan - 05-25-2022 01:29 PM

(05-25-2022 10:39 AM)Joe Horn Wrote:  My 50g program finds these smaller denominator solutions:

[6. 9. 23. 13.] / 51 ~ [0.12 0.18 0.45 0.25]
It depends on what rounding mode get used.
For half-way up, n=51 is correct solution

However, for bankers rounding, we can have smaller n = 40

[5, 7, 18, 10] / 40 = [0.125, 0.175, 0.45, 0.25] (exact)

I was a bit surprised that HP50g, numerically use banker's rounding, have RND do half-way up.
HP50g user manual is unclear on how RND handle half-way case.
I would have guessed internally RND just "pushed out" excess digits, like this:

0.125 5E9 + 5E9 -      → 0.12


RE: Maths/Stats challenge - 1 of 2 - polls - Albert Chan - 05-25-2022 01:56 PM

(05-25-2022 06:54 AM)Thomas Klemm Wrote:  
Code:
...
    print(f"{u} / {n} ~ {np.round(v, decimals=2)}")

Off topics, but important when using Python to code.

I was surprised that (u,n,v), never defined outside for loop, get "leak" outside the loop.
Turns out this was one of Python's quirk: All variables inside loop get leaked out!

https://eli.thegreenplace.net/2015/the-scope-of-index-variables-in-pythons-for-loops/
Quote:The important point here is: the innermost possible scope is a function body.

Not a for loop body. Not a with block body.

Python does not have nested lexical scopes below the level of a function,
unlike some other languages (C and its progeny, for example).



RE: Maths/Stats challenge - 1 of 2 - polls - John Keith - 05-25-2022 06:57 PM

(05-25-2022 01:29 PM)Albert Chan Wrote:  It depends on what rounding mode get used.
For half-way up, n=51 is correct solution



I was a bit surprised that HP50g, numerically use banker's rounding, have RND do half-way up.
HP50g user manual is unclear on how RND handle half-way case.
I would have guessed internally RND just "pushed out" excess digits, like this:

0.125 5E9 + 5E9 -      → 0.12

What you are describing is Truncate, not Round. RND rounds to nearest digit, so .125 2 RND returns .13, .6666666 3 RND returns .667 etc.


RE: Maths/Stats challenge - 1 of 2 - polls - Albert Chan - 05-25-2022 08:11 PM

(05-25-2022 06:57 PM)John Keith Wrote:  
(05-25-2022 01:29 PM)Albert Chan Wrote:  0.125 5E9 + 5E9 -      → 0.12

What you are describing is Truncate, not Round. RND rounds to nearest digit, so .125 2 RND returns .13

It is not trucation, just seems like it, because last kept digit, 2 is even.
Adding and subtract the big number forced system to use its current rounding mode.
Banker's rounding in this case.

>B=5E9
>.125+B-B, .175+B-B
  .12            .18


RE: Maths/Stats challenge - 1 of 2 - polls - Albert Chan - 05-26-2022 12:01 PM

Perhaps it is safer to avoid solutions that depends on how half-way cases is handled.

Starting from roundtrip condition:

round(round(P[i]*n/100)*100/n) == P[i]                  // assumed integer P[i], unit = %

→ abs(round(P[i]*n/100)*100/n - P[i]) < 0.5           // half-way solutions removed
→ abs(round(P[i]*n/100)*100 - P[i]*n)*2 < n          // multiply by 2n
→ abs(smod(P[i]*n, 100))*2 < n                             // smod = Euclidean symmetric remainder

If n=100, LHS=0 → always roundtrip.
If n>100, max(LHS)=100 < n → always roundtrip.

We flip roundtrip condition for badn:
Code:
function badn(n,P)  -- removed half-way solutions
    for i=1,#P do
        if abs((P[i]*n+50)%100-50)*2 >= n then return true end
    end
end

function firstn(P)  -- assume P table of integers, unit=%
    for n=1,99 do   -- search non-trivial solution only
        if not badn(n,P) then return n end
    end
end

(05-25-2022 10:39 AM)Joe Horn Wrote:  My 50g program finds these smaller denominator solutions:

[5. 9. 8.] / 22 ~ [0.23 0.41 0.36]
[6. 9. 23. 13.] / 51 ~ [0.12 0.18 0.45 0.25]
[7. 1. 23. 14.] / 45 ~ [0.16 0.02 0.51 0.31]
[24. 9. 13. 7. 0.] / 53 ~ [0.45 0.17 0.25 0.13 0. ]

lua> firstn {23, 41, 36}
22
lua> firstn {12, 18, 45, 25}
51
lua> firstn {16, 2, 51, 31}
45
lua> firstn {45, 17, 25, 13, 0}
53

This matched Joe Horns answer, implied solutions does not have any half-way cases.