(25C) Sampling without repetition
04-27-2017, 04:07 AM
Post: #1
 nsg Member Posts: 60 Joined: Dec 2013
(25C) Sampling without repetition
This program generates random sample of k elements out of total set of N elements. Element can be used only once. All such subsets are drawn with equal probabilitites (assuming RNG is fair).

Algorithm considers each integer i from 0 to N-1 and decides on each whether to include it in a sample or not with probability (k-done)/(N-i) where done is the number of already selected items.
The generated elements are not stored, so k and N can be any positive integers, however this algorithm is only practical with relatively small N.

Instruction:
<N> sto 0
<k> sto 1
<0.random_seed> sto 5
f prgm r/s
Output: first element of a sample 0..N-1
r/s
Output: next element of a sample.
When no more elements outputs -1
r/s after that starts a new sample.

Code:
 01. 0 02. sto 2 03. sto 3 04. rcl 1 05. rcl 3 06. - 07. x=0 08. gto 30 09. rcl 0 10. rcl 2 11. - 12. / 13. rcl 5 14. 1 15. 1 16. * 17. pi 18. + 19. frac 20. sto 5 21. x>=y 22. gto 27 23. 1 24. sto+3 25. rcl 2 26. stop 27. 1 28. sto+2 29. gto 04 30. 1 31. chs 32. gto 00

Example:
100 sto 0
10 sto 1
0.1234 sto 5
f prgm r/s
[2] r/s [4] r/s [23] r/s [30] r/s [50] r/s
[53] r/s [57] r/s [59] r/s [66] r/s [96] r/s
[-1] -- indicates end of sample

I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.
10-27-2018, 03:52 PM
Post: #2
 Thomas Klemm Senior Member Posts: 1,447 Joined: Dec 2013
(42S) Sampling without repetition
Here's a translation of your program for the HP-42S:
Code:
00 { 41-Byte Prgm } 01▸LBL "SAMPLE" 02 RCL 00 03 STO 02 04 RCL 01 05 STO 03 06▸LBL 00 07 RCL 03 08 RCL 02 09 ÷ 10 RAN 11 X≥Y? 12 GTO 01 13 RCL 02 14 STOP 15 DSE 03 16 GTO 01 17 GTO 02 18▸LBL 01 19 DSE 02 20 GTO 00 21▸LBL 02 22 -1 23 END

However, I reversed the order so I could use DSE.
Thus the generated samples appear in descending order.

Example:

100
STO 00
10
STO 01
XEQ "SAMPLE"
85
R/S
64
R/S
62
R/S
59
R/S
55
R/S
53
R/S
39
R/S
32
R/S
30
R/S
11
R/S
-1

(04-27-2017 04:07 AM)nsg Wrote:  I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.

In this case we might simply generate them at random and hope there's no collision.

Kind regards
Thomas
10-28-2018, 09:18 AM (This post was last modified: 10-28-2018 09:20 AM by Dieter.)
Post: #3
 Dieter Senior Member Posts: 2,397 Joined: Dec 2013
RE: (25C) Sampling without repetition
(10-27-2018 03:52 PM)Thomas Klemm Wrote:  Here's a translation of your program for the HP-42S:
Code:
... 15 DSE 03 16 GTO 01 17 GTO 02 18▸LBL 01 ...

One more case where something like INV DSZ would be useful (greetings to the TI community ;-)). This can be accomplished quite easily by having the DSE followed by a test that always tests false. Since X holds a positive number at this point, a simple x<0? will do.

Code:
... 15 DSE 03 16 X<0? 17 GTO 02 ...

Slightly shorter, slightly faster, saves one label.
Edit: OK, in this special case it's neither shorter nor label-saving, simply because LBL 01 is required anyway. #-)

(10-27-2018 03:52 PM)Thomas Klemm Wrote:
(04-27-2017 04:07 AM)nsg Wrote:  I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.
In this case we might simply generate them at random and hope there's no collision.

In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-)

Dieter
10-28-2018, 02:18 PM (This post was last modified: 10-28-2018 04:52 PM by Albert Chan.)
Post: #4
 Albert Chan Senior Member Posts: 1,676 Joined: Jul 2018
RE: (25C) Sampling without repetition
(04-27-2017 04:07 AM)nsg Wrote:  I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N,
for example, a sample of 30 out of 10000000.

Code:
import random def randindex(k, n):     s = [n] * k             # k samples     for i in range(k):         r = random.randrange(n-i)         j = 0         while r + j >= s[j]: j = j+1         s[j+1:i+1] = s[j:i] # make room for s[j]         s[j] = r + j        # sorted insert     return s

Above code only need k random numbers, and return sorted samples.

>>> randindex(10,10) # no repeat
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> random.seed(1)
>>> randindex(30,10000000)
[21060, 254459, 283474, 290410, 305901, 938595 ...
10-28-2018, 03:42 PM
Post: #5
 Thomas Klemm Senior Member Posts: 1,447 Joined: Dec 2013
RE: (25C) Sampling without repetition
(10-28-2018 02:18 PM)Albert Chan Wrote:  Above code only need k random numbers, and return sorted samples.

Writing a program for the HP-25 using only 8 registers and 49 programming steps is left as an exercise for the dear reader.

Cheers
Thomas
10-28-2018, 08:03 PM (This post was last modified: 10-28-2018 08:04 PM by John Keith.)
Post: #6
 John Keith Senior Member Posts: 739 Joined: Dec 2013
RE: (25C) Sampling without repetition
(10-28-2018 09:18 AM)Dieter Wrote:  In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-)

Going further of-topic...
This would be a variation of the Fisher-Yates shuffle, implemented in the ListExt Library as LSHUF.
10-28-2018, 08:51 PM (This post was last modified: 10-28-2018 08:52 PM by Albert Chan.)
Post: #7
 Albert Chan Senior Member Posts: 1,676 Joined: Jul 2018
RE: (25C) Sampling without repetition
(10-28-2018 08:03 PM)John Keith Wrote:
(10-28-2018 09:18 AM)Dieter Wrote:  In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-)

This would be a variation of the Fisher-Yates shuffle, implemented in the ListExt Library as LSHUF.

I do not see any shuffling from Dieter description.
That seems more like "rinse, and repeat", throwing away (rare) repeated samples.

My Python randindex(...) look more like shuffling, except working thru the samples list.
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)