Threaded Mode | Linear Mode

nsg · 04-27-2017, 04:07 AM

This program generates random sample of k elements out of total set of N elements. Element can be used only once. All such subsets are drawn with equal probabilitites (assuming RNG is fair).

Algorithm considers each integer i from 0 to N-1 and decides on each whether to include it in a sample or not with probability (k-done)/(N-i) where done is the number of already selected items.
The generated elements are not stored, so k and N can be any positive integers, however this algorithm is only practical with relatively small N.

Instruction:
<N> sto 0
<k> sto 1
<0.random_seed> sto 5
f prgm r/s
Output: first element of a sample 0..N-1
r/s
Output: next element of a sample.
When no more elements outputs -1
r/s after that starts a new sample.

Code:

 0

 sto 2

 sto 3

 rcl 1

 rcl 3

 -

 x=0

 gto 30

 rcl 0

 rcl 2

 -

 /

 rcl 5

 1

 1

 *

 pi

 +

 frac

 sto 5

 x>=y

 gto 27

 1

 sto+3

 rcl 2

 stop

 1

 sto+2

 gto 04

 1

 chs

 gto 00

Example:
100 sto 0
10 sto 1
0.1234 sto 5
f prgm r/s
[2] r/s [4] r/s [23] r/s [30] r/s [50] r/s
[53] r/s [57] r/s [59] r/s [66] r/s [96] r/s
[-1] -- indicates end of sample

I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.

Thomas Klemm · 10-27-2018, 03:52 PM

Here's a translation of your program for the HP-42S:

Code:

00 { 41-Byte Prgm }

01▸LBL "SAMPLE"

02 RCL 00

03 STO 02

04 RCL 01

05 STO 03

06▸LBL 00

07 RCL 03

08 RCL 02

09 ÷

10 RAN

11 X≥Y?

12 GTO 01

13 RCL 02

14 STOP

15 DSE 03

16 GTO 01

17 GTO 02

18▸LBL 01

19 DSE 02

20 GTO 00

21▸LBL 02

22 -1

23 END

However, I reversed the order so I could use DSE.
Thus the generated samples appear in descending order.

Example:

100
STO 00
10
STO 01
XEQ "SAMPLE"
85
R/S
64
R/S
62
R/S
59
R/S
55
R/S
53
R/S
39
R/S
32
R/S
30
R/S
11
R/S
-1

(04-27-2017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.

In this case we might simply generate them at random and hope there's no collision.

Kind regards
Thomas

Dieter · (This post was last modified: 10-28-2018 09:20 AM by Dieter.)

(10-27-2018 03:52 PM)Thomas Klemm Wrote: Here's a translation of your program for the HP-42S:

Code:

... 15 DSE 03 16 GTO 01 17 GTO 02 18▸LBL 01 ...

One more case where something like INV DSZ would be useful (greetings to the TI community ;-)). This can be accomplished quite easily by having the DSE followed by a test that always tests false. Since X holds a positive number at this point, a simple x<0? will do.

Code:

...

15 DSE 03

16 X<0?

17 GTO 02

...

Slightly shorter, slightly faster, saves one label.
Edit: OK, in this special case it's neither shorter nor label-saving, simply because LBL 01 is required anyway. #-)

(10-27-2018 03:52 PM)Thomas Klemm Wrote:
(04-27-2017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.
In this case we might simply generate them at random and hope there's no collision.

In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-)

Dieter

Albert Chan · (This post was last modified: 10-28-2018 04:52 PM by Albert Chan.)

(04-27-2017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N,
for example, a sample of 30 out of 10000000.

Code:

import random

def randindex(k, n):

    s = [n] * k             # k samples

    for i in range(k):

        r = random.randrange(n-i)

        j = 0

        while r + j >= s[j]: j = j+1

        s[j+1:i+1] = s[j:i] # make room for s[j]

        s[j] = r + j        # sorted insert

    return s

Above code only need k random numbers, and return sorted samples.

>>> randindex(10,10) # no repeat
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> random.seed(1)
>>> randindex(30,10000000)
[21060, 254459, 283474, 290410, 305901, 938595 ...

Thomas Klemm · 10-28-2018, 03:42 PM

(10-28-2018 02:18 PM)Albert Chan Wrote: Above code only need k random numbers, and return sorted samples.

Writing a program for the HP-25 using only 8 registers and 49 programming steps is left as an exercise for the dear reader.

Cheers
Thomas

John Keith · (This post was last modified: 10-28-2018 08:04 PM by John Keith.)

(10-28-2018 09:18 AM)Dieter Wrote: In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-)

Going further of-topic...
This would be a variation of the Fisher-Yates shuffle, implemented in the ListExt Library as LSHUF.

Albert Chan · (This post was last modified: 10-28-2018 08:52 PM by Albert Chan.)

(10-28-2018 08:03 PM)John Keith Wrote:
(10-28-2018 09:18 AM)Dieter Wrote: In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-)

This would be a variation of the Fisher-Yates shuffle, implemented in the ListExt Library as LSHUF.

I do not see any shuffling from Dieter description.
That seems more like "rinse, and repeat", throwing away (rare) repeated samples.

My Python randindex(...) look more like shuffling, except working thru the samples list.