Threaded Mode | Linear Mode

DavidM · 03-19-2018, 01:24 AM

(03-18-2018 09:47 PM)pier4r Wrote: Hmm. If you are referring to this https://groups.google.com/d/topic/comp.s...discussion discussion, I read it.

I was actually referring to this thread. But I still think RAND is OK for most purposes that people would want to use it for on a calculator. Not up to cryptographic standards, but OK for most other purposes.

The issue I'm currently facing with LSHUF is how to effectively convert the RAND seed into an appropriate integer in the range needed for LSHUF to use for a swap target. In the past, I used the <whole seed> MOD <value>, where <value> is one greater than the largest swap target range I needed. That has now been shown to be a faulty method (because it only uses the least significant digits of the seed, which is clearly ineffective). I've proposed a potential fix (XORing the upper part of the seed onto the lower part before performing the MOD), but I don't have any way to know if that method is actually any better. While it seems like it should be, guessing is not a viable option here. Unless I can somehow confirm that this method works better, there's no reason to use it. And right now I know of no realistic way to test it.

A total rewrite is a possibility, but it just doesn't seem to me that this command is worth this much effort. I'm more inclined to let people simply choose their own methods for randomizing a list, thereby taking the responsibility for managing the level of effectiveness they deem appropriate.

pier4r · (This post was last modified: 03-19-2018 07:54 AM by pier4r.)

Quote:I was actually referring to this thread. But I still think RAND is OK for most purposes that people would want to use it for on a calculator. Not up to cryptographic standards, but OK for most other purposes.

Ok I forgot that thread, I even participated there. Thanks for the link.

What I get from that thread:
- suggestion about other random number generators (to use unless RAND is perceived as not good enough)
- a nice observation of ttw, that again it applies if one wants a extra good PRGN.
- Don't use the mantissa method of the two methods mentioned in the first post.
- Paul that says that RAND is somewhat biased. Taking it as truth, I still miss "how much" is biased, and if this bias is something that one should be concerned about for such applications (LSHUF doesn't have to encrypt any message). Surely RAND is not the best thing available, therefore it has bias. Look it is not even a real RNG, it is just a PRGN. But how much is it "bad"? Is it worse than the middle square method? Unlikely, and even that method is good enough for many applications. In other words for me "bad/biased" depends on the application.
- Interestingly, at least for what I understood, no one detected (or quantified) the "problem to be" in the sentence The converted RNSEED is then divided by the BINT passed as a parameter; the remainder of this division becomes the zero-based form of the result. as instead it is mentioned in https://groups.google.com/d/topic/comp.s...discussion (where JH Meyers mentioned the short cycle of the least significant bits)

Quote:The issue I'm currently facing with LSHUF is how to effectively convert the RAND seed into an appropriate integer in the range needed for LSHUF to use for a swap target. In the past, I used the <whole seed> MOD <value>, where <value> is one greater than the largest swap target range I needed. That has now been shown to be a faulty method (because it only uses the least significant digits of the seed, which is clearly ineffective). I've proposed a potential fix (XORing the upper part of the seed onto the lower part before performing the MOD), but I don't have any way to know if that method is actually any better. While it seems like it should be, guessing is not a viable option here. Unless I can somehow confirm that this method works better.

Some questions.

Do you want to confirm that the method based on XOR is better than the method based on MOD, right? As far as I understood it doesn't mean that the method based on XOR should be as good as the RAND or other methods, it should be only better than the method based on MOD. Is that correct? If so, wouldn't be possible to set a starting test - not the most rigorous test of the world, we are still talking about shuffling lists - that capture your idea of "better" ?

I'd like to point out that existing metrics are not "truth" spoken by God, they just capture, with proper data in a proper context, certain ideas or definitions or properties. So as far as the wanted property is somehow measured in a test, that's it. As I wrote, for me a starting test would be: I produce shuffles with RAND, I produce shuffles with the new routine. The I can: (a) measure how many times with RAND an element ended up in a certain position (so doing the distribution by position), and the same with the new routine; (b) I can compare how they match a given goal list (like my test does) and how is the statistics of the attempts; (c) One can even measure how much the elements were moved from the original position with RAND and the new routine; and some other metrics that makes sense if needed. I would start with (a) and (b). Then again I would do (a) and (b) also for the previous method, based on MOD. Then I would pick the method closer to the RAND results.

Moreover, as one in RPL would use listSize RAND * 1 + IP (or using CEIL), in effect picking the leftmost digits first. Couldn't you do the same somehow? I read in the other thread that you want to be fast, but if "fast" means "I don't like the command", wouldn't it be better to be slower retaining the command? At least for me it would be.

I am not sure how the RANDSEED is composed, yes JH Meyers explain it but I do not remember precisely now. I know it is an integer. Couldn't you get the upper part of this integer, say the first 5-6 digits and apply a mod to it? This should pick a larger period, closer to the userRPL statement above.

Anyway, if you feel uncomfortable with it, all this is not worth the stress (unless you want to discover things). Surely there is in this thread or the processing list thread (or somewhere else) a usable userRPL routine to shuffle lists to put in the uRPL library. In the worst case I use mine that was not that bad.

pier4r · 03-25-2018, 03:58 PM

Question to David.

By chance, could be that you had a look on the internal structure of the RPL command RANM ? Could be that it helps?

DavidM · 03-25-2018, 05:52 PM

(03-25-2018 03:58 PM)pier4r Wrote: Question to David.

By chance, could be that you had a look on the internal structure of the RPL command RANM ? Could be that it helps?

I'll take a look at it. If nothing else, it would be interesting to see what method it uses to come up with its integers.

A quick check shows that it is definitely cycling the internal RAND seed, perhaps twice per element. I'll check into it further.

John Keith · (This post was last modified: 03-25-2018 07:03 PM by John Keith.)

(03-25-2018 05:52 PM)DavidM Wrote: A quick check shows that it is definitely cycling the internal RAND seed, perhaps twice per element.

The documentation says that the numbers 1..9 occur with equal probability, but the number 0 occurs twice as often. This implies that the command first generates a random integer in the range 0..9, then negates it or not depending on the value of the second RAND.

Slightly OT, the RANM command in the 49/50 always returns a symbolic (type 29) array, even when the calc is in approximate mode.

John

DavidM · 03-27-2018, 10:07 PM

(03-25-2018 05:52 PM)DavidM Wrote: A quick check shows that it is definitely cycling the internal RAND seed, perhaps twice per element. I'll check into it further.

Finally got around to this...

Here's the pertinent subroutine that is called to generate each random entry:

Code:

(FPTR 3 54=3:6AA24)

::

  %RAN

  %10

  %*

  %FLOOR

  %RAN

  %.5

  %>=

  ?SKIP

  %CHS

;

...so it's essentially the same method used by 99% of random generators out there. Smile

As John mentioned, the second %RAN call is for the sign of the result.

No surprises there.

FWIW, I'm now experimenting with an alternative approach to selecting the swap target for the Fisher-Yates shuffle in LSHUF:

Initialize a "local" seed to the hex form of the system's RAND seed
Each time the Saturn loop needs a new random target:
- The local seed is run through a SplitMix64 cycle
- The above result is shifted 12 bits to the right
- MOD <max+1> is performed to reduce the value to the needed range

Performance has been good with this new method, and may even be faster than the previous one since there are fewer conversions taking place at each iteration (all the local data stays in binary form).

Would still love to hear thoughts on reasonable ways to test the results...

ttw · 03-28-2018, 10:01 AM

Multiplying by 20 and subtracting 10 seems a bit better. It saves one RNG call and doesn't push things into 0. Saving a call is useful as for many RNG types, using two RNG calls at a time shortens the auto-correlation from being of the cycle length of the RNG to being the Sqrt(cycle-length).

DavidM · 03-28-2018, 01:43 PM

(03-28-2018 10:01 AM)ttw Wrote: Multiplying by 20 and subtracting 10 seems a bit better. It saves one RNG call and doesn't push things into 0. Saving a call is useful as for many RNG types, using two RNG calls at a time shortens the auto-correlation from being of the cycle length of the RNG to being the Sqrt(cycle-length).

I like your method better, but I'd be less concerned about the cycle-length in this case than the issue of 0 having twice the representation of all of the other possible digits. I'm not familiar with why the command was created in the first place, though. Is there some mathematical reason that someone needing to generate a random matrix of single-digit integers needs more 0s than other digits? RANM has been around at least since the 48GX, but I have no idea what its origins are.

ttw · 03-28-2018, 04:56 PM

This is a common error that I have seen in other contexts. One first needs numbers from 0-9 so: multiply by 10 and take integer part. Then one wants a symmetrical version so: generate random sign and attach.

There is always a problem going from continuous to discrete. Similarly, even with good rounding, one cannot round a list of numbers twice. This has been known for years. Tables of trig functions computed to 7 digits cannot be consistently rounded to 5 digits.

pier4r · 03-28-2018, 07:53 PM

Idea for the library.

if I do
{ 1 { 1 2 3} } {1} +
I get
{ 1 { 1 2 3} 1 }

To get the wanted
{ 1 { 1 2 3} { 1 } }

either I explode the first list and then I repack it like
{ 1 { 1 2 3} }
OBJ\->
{1}
SWAP 1 +
\->LIST

Or I use LINSR. No even LINSR doesn't do it.
LPUSH does it.

In other words, I miss a command (or maybe I am not recalling it) that packs a list within a list without exploding it. Sure, one can always build a userRPL alternative, but I think it would fit well to have an equivalent of LPUSH for the tail of the list. As the standard "+" doesn't always work well with lists.

John Keith · 03-28-2018, 08:12 PM

(03-28-2018 01:43 PM)DavidM Wrote:
(03-28-2018 10:01 AM)ttw Wrote: Multiplying by 20 and subtracting 10 seems a bit better. It saves one RNG call and doesn't push things into 0. Saving a call is useful as for many RNG types, using two RNG calls at a time shortens the auto-correlation from being of the cycle length of the RNG to being the Sqrt(cycle-length).

I like your method better, but I'd be less concerned about the cycle-length in this case than the issue of 0 having twice the representation of all of the other possible digits. I'm not familiar with why the command was created in the first place, though. Is there some mathematical reason that someone needing to generate a random matrix of single-digit integers needs more 0s than other digits? RANM has been around at least since the 48GX, but I have no idea what its origins are.

I would bet that there is a valid reason, although exactly what that reason is may be above my pay grade. Zeros do seem to predominate in "real-world" matrices, and I doubt that HP's programmers would make such a simple mistake.

This is just an intuitive guess, I have no evidence that this is actually the case. Also, the method suggested by ttw would be preferred for uses other than generating random matrix elements.

John

DavidM · 03-28-2018, 08:24 PM

(03-28-2018 07:53 PM)pier4r Wrote: Idea for the library.
...

I recall thinking that a companion to LPUSH for the right side of the list would be sensible, then I thought "just use +". I neglected to consider the specific case you bring up, despite the fact that I have run into that myself on numerous occasions.

LPUSH is one of the smallest commands, and a companion LPSHR (LPUSHR) would also be very small and efficient (a single >TCOMP SysRPL command after the argument check should do the job). It would add whatever the SL1 object is to the list in SL2, without treating it differently if that object happens to be a list. It would also balance out the command set for LPOP/LPUSH/LPOPR. Smile

Seems like a good idea to me! I've added it to my TODO list.

ttw · 03-30-2018, 03:41 AM

(03-28-2018 08:12 PM)John Keith Wrote:
(03-28-2018 01:43 PM)DavidM Wrote: I like your method better, but I'd be less concerned about the cycle-length in this case than the issue of 0 having twice the representation of all of the other possible digits. I'm not familiar with why the command was created in the first place, though. Is there some mathematical reason that someone needing to generate a random matrix of single-digit integers needs more 0s than other digits? RANM has been around at least since the 48GX, but I have no idea what its origins are.

I would bet that there is a valid reason, although exactly what that reason is may be above my pay grade. Zeros do seem to predominate in "real-world" matrices, and I doubt that HP's programmers would make such a simple mistake.

This is just an intuitive guess, I have no evidence that this is actually the case. Also, the method suggested by ttw would be preferred for uses other than generating random matrix elements.

John

There are (as far as I know) no distributions with exactly double frequency between 0 and 1 from that everywhere else. Zero is no singled out at double the frequency. Normally, one might have a decrease in frequency from 0 to whatever. However, I have seen the problem I described create this distribution. I've done it (but caught the problem when reading the first test case) and I've seen other people do this. There's always a problem mapping one set of intervals into another; one has to be rather careful. One (not so obvious, it seems) problem is that a uniform distribution from 0 to 10 actually covers 11 different intervals. (Or 10 or 9 depending on the possibility of 0 or 10.)

pier4r · (This post was last modified: 03-31-2018 12:18 AM by pier4r.)

Thinking about a comment (in this thread or "processing lists") of John K. He remembered me that doing a STO+ on a list is quite costly if the list is large (as it is in most programming languages is costly to expand an array of values).

Would the cost be lower if instead of a STO+ I go with a PUT (that overwrites the element) ?
So I can preallocate a list and then go with replacements. In the case expanding the list to a new fixed size if needed.

Thinking about it. Is there a command in ListExt about multiple PUT at once (not insertions, rather replacements or additions if the index is larger than the list size)?

Also what about expanding the list adding lists? Like I collect the new elements in small temporary lists, say, 20 elements. And then I add those new elements to the result list.
Is the addition of 20 elements at once better than 20 times STO+ ?
I would say yes, but to be sure I ask.

DavidM · 03-31-2018, 01:16 AM

(03-31-2018 12:17 AM)pier4r Wrote: Thinking about a comment (in this thread or "processing lists") of John K. He remembered me that doing a STO+ on a list is quite costly if the list is large (as it is in most programming languages is costly to expand an array of values).

Would the cost be lower if instead of a STO+ I go with a PUT (that overwrites the element) ?...

Just about any alteration of an existing list (by built-in commands or otherwise) requires that the entire list be copied into a new one while making whatever alterations are required. This happens even if you do a PUT, because the list structure contains the actual objects instead of pointers to them (unless the objects are in ROM, but even that is not always guaranteed). There's also another reason for this which is often not considered: LAST/LASTARG/UNDO. A copy of the original argument has to be kept around, so any alteration has to be done on a new version of the original.

I believe we had some discussion before when LINSR was being formalized about the concept of inserting multiple objects into disjointed locations in a list. I'm personally not an advocate of a command that would attempt to do that, mostly due to the potential ambiguity of the indices in that scenario. That is less of an issue with a multiple PUT scenario, but then that is fairly easy to code as a DOLIST type of construct. Yes, the list would get rebuilt with each PUT step, but it is at least easy to code. Smile

It seems to me that you're really talking about some concepts that are better served in a different way, though. You'd be better off if you simply load the items onto the stack (if they were already in a list, just use LIST→), then make whatever changes are needed to the stack items with ROLL/ROLLD/DROP/etc. When done, rebuild the list with →LIST. You'll find that this is much more efficient with memory, and probably speedier as well. The objects in question don't actually move in memory until the list is rebuilt. The stack is nothing more than a list of pointers to objects that are scattered in memory, and those pointers can be manipulated quickly and efficiently. Changing the placement/ordering of stack items doesn't move the objects themselves, just their stack pointers.

And, yes, adding a list of 20 items to another list is much faster (and much, much more efficient) than 20 consecutive STO+ steps.

pier4r · 03-31-2018, 09:47 AM

(03-31-2018 01:16 AM)DavidM Wrote: It seems to me that you're really talking about some concepts that are better served in a different way, though. You'd be better off if you simply load the items onto the stack (if they were already in a list, just use LIST→), then make whatever changes are needed to the stack items with ROLL/ROLLD/DROP/etc. When done, rebuild the list with →LIST. You'll find that this is much more efficient with memory, and probably speedier as well. The objects in question don't actually move in memory until the list is rebuilt. The stack is nothing more than a list of pointers to objects that are scattered in memory, and those pointers can be manipulated quickly and efficiently. Changing the placement/ordering of stack items doesn't move the objects themselves, just their stack pointers.

And, yes, adding a list of 20 items to another list is much faster (and much, much more efficient) than 20 consecutive STO+ steps.

Thanks for the info. For the stackrobatics. Yes I finally yielded that doing stack operations and then \->LIST (or even just stack things) is faster. But not if you have operations in the middle - that may change with the evolution of a routine - and one has to keep track of what is going on to avoid mixing objects. I found that not so maintainable.

Keeping juggling 2-3 elements on the stack with different operations is still ok, more or less. Mixing operations, for me, it is not. Surely it can be done. Of course if you explode a list, roll a couple of elements, and pack it again, the elements can be many. But that is a well "self contained" operation without other operations in the middle.
Interesting that the 4 registers stack of previous RPN models is enough for my taste of "using the stack".

I will use the STO+ between temp lists. That will speed up things when I build large lists. It is a useful info.

In general, as far as I understood you, every command that modifies a list explodes it. So If I pack a bigger input it is better as there are less "explosions".

DavidM · 03-31-2018, 01:40 PM

(03-31-2018 09:47 AM)pier4r Wrote: In general, as far as I understood you, every command that modifies a list explodes it.

...or at least traverses and copies each individual list element to a new object. To me, "explosion" implies breaking the list into discrete objects and leaving them separated until they are "imploded" at some later time.

Several of the ListExt commands achieve a performance gain by making sure that the explode/manipulate/implode sequence occurs only once during their execution.

I thought your question was more about methods of improving efficiencies, so I responded along those lines.

(03-31-2018 09:47 AM)pier4r Wrote: So If I pack a bigger input it is better as there are less "explosions".

It really depends on what you are doing. I wouldn't generalize it in that way.

Your example of adding 20 items to a list with looped STO+ vs. "{list1} {list2} +" would support that concept, but there are plenty of others that don't. Especially if you are manipulating scattered pieces of a large list, the "keep it as a large list" model starts to break down. If the parts that need to be changed are contiguous, staying in list form probably makes sense. If they aren't, then stitching everything together as sublists could become costly in performance and complexity. The less contiguous your manipulations are, the more likely you'd be better off performing them with the explode/manipulate/implode approach.

The stack is an "invisible component" in the source code of RPN and RPL programs. I sometimes include detailed stack diagrams as comments in critical sections of code just so I can come back to it later with any hope of successful editing. Balancing the use of efficient stack operations with highly readable/maintainable (but less efficient) code is part of the art of programming these calculators, IMHO. It's probably also worth noting that the more we use certain constructs, the quicker we recognize them when editing. So the level of readability of a given program can change over time as we add new techniques to our own personal knowledge base.

**rprosperi** · 03-31-2018, 03:56 PM

(03-31-2018 09:47 AM)pier4r Wrote: For the stackrobatics....

Good word! Very accurate and descriptive, I like it!

pier4r · 03-31-2018, 06:30 PM

I read stackrobatics somewhere on mohpc.

And I agree there the more one uses not intuitive concepts , the more one can decode them quickly.
The problem arise when one comes back after a pause from using the calculators. Then, then readability matters. At least for me.

pier4r · 04-02-2018, 08:51 AM

Now I am not sure I proposed the following commands already, anyway better propose something twice than none sometimes.

Finding the min and maximum of a list. LMAX and LMIN. It would suprise me if no one asked for this before.

Where LMAX is the equivalent of:
inputList LSORT 1 LLAST @LSORT from werner

and LMIN is
inputList LSORT HEAD @LSORT from werner

Aside from 1 command to type instead of 3 (my motto), although those above will go in the collection of user RPL snippets (if you have better ideas, please share); I strongly believe that David can save the sort complexity and also LFRST/LLAST/HEAD operations.

As to find the minimum/maximum one has to go through the list keeping a reference.

Additionally one can also do LMIN / LMAX returning the value and all the positions that have that value.

Therefore
inputList DUP LSORT 1 LLAST
SWAP OVER MPOS

and
inputList LSORT HEAD
SWAP OVER MPOS