Well, I had enough time available on Saturday to get multiple variants into a working and somewhat optimized state before noon, with finishing touches in the evening. But that may be because I'm not an on-site participant, so I don't have HHC panels demanding my attention...
Anyway, the scoring criteria (particularly the time elapsed on an unspecified "sample test problem") mean that I don't know what to optimize for. In UserRPL I have a pretty slow one that gets below the threshold where only the size matters, and a much faster one that comes in at
exactly 100 bytes (that's enough for the timing branch of the scoring procedure, right? Because it says "if less than 100 bytes", not "if less than or equal to 100 bytes"), meaning that if the sample input is small, it can take advantage of a sub-second time to get a good score. In SysRPL I ended up with just one real variant (which predictably wipes the floor with the UserRPL version) because the fast one is so small already. For the same scoring reasons as in UserRPL, I simply padded it to just barely break the threshold, though if the sample input is large enough to make the calculation take more than about 0.7_s, that may be a liability.
The general approach is the same across all variants: generate a long enough portion of the Golomb sequence, cut it to the requested range, then square the elements and sum them up. I can't avoid generating the starting portion of the sequence (from 1. to L) even if the inputs make me drop that part afterwards, because the definition makes the later elements depend on the earlier ones.
Note that for valid inputs (1. <= L < R < 1.E4) the sum can only reach about 5.7E8 (568210755. to be exact), so the requirement to calculate mod 1.E9 can be safely ignored; just add the numbers up normally. Nice bit of misdirection there.
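For reference, the overall pipeline (generate, cut, square, sum) and the claim about the maximum sum can be sketched in Python. This is just an illustration, not a transcription of the RPL code; `golomb` and `sum_of_squares` are my own names:

```python
def golomb(n_max):
    """Generate G(1)..G(n_max) of the Golomb sequence one element at a
    time, using the recurrence a(n) = 1 + a(n - a(a(n-1)))."""
    a = [0, 1, 2, 2]                # index 0 is a dummy so a[n] means G(n)
    for n in range(4, n_max + 1):
        a.append(1 + a[n - a[a[n - 1]]])
    return a[1 : n_max + 1]         # drop the dummy

def sum_of_squares(L, R):
    """Sum of G(n)^2 for n in [L, R]: generate up to R, cut away the
    part below L, square and add."""
    return sum(x * x for x in golomb(R)[L - 1:])
```

With the largest valid inputs, `sum_of_squares(1, 9999)` yields 568210755, the value quoted above, which indeed stays comfortably below 1.E9.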
It's obvious (to me, anyway) that the Golomb sequence consists of groups of multiple consecutive same-magnitude numbers, so I went for an algorithm that generates a whole group in one go. As the definition makes clear, the length of the group can be looked up in the already-generated part of the sequence; the NDUPN command takes it from there. Compared to the straightforward method of generating one element at a time, this is much quicker, and it also saves the commands that would otherwise figure out whether an element needs to be the same as the previous one or 1 greater (i.e. part of the same group or not). There is a possibility that this approach slightly overruns the upper limit requested via the R input, but cutting the sequence to the appropriate range with the right command gives me safety on that end for free.
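The group-at-a-time idea translates to something like this in Python (a sketch under my own naming; `list.extend` with a repeated value plays the role of NDUPN, and the slice plays the role of SUB):

```python
def golomb_groups(min_len):
    """Generate the Golomb sequence in whole groups until it has at
    least min_len elements; may overrun slightly, like the RPL code."""
    seq = [1, 2, 2]               # seed covers the lookup for the group of 2s
    while len(seq) < min_len:
        v = seq[-1] + 1           # next value in the sequence
        count = seq[v - 1]        # G(v) = how many times v repeats
        seq.extend([v] * count)   # the NDUPN step: append the whole group
    return seq

def sum_of_squares(L, R):
    """Generate at least R elements, cut to [L, R], square and sum."""
    return sum(x * x for x in golomb_groups(R)[L - 1 : R])
```

Slicing with `[L - 1 : R]` silently discards any overrun from the last group, mirroring what the cut command does in the RPL versions.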
In UserRPL there are two choices: generate the sequence in list form (smaller, by avoiding some of the other choice's stackrobatics), or generate it as what SysRPL programmers call a "meta", i.e. an exploded list (significantly faster by avoiding repeatedly copying already-generated elements to a new list). Cutting the sequence is best done with SUB which needs a list instead of a meta, but after this forced conversion to a list I once more have the choice between the short snippet making use of automatic list processing, or the fast snippet which explodes the list again and processes the elements one by one with an explicit loop. The latter is faster by avoiding creation of another intermediate list.
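The list-versus-meta tradeoff has a rough Python analogue (my illustration, not the author's code): rebuilding the list on every append copies all earlier elements, while extending in place does not.

```python
def build_by_copy(groups):
    """Like the UserRPL list variant: '+' builds a fresh list each
    iteration, copying everything generated so far (quadratic overall)."""
    seq = []
    for g in groups:
        seq = seq + g         # new list every iteration
    return seq

def build_in_place(groups):
    """Like the meta variant: existing elements stay put, only the
    new group is written (linear overall)."""
    seq = []
    for g in groups:
        seq.extend(g)         # no copying of earlier elements
    return seq
```

Both produce the same sequence; the difference is purely in how much copying happens along the way, which is exactly why the meta variant is significantly faster.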
The code:
- short (85.5 bytes, #27CFh), \GS is the trigraph for uppercase sigma:
Code:
\<<
{ 1. 2. 2. } @ initial piece of the sequence, necessary for prior-element lookups while generating more elements
4. PICK3 START @ I let the loop counter run 1 ahead of the length of the sequence so it stops if the length reaches the R input exactly
DUPDUP SIZE GET @ last list element, please
1. +
DUP2 GET @ look up how many times the new element repeats
UNROT PICK3 @ save a copy of the count away above the sequence
NDUPN \->LIST + @ append the correct number of new elements to the sequence, newRPL would need ADD in place of + here
SWAP STEP @ advance the loop counter by the number of new elements
UNROT SUB @ cut away the parts outside the requested range
SQ \GSLIST @ square and sum up
\>>
- fast (100 bytes, #E141h):
Code:
\<<
1. 2. 2. @ initial piece of the sequence, necessary for prior-element lookups while generating more elements
3. @ length of the exploded list
4. 6. PICK START @ I let the loop counter run 1 ahead of the length of the sequence so it stops if the length reaches the R input exactly
OVER @ last list element, please
1. +
DUP2 - 3. + PICK @ look up how many times the new element repeats
NDUPN @ generate the correct number of new elements for the sequence
DUPDUP 3. + ROLL + @ retrieve the old sequence's length and add it onto a copy of the length of the new group
SWAP STEP @ advance the loop counter by the number of new elements
\->LIST UNROT SUB LIST\-> @ cut away the parts outside the requested range
0. @ initial value for the sum
1. ROT START @ for each element in the range ...
SWAP SQ + @ ... pull up the element, square it, and add it onto the sum
NEXT
\>>
In SysRPL there are few commands that support automatic list processing (basically only the UserRPL commands, which are all available because UserRPL is strictly a subset of SysRPL), and there is better support for handling metas instead. That means lists are doomed; metas are the way to go. Covering the other branch of the scoring criteria is achieved by adding one line that pushes a string and immediately drops it; its only purpose is to make the program big enough.
There is another quirk of this implementation, connected to the initial piece of the sequence. In the UserRPL programs I supplied the group of 2s up front, so I don't trigger what would amount to an out-of-bounds read when looking up the repeat count of the 2s while generating that very group. In SysRPL I supply only the 1 at the start of the sequence, and let the out-of-bounds read hit the meta length. That should rightfully be 1 at the relevant moment, leading to wrong results, but I simply keep the length too great by 1 and correct it as needed. The combo commands of SysRPL allow this to be a net win, while it would be a net loss in the meta-based UserRPL version and a guaranteed "Bad Argument Value" error in the list-based one.
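The seeding quirk can be made concrete with a small Python sketch (my naming, purely illustrative): with only the initial 1 supplied, the very first group lookup reaches one past the end of the generated data.

```python
seq = [1]                 # SysRPL-style seed: just the first element
v = seq[-1] + 1           # v = 2, the next value to generate
try:
    count = seq[v - 1]    # G(2) sits in the group we are about to emit
except IndexError:
    count = None          # out-of-bounds: the index lands one past the end
# In the SysRPL program that stray read hits the meta's length field
# instead of raising an error; the length is deliberately kept one too
# high, so the read happens to yield the correct repeat count anyway.
```

Seeding with the full group of 2s, as the UserRPL versions do, sidesteps the problem entirely at the cost of a slightly larger literal.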
I left out parameter count / type / value checking; make sure you supply two reals that satisfy the conditions for valid L and R, or this program will go ballistic. I could use up some of the padding space for it, but the padded version is where speed matters, and type checking definitely takes more CPU cycles than pushing and dropping a string.
Short version: 70 bytes, #1E4Fh; padded version: 100.5 bytes, #DE07h:
Code:
::
"PADDING FOR SCORE . . ." DROP (leave this line out for the short version)
COERCE2
#TWO#ONE BINT2
4PICK ONE_DO
OVER #1+
2DUP #- #2+PICK
NDUPN
DUP#1+ top& #1-SWAP
+LOOP
#1- SWAPUnDROP SubMetaOb
%0 SWAP
ZERO_DO
SWAP UNCOERCE %SQ_ %+
LOOP
;
This code is able to calculate the sum of squares over the sequence from 1. up to 1.E4 1. - in just over 23.7_s on real hardware. Shorter ranges take less time, obviously; 1. to 325. comes in just under the 0.7_s threshold where the two SysRPL versions get equal scores.
The UserRPL versions already tie at about 1. to 130. Out of curiosity I fed the full range into the faster of them just now, and it takes tens of minutes at least; not sure if I'm going to let it finish. In theory it should do so eventually, because it works fine on smaller requests, and I'm not aware of any hard limits it could bump into.
Also, dear curious judge, G_100 is 21. and G_500 is 56. Hacking the program to spit out specific values of the sequence instead of summing up from L to R took much longer than running it, even in the short and slow UserRPL variant, because I can't make these modifications in 3.2742_s. You may have underestimated the challenge there.