Assembler, but not at all
09-21-2023, 09:26 AM
Post: #1
Assembler, but not at all
I know that the decision has already been made that there won't be any possibility to run machine code (assembler) on the HP Prime. I understand the risks associated with this, and it's clear to me that opening low-level access is not advisable in this case.

However, what if we could find a compromise?
All of this is about performance, and performance is needed in algorithms. Algorithms mainly operate on lists (or possibly arrays, which can be mapped to lists anyway). We don't want assembler access to dig into the operating system's memory or manipulate forbidden objects, but rather to use it for fast calculations. Access to a limited memory area (reserved for this purpose) would be sufficient. This memory area could be dedicated to an object, let's call it NativeList, and possibly to a text string (which already exists in PPL).

NativeList would be low-level, containing only 32-bit integers (the ARM's native word size). The string, on the other hand, would remain as it is: essentially also a list, but of 16-bit values.

In addition, a pseudo-assembler would need to be created, supporting a limited subset of ARM instructions (common to the G1 and G2), with the additional requirement that, at compile time, every memory read or write is preceded by a few extra verification instructions that ensure the access stays within the allowed memory area.

So, we could select a few ARM instructions that read and write memory and secure them, while the remaining ARM instructions (arithmetic, bitwise operations, etc.) could be compiled directly into machine code, as they don't introduce sabotage risks. The stack would also need to be considered, since pushing and popping are, in principle, memory writes and reads too.

However, it's crucial that all of this compiles into a native procedure in machine code and is possible to run for the specified object from the PPL level.

For instance, for a memory read operation in the list:
Code:
    ldr r1, [mem_addr]

After compilation, it would look something like this:
Code:
    ldr r1, =mem_addr
    ldr r2, =0x1000      @ load lower bound of the range (set by PPL environment)
    ldr r3, =0xFFFF      @ load upper bound of the range (set by PPL environment)
    cmp r1, r2           @ compare the address in r1 with the lower bound
    blo error_handler    @ if address < lower bound, go to the error handler
    cmp r1, r3           @ compare the address in r1 with the upper bound
    bhi error_handler    @ if address > upper bound, go to the error handler
    ldr r1, [r1]         @ you can continue reading data from memory here because the address is within the allowed range

And perhaps here, there should be some push and pop instructions to secure the modified registers.

Actually, this check can be implemented once as an asm subroutine; each ldr would then compile to a call to that subroutine, passing an index instead of an address.
A similar protection can be applied to memory writes. The result would be a solution that addresses the performance issues on the HP Prime, providing something like 95% of native CPU performance in the areas where performance is critical. Of course, the assumption is that the program is still written in PPL, but individual procedures are added in ARM and called, for example, with the EXEC command.
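To make the index-based check concrete, here is a minimal sketch in Python. Python only models the logic here; on the calculator this would be a handful of ARM instructions. The names `NativeList`, `load` and `store` are hypothetical and not part of any HP Prime API.

```python
class NativeList:
    """Hypothetical model of a fixed-size block of 32-bit cells."""

    def __init__(self, size, fill=0):
        self.cells = [fill & 0xFFFFFFFF] * size

    def _check(self, index):
        # Plays the role of the cmp/blo/bhi pair: reject anything outside the block.
        if not 0 <= index < len(self.cells):
            raise IndexError(f"index {index} is outside the native list")

    def load(self, index):
        # Checked read, the counterpart of a guarded ldr.
        self._check(index)
        return self.cells[index]

    def store(self, index, value):
        # Checked write; the value wraps to 32 bits like a CPU register.
        self._check(index)
        self.cells[index] = value & 0xFFFFFFFF


nl = NativeList(512)
nl.store(3, 0x100000001)   # one bit too wide for a 32-bit cell
print(nl.load(3))          # prints 1
```

Because the routine only ever passes an index, never a raw address, a single comparison against the list length covers both bounds.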

From the PPL level, a created NativeList or string should be stored in a variable and passed as an argument (3 examples):

Code:
#ASM armRoutine1
.
.
#END

EXPORT ARMTEST1()
BEGIN
  LOCAL nlst, result;
  nlst := MAKENATIVELIST(0,512);         // Create a native list containing 512 32-bit cells initialized with values of 0.
  result := EXEC(armRoutine1, nlst);     // Execute the procedure on this list and retrieve the modified list.
END;

Code:
#ASM armRoutine2
.
.
#END

EXPORT ARMTEST2()
BEGIN
  LOCAL lst, nlst, result;
  lst := MAKELIST(X, X, 1, 100);         // Create a PPL list and initialize it with values from 1 to 100.
  nlst := MAKENATIVELIST(lst);           // Create a native list initialized according to the provided PPL list (containing 100 32-bit cells with values from 1 to 100).
  result := EXEC(armRoutine2, nlst);     // Execute the procedure on this list and retrieve the modified list.
END;

Code:
#ASM armRoutine3
.
.
#END

EXPORT ARMTEST3()
BEGIN
  LOCAL str, result;
  str := "STRING TO MANIPULATE";
  result := EXEC(armRoutine3, str);      // Execute the procedure on this string and retrieve the modified string.
END;
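The intended semantics of the two MAKENATIVELIST forms and of EXEC can be sketched in Python as follows. This is only a model: `make_native_list` and `exec_native` are hypothetical names, and the choice to run the routine on a copy (so the original stays untouched) is my assumption, as the proposal does not settle it.

```python
def make_native_list(source, count=None):
    """Mimic both call forms: (fill, count) or (existing_list)."""
    if count is not None:
        return [int(source) & 0xFFFFFFFF] * count
    # Each element is truncated to an integer and wrapped to 32 bits.
    return [int(v) & 0xFFFFFFFF for v in source]


def exec_native(routine, data):
    """Run a routine on a private copy of the cells and return the result."""
    work = list(data)
    routine(work)
    return work


def double_all(cells):
    # Stand-in for a native routine: doubles every cell in place.
    for i in range(len(cells)):
        cells[i] = (cells[i] * 2) & 0xFFFFFFFF


nlst = make_native_list(range(1, 101))
result = exec_native(double_all, nlst)
print(result[0], result[99])   # prints 2 200
```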

If someone needs an array, it can be simulated with a list by using the ARM mla (multiply-accumulate) instruction to compute the position of an element from its coordinates and map it to a specific memory location. Alternatively, a built-in subroutine could be provided for this purpose.
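The coordinate calculation is a single multiply-accumulate. A small Python sketch of the idea, assuming row-major storage (the helper name `flat_index` is made up for illustration):

```python
def flat_index(row, col, cols):
    # Same arithmetic as one mla instruction: result = row * cols + col
    return row * cols + col


ROWS, COLS = 3, 4
cells = [0] * (ROWS * COLS)          # a 3x4 array stored as a flat list

cells[flat_index(2, 1, COLS)] = 42   # write element (2, 1)
print(flat_index(2, 1, COLS))        # prints 9
print(cells[9])                      # prints 42
```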

Writing the compiler itself is relatively straightforward because it involves mapping ARM instructions and their arguments to their binary form (which is readily available).
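As an illustration of how direct that mapping is, here is a Python sketch that encodes the three-register form of a few ARM (A32) data-processing instructions. It is deliberately incomplete: no immediates, shifts, flags or condition codes other than "always", and the function name `encode` is just for this example.

```python
import re

# Opcode field (bits 24..21) for a few data-processing instructions.
OPCODES = {"and": 0x0, "eor": 0x1, "sub": 0x2, "add": 0x4, "orr": 0xC}
COND_ALWAYS = 0xE  # condition field (bits 31..28): execute unconditionally


def encode(line):
    """Encode e.g. 'add r0, r1, r2' as a 32-bit instruction word."""
    m = re.fullmatch(r"\s*(\w+)\s+r(\d+)\s*,\s*r(\d+)\s*,\s*r(\d+)\s*",
                     line.lower())
    if not m or m.group(1) not in OPCODES:
        raise ValueError(f"unsupported instruction: {line!r}")
    rd, rn, rm = (int(g) for g in m.groups()[1:])
    if max(rd, rn, rm) > 15:
        raise ValueError("register out of range")
    # Layout: cond | opcode | Rn | Rd | Rm (all other bits zero in this form)
    return ((COND_ALWAYS << 28) | (OPCODES[m.group(1)] << 21)
            | (rn << 16) | (rd << 12) | rm)


print(hex(encode("add r0, r1, r2")))   # prints 0xe0810002
```

The lexical and syntactic analysis (here squeezed into one regular expression) is where most of the real work would go; the encoding itself is a few shifts and ORs.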

Of course, this is a loose proposal. It can be further refined, discussed, and potentially implemented differently, but my intention was to present the general principles of this concept, which could prove to be a game-changer. Such a solution would unlock the potential of the HP Prime and undoubtedly place it ahead of other calculators.

@Jeff,
What do you think about this?
If such a mechanism were to be created, we could independently start creating functions like the ones we discussed, such as a customized string REPLACE, tailored to specific needs.

Piotr
09-21-2023, 09:47 PM (This post was last modified: 09-21-2023 09:50 PM by Jean-Baptiste Boric.)
Post: #2
RE: Assembler, but not at all
There are several issues with your proposal:
  • The HP Prime ecosystem isn't limited to the calculators; there's also the virtual calculator, which runs on non-ARMv7 architectures. Your proposal is too specific to the HP Prime G1/G2 target.
  • It is possible that a hypothetical future HP Prime G3 model would not use the ARMv7 instruction set. In 6-10 years, that could be an ARM64 or RISC-V core. Your proposal isn't future-proof.
  • Writing homegrown, sandboxed JIT compilers is notoriously difficult; Linux's eBPF implementation has seen several CVEs over the years. Why would the HP Prime development team have a better track record?
Rather than reinventing the wheel, I would suggest two different options:
  • If it must fit inside of HP PPL, take inspiration from JavaScript's asm.js specification and define a set of types and operations that can leverage a fast-path inside of the HP PPL interpreter;
  • Or better yet, just integrate a WebAssembly interpreter (probably Wasm3).
Honestly I would push for the second option because it shouldn't be a lot of work and it should be low-risk. The potential is almost endless:
  • You could invoke WebAssembly snippets for dealing with hot-paths of your existing HP PPL programs;
  • You could run newer versions of giac on-calc without having to wait for an official firmware update;
  • You could port and run almost anything easily thanks to existing WebAssembly toolchains;
  • ...
09-22-2023, 03:50 AM (This post was last modified: 09-22-2023 04:02 AM by komame.)
Post: #3
RE: Assembler, but not at all
During my studies, I wrote an assembler compiler. At that time, it was for x86 assembly, but the principle was similar. The result of the compilation was a ".com" file that could be run in a DOS environment (no, I'm not that old; we worked on Windows, but generating a ".com" file was much easier due to the absence of the header required in ".exe" files, and it was easy to test in the console). In all of this, the biggest challenge was lexical and syntactical analysis, while the compilation itself was a straightforward operation involving writing the binary equivalents of individual instructions to a file. Of course, back then, my compiler didn't support all x86 assembly instructions because that wasn't the goal of the task. In any case, the compiled code could be executed directly as an executable file, and that's exactly how I wanted to implement it here, but as a subroutine run from within PPL.

(09-21-2023 09:47 PM)Jean-Baptiste Boric Wrote:  
  • Writing homegrown, sandboxed JIT compilers is notoriously difficult; Linux's eBPF implementation has seen several CVEs over the years. Why would the HP Prime development team have a better track record?
I'm not sure if you understood me correctly, but I absolutely don't want to write a new JIT. What I mentioned is a solution that is multiple times simpler and easier to implement than any other compiler, and certainly easier than integrating with WebAssembly. I mentioned compiling instructions or pseudo-instructions (not necessarily named exactly like in ARM) into ARM machine code that runs directly as binary code, without a JIT layer. This is a huge difference because JIT implementation is a complex topic, while compiling to machine code executed directly by the hardware CPU is a process that can almost be compared to conversion. In the approach I proposed, compilation would happen before the program runs (when you use [CHECK] and tokenization into bytecode for the PPL interpreter occurs), and there would be no need to worry about memory management (there's no dynamic variable creation and memory allocation here), and the code execution would largely rely on using CPU registers and input data, and possibly the stack. Compilation would involve translating commands into their binary form, ready to run without an additional intermediate layer.

(09-21-2023 09:47 PM)Jean-Baptiste Boric Wrote:  
  • Or better yet, just integrate a WebAssembly interpreter (probably Wasm3).
I'm not saying that WebAssembly is a bad idea. It does indeed have its merits, but it's a monster compared to the solution I'm proposing. The implementation effort for this on the HP Prime would be significantly greater and more challenging. Moreover, due to the additional intermediate layer, on a device like the HP Prime with a CPU clocked at 0.5GHz it couldn't compete in terms of performance with machine code. Even code written in C++ often loses to code written directly in assembly language, let alone adding an extra layer of abstraction as in the case of WebAssembly.
Although the HP Prime has excellent hardware for a calculator, it falls far behind even the simplest smartphones available today, and I don't think it would efficiently handle such tasks. Regardless, the biggest issue remains that the amount of work required to implement this is really huge.

(09-21-2023 09:47 PM)Jean-Baptiste Boric Wrote:  
  • It is possible that a hypothetical future HP Prime G3 model would not use the ARMv7 instruction set. In 6-10 years, that could be an ARM64 or RISC-V core. Your proposal isn't future-proof.
If there is a future version of the Prime (G3) that operates on a different ARM architecture (incompatible with the current one), then, of course, there will be a new firmware version created, just as it happened when the G2 was introduced. I believe such a change would be a significant undertaking for HP, and it's unlikely they would make such a major change. However, even if it were to happen, the difference would be that during compilation, G3 would receive different binary equivalents for individual instructions (or perhaps small subroutines if more than one instruction is required to perform the same task), and the rest would proceed in the same way. The firmware would simply generate binary code for its own CPU, and it's not a significant difference.

The important thing in my approach is that I don't necessarily assume the implementation of 100% of ARM assembly instructions (I'm not insisting that they should be called the same, as for use in different architectures, they could have their own names independent of ARM). Selecting perhaps the most essential 50-60 would suffice, and many of them are simple operations like incrementation or bitwise shifting (which can be emulated very easily). Of course, among the selected ones, there would also be slightly more complex ones, but nevertheless, I don't think any single assembly instruction is particularly complicated because they are all fundamentally basic operations.

(09-21-2023 09:47 PM)Jean-Baptiste Boric Wrote:  
  • The HP Prime ecosystem isn't limited to the calculators; there's also the virtual calculator, which runs on non-ARMv7 architectures. Your proposal is too specific to the HP Prime G1/G2 target.
Today we have three different types of firmware: one for G1, one for G2, and one for the virtual calculator, which is a completely separate project. I believe the approach I mentioned above could be similarly implemented on the side of the virtual calculator. It might require a bit more effort here, but not as much as it might seem. If the virtual calculator doesn't have built-in ARM code emulation, you would simply need to generate x86/x64 code. Emulating dozens of ARM instructions is not overly complex, and many of them could be mapped to x86/x64 with little effort.
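To show how little is needed to emulate a handful of register-to-register instructions, here is a toy interpreter in Python. It is a sketch under simplifying assumptions: sixteen 32-bit registers, no flags, no memory access, and a made-up tuple format `(op, rd, rn, rm)` for the pseudo-instructions. A real virtual-calculator build could equally generate x86/x64 code instead of interpreting.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "orr": lambda a, b: a | b,
    "eor": lambda a, b: a ^ b,
}


def run(program, regs=None):
    """Interpret (op, rd, rn, rm) tuples and return the final registers."""
    regs = list(regs) if regs else [0] * 16
    for op, rd, rn, rm in program:
        regs[rd] = OPS[op](regs[rn], regs[rm]) & MASK
    return regs


start = [0] * 16
start[1], start[2] = 7, 5
final = run([("add", 0, 1, 2),    # r0 = r1 + r2
             ("sub", 3, 2, 1)],   # r3 = r2 - r1, wraps around below zero
            start)
print(final[0], final[3])         # prints 12 4294967294
```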

In summary, I think the issues you mentioned wouldn't arise here. Furthermore, I consider the implementation of such a solution to be quite feasible, and that too with a relatively small amount of effort.

Best regards,
Piotr
09-22-2023, 09:03 PM (This post was last modified: 09-22-2023 09:45 PM by jte.)
Post: #4
RE: Assembler, but not at all
Lots of good points raised by Jean-Baptiste and Piotr. :D

(09-22-2023 03:50 AM)komame Wrote:  During my studies, I wrote an assembler compiler. At that time, it was for x86 assembly, but the principle was similar. The result of the compilation was a ".com" file that could be run in a DOS environment

Ah, the halcyon days of youth, when code flowed like water... For one university assignment, I was aiming to get a C compiler far enough along that it could compile itself (I wasn’t aiming to handle unions, bit-fields, and a few other odds and ends — just enough to compile the compiler). The target architecture was a “bytecode” for a “bytecode” interpreter (although, back then, it would’ve been termed a “p-code” interpreter) — but the interpreter I already had available for that p-code ran on 8-bit home computers, so I’m not sure how much compiling the compiled compiler could accomplish… (& the assignment deadline came before I could close the loop)

A while back I was pondering on the languages built into the HP Prime; I was thinking something Forth-ish might be nice, to carry on a bit of the stack tradition. But a very simple language close to the underlying cpu is certainly another possibility to consider.

At the moment, though, my plate is pretty full. And many of these sorts of decisions would involve internal discussions (and the reality of the commercial marketplace dictates certain priorities and possibilities — Python is certainly of interest in the educational community, and some improvements to Python support are in order). I’d like to improve the bug fixing process a bit, and fix some bugs (I’m wanting some more automation here; setting up automation takes time, but does streamline things once set up).
09-22-2023, 09:19 PM (This post was last modified: 09-22-2023 10:31 PM by jte.)
Post: #5
RE: Assembler, but not at all
(09-21-2023 09:26 AM)komame Wrote:  
@Jeff,
What do you think about this?

It’s a nice proposal, and well-presented. I like the details, and the later clarifications. :D One thing that pops into my head is that having convenient framebuffer access could be nice. But I was thinking of making some changes regarding how graphics are handled in a few places… (Currently, the Function Plot view, for example, does not use, in the main, offscreen buffers. I wrote this code for the HP 39gII where memory is extremely limited. With the much greater memory resources of the HP Prime, offscreen buffers are more appropriate. As a programmer, perhaps you can appreciate the care needed to do things like drawing and undrawing the tangent lines, marching ants, and signed area display in the Function app’s plot view, without using offscreen buffers… [and without redrawing the plotted functions, using just the pixels onscreen {the tangent line only knows the value and slope for the function at where the tracing cursor is; the marching ants only know the pixel coordinates of the cursor; changing the signed area by one column involves computing the upper and lower bounds at one X value}].)
09-25-2023, 05:24 PM
Post: #6
RE: Assembler, but not at all
(09-22-2023 09:03 PM)jte Wrote:  At the moment, though, my plate is pretty full. And many of these sorts of decisions would involve internal discussions (and the reality of the commercial marketplace dictates certain priorities and possibilities — Python is certainly of interest in the educational community, and some improvements to Python support are in order).

Do you have an internal list of issues related to the Python environment? Or perhaps such a list should be created? I'm asking because I've seen that the bug tracker doesn't cover this area.
09-26-2023, 07:24 AM
Post: #7
RE: Assembler, but not at all
(09-25-2023 05:24 PM)komame Wrote:  
(09-22-2023 09:03 PM)jte Wrote:  At the moment, though, my plate is pretty full. And many of these sorts of decisions would involve internal discussions (and the reality of the commercial marketplace dictates certain priorities and possibilities — Python is certainly of interest in the educational community, and some improvements to Python support are in order).

Do you have an internal list of issues related to the Python environment? Or perhaps such a list should be created? I'm asking because I've seen that the bug tracker doesn't cover this area.

I do have a list of issues I've run into when using Python. I haven't yet gotten to adding those to the bug tracker.
10-11-2023, 09:43 AM
Post: #8
RE: Assembler, but not at all
(09-22-2023 09:03 PM)jte Wrote:  A while back I was pondering on the languages built into the HP Prime; I was thinking something Forth-ish might be nice, to carry on a bit of the stack tradition. But a very simple language close to the underlying cpu is certainly another possibility to consider.

If FORTH were available on the HP Prime and it was compiled to machine code (instead of being interpreted), it would be a game changer. I think most people who love HP calculators for their RPN would be over the moon about such a feature. However, since Python is already there and it's the main focus for development, the chances for another language are slim, right?

(09-22-2023 09:19 PM)jte Wrote:  One thing that pops into my head is that having convenient framebuffer access could be nice. But I was thinking of making some changes regarding how graphics are handled in a few places… (Currently, the Function Plot view, for example, does not use, in the main, offscreen buffers. I wrote this code for the HP 39gII where memory is extremely limited. With the much greater memory resources of the HP Prime, offscreen buffers are more appropriate. As a programmer, perhaps you can appreciate the care needed to do things like drawing and undrawing the tangent lines, marching ants, and signed area display in the Function app’s plot view, without using offscreen buffers… [and without redrawing the plotted functions, using just the pixels onscreen {the tangent line only knows the value and slope for the function at where the tracing cursor is; the marching ants only know the pixel coordinates of the cursor; changing the signed area by one column involves computing the upper and lower bounds at one X value}].)

That's the incredible thing about this kind of programming, where most conventional developers will say "it's impossible" or, even if they somehow manage to do it, it either works in a very limited way or runs painfully slow. Skillful use of the right algorithms can produce astonishing results even with incredibly limited resources.
I personally once wrote a game for the PC (for DOS, around 1997) whose executable file was just 1229 bytes (and this already included the graphical elements [simple, perhaps, but still there]). The file with the data for individual levels was separate but also designed so that dozens of levels fit into just over 900 bytes. If anyone has seen Radioactive Wastes on HP Prime, that was the game (with identical rules carried over 100%), but the PC version had much more modest graphics since I took on the challenge to keep the file as small as possible.
However, an even bigger challenge was when, finishing technical school, my friend and I had to construct a laser controller (for light effects) from scratch. My friend, who loved soldering, mostly took care of the hardware, but left the entire firmware programming to me. The controller consisted of a 2x16 character LCD display, a few buttons, and, as the key component, two servomechanisms controlled using "Pulse Width Modulation" (PWM), where each motor tilted at an angle proportional to the pulse duty cycle. At the end of the second servo, a laser was attached. To achieve this back in the day (with a very limited budget), I had at my disposal the AT89C4051 microcontroller, which only had 128 bytes of RAM (1/8 kB!!!) and 4 kB of flash memory for the program. The program had to manage a menu, where among other things, one could calibrate the device, set the starting point (using arrow keys), program a set of shapes to draw (their sequence, time, etc.), and, of course, run the program. Sure, it was a simple device, but fitting all that into such limited RAM was a genuine feat. I was fighting for every single bit of memory. The entire program, from start to finish, was written in 8051 assembler. If I had written it in C or another high-level language, there would've been no chance to fit it.
Ultimately, the work was submitted (as a final project in the last grade), and it was a real hit with the instructors.

It's nice to reminisce about those times :)
10-11-2023, 02:20 PM
Post: #9
RE: Assembler, but not at all
(10-11-2023 09:43 AM)komame Wrote:  
(09-22-2023 09:03 PM)jte Wrote:  A while back I was pondering on the languages built into the HP Prime; I was thinking something Forth-ish might be nice, to carry on a bit of the stack tradition. But a very simple language close to the underlying cpu is certainly another possibility to consider.

If FORTH were available on the HP Prime and it was compiled to machine code (instead of being interpreted), it would be a game changer. I think most people who love HP calculators for their RPN would be over the moon about such a feature. However, since Python is already there and it's the main focus for development, the chances for another language are slim, right?

Being that the Prime is an HP graphing calculator, I would think that RPL is the natural choice if another language was available. Of course, RPL is slower than PPL, never mind a compiled language, so not really applicable to the task at hand.
10-11-2023, 04:36 PM
Post: #10
RE: Assembler, but not at all
(10-11-2023 02:20 PM)John Keith Wrote:  Being that the Prime is an HP graphing calculator, I would think that RPL is the natural choice if another language was available. Of course, RPL is slower than PPL, never mind a compiled language, so not really applicable to the task at hand.

It's not that simple to unequivocally state that RPL is slower than PPL. For example, consider the HP49G, which had a Saturn processor clocked at 4MHz and additionally only a 4-bit data bus (meaning not only did it have just 4MHz, but in some cases, it required multiple cycles for a single operation). On the other hand, the HP Prime is clocked at 528 MHz (G2) and operates on 32 bits in a single cycle. The theoretical performance difference might be up to 1000 times ((528/4)*(32/4) = 1056). If we were to run PPL on the HP49G, it might not necessarily be faster than RPL at those 4 MHz... Looking from another perspective, I could ask whether RPL accelerated 1000 times would be faster than PPL? ;)
10-11-2023, 10:41 PM
Post: #11
RE: Assembler, but not at all
You are partly right, much of the relative slowness of RPL is due to implementation details. Many unfortunate choices were made in the early development of the language due to the small amount of memory available on the older calculators. Later models use ARM processors to emulate the Saturn processor, but the emulation seems quite inefficient: the ARM processor in the 50g runs at 75 MHz but the 50g is not much faster than the 49G.

NewRPL is a newer implementation written in C and compiled for the ARM. It is much faster than "old RPL", and probably faster than PPL, but at the user level it is still an interpreted language. A beta version of NewRPL can be used on the Prime G1 but not on the G2 due to software locks demanded by standardized testing authorities.

I would certainly like to see NewRPL be an "official" language on the Prime, but that seems unlikely. A compiled FORTH would probably be faster, but much more limited in terms of language features, data types, etc.
10-12-2023, 12:08 AM
Post: #12
RE: Assembler, but not at all
(10-11-2023 02:20 PM)John Keith Wrote:  
Being that the Prime is an HP graphing calculator, I would think that RPL is the natural choice if another language was available. Of course, RPL is slower than PPL, never mind a compiled language, so not really applicable to the task at hand.

John, I'm sure you're right. I wrote Forth-ish just because I'm more familiar with Forth. (My second home computer being a Jupiter Ace [not counting my CARDIAC machines].)
10-12-2023, 03:31 AM (This post was last modified: 10-12-2023 06:33 AM by komame.)
Post: #13
RE: Assembler, but not at all
(10-11-2023 10:41 PM)John Keith Wrote:  Later models use ARM processors to emulate the Saturn processor, but the emulation seems quite inefficient: the ARM processor in the 50g runs at 75 MHz but the 50g is not much faster than the 49G.

For this reason, I chose the 49G for the comparison: on the 50G, the overhead of the emulation layer cannot be measured separately.

(10-11-2023 10:41 PM)John Keith Wrote:  You are partly right, much of the relative slowness of RPL is due to implementation details. Many unfortunate choices were made in the early development of the language due to the small amount of memory available on the older calculators.

Unfortunately, with PPL, it doesn't seem to be much better.
When I discuss something, I like to support my statements with real results and measurements. Given that I no longer have the HP49G (I used it a few years ago), I'd like to refer to a benchmark that was published on the old forum by Gene Wright:
Calculator Benchmark
I know that such a benchmark doesn't measure everything, but it gives a preliminary picture. If something is drastically faster or slower, even such a general measurement will allow us to notice it. This benchmark increases the value by 1 in each loop cycle (counts cycles) for 60 seconds. If we search for the HP 49G here, we will see something like this:
Code:
HP 49G
Count: 12,351
Code: 1. << DO 1. + UNTIL 0. END >>

A value of 12351 for a minute of iterating seems very low (meaning it's very slow), right? This means that RPL needs as many as 19431 CPU cycles for 1 cycle of such a loop (here the CPU speed doesn't matter anymore, as we measure the number of CPU cycles per iteration in a given programming language). We calculate it as follows: [Processor speed in Hz] * [measurement time in seconds] / [number of loop iterations during the measurement]. So in this case: 4000000 Hz * 60 s / 12351 iterations = 19431 CPU cycles.
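The same calculation as a small Python helper, using only the HP 49G figures quoted above (the function name is mine):

```python
def cycles_per_iteration(clock_hz, seconds, iterations):
    """CPU cycles spent on one loop iteration of the benchmark."""
    return clock_hz * seconds / iterations


# HP 49G: 4 MHz clock, 60 s run, 12,351 iterations counted
print(int(cycles_per_iteration(4_000_000, 60, 12_351)))   # prints 19431
```

Any other result can be normalised the same way by plugging in its own clock speed and iteration count.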

The same program implemented in PPL would look like this:
Code:
EXPORT LOOPS()
BEGIN
  LOCAL i;
  REPEAT
    i:=i+1;
  UNTIL 0;
END;
and after 60 seconds, the result is 2,950,000 (G2, firmware 14730, tested just after soft reset). So, for one loop cycle, PPL requires 10738 CPU cycles (528000000 Hz * 60 s / 2950000 iterations).

At this point, we could say that PPL is about 2x faster than RPL. However, note that we've only considered CPU clocking so far. If we include the RISC architecture used in ARM and the cache (L1) that ARM processors have, which Saturn did not have, plus the memory that is certainly faster in Prime than in HP 49G, then PPL doesn't seem so efficient. I'd even venture to say that considering all these factors, PPL would fare worse than RPL in this benchmark (if it were run on inferior hardware, like the HP49G, even with a large amount of RAM, etc.).

After all, one doesn't have to look far. Python is also an interpreted language, but when we run both PPL and Python on HP Prime, Python clearly wins. For the aforementioned benchmark, Python on HP Prime performs 54,000,000 iterations, which results in only 586 CPU cycles per one loop iteration, and with such a result, we can already speak of a good implementation of interpreted language.

In conclusion:
I completely agree that RPL is inefficient and was designed from the outset with certain hardware limitations in mind. However, despite all this, PPL doesn't seem to be any better, even though it wasn't subject to such limitations when it was designed.
PPL is considered fast just because it operates on an incredibly fast machine, and if we were to run RPL on such a machine, looking at the above results, I think it would perform comparably to PPL (or even better).

EDIT: I forgot that I still have the G1, which differs from the G2 mainly in its CPU architecture (ARMv5 vs ARMv7); its clock is only about a quarter slower (400 MHz vs 528 MHz). In this test, the G1 [400MHz] scores as follows: 845,000 iterations / 28402 CPU cycles per loop iteration (so it is worse than RPL on the HP49G), which unfortunately confirms that under such conditions, RPL would probably take the lead (this doesn't reflect well on RPL; it just reflects poorly on PPL). Therefore, a language on the HP Prime that compiles to machine code might turn out to run even thousands of times faster than PPL (e.g. a compiled Forth-ish ;) language, used only in performance-critical spots, wouldn't need to be sophisticated; basic commands would suffice, while the rest of the program could run in PPL or Python).
10-12-2023, 06:55 AM (This post was last modified: 10-12-2023 06:56 AM by parisse.)
Post: #14
RE: Assembler, but not at all
I don't think one should consider RPL, this was a language designed for calculators with hardware available in 1980. In order to have acceptable performances for the 49 CAS, I had to code in system RPL, code with the stack and avoid local variables as much as possible, use unnamed local variables (NULLNAME), and critical sections required assembly code. This gives code that is hard to create and maintain and is not portable.
Programming in RPL was reserved to an elite, and that's perhaps a reason why some people like(d) RPL...

WebAssembly is much better. Giac compiled to wasm is about 2x to 3x slower than native compiled code (depends what you are doing, for floating point operations it's sometimes as fast as native code), that's far better than any interpreter. And you could use whatever language compiles to wasm.
10-12-2023, 07:13 AM (This post was last modified: 10-12-2023 08:08 AM by komame.)
Post: #15
RE: Assembler, but not at all
(10-12-2023 06:55 AM)parisse Wrote:  I don't think one should consider RPL; this was a language designed for calculators with hardware available in 1980. In order to get acceptable performance for the 49 CAS, I had to code in System RPL, work with the stack and avoid local variables as much as possible, use unnamed local variables (NULLNAME), and write critical sections in assembly. This gives code that is hard to create and maintain and is not portable.
Programming in RPL was reserved for an elite, and that's perhaps a reason why some people like(d) RPL...

WebAssembly is much better. Giac compiled to wasm is about 2 to 3 times slower than natively compiled code (depending on what you are doing; for floating-point operations it's sometimes as fast as native code), and that's far better than any interpreter. And you could use whatever language compiles to wasm.

I never stated that I'd like to see RPL on the HP Prime. My primary intention was to demonstrate that PPL is just as slow as RPL.
There aren't major issues with CAS (it handles its typical tasks quite efficiently), but a problem arises when you need to iterate over the elements of large matrices or lists and have to use PPL.
What I described in the first post can be implemented relatively quickly and at a low cost (essentially for any simple language that can be easily compiled to machine code). Meanwhile, implementing support for WebAssembly would, probably as with Python, take years once bug fixes are taken into account.
10-12-2023, 12:15 PM
Post: #16
RE: Assembler, but not at all
(10-12-2023 07:13 AM)komame Wrote:  What I described in the first post can be implemented relatively quickly and at a low cost (essentially for any simple language that can be easily compiled to machine code). Meanwhile, implementing support for WebAssembly would, probably as with Python, take years once bug fixes are taken into account.
I never checked, but your favorite smartphone or tablet browser supports wasm, which means there are probably open-source implementations of a wasm virtual machine under MIT-like licenses that could be compiled for the Prime without too much work, much like the Prime's Python implementation, which uses MicroPython. Jean-Baptiste Boric probably has some good candidates!
10-12-2023, 12:49 PM
Post: #17
RE: Assembler, but not at all
Perhaps wasm3? https://github.com/wasm3/wasm3
10-12-2023, 08:40 PM
Post: #18
RE: Assembler, but not at all
(10-12-2023 06:55 AM)parisse Wrote:  I don't think one should consider RPL; this was a language designed for calculators with hardware available in 1980. In order to get acceptable performance for the 49 CAS, I had to code in System RPL, work with the stack and avoid local variables as much as possible, use unnamed local variables (NULLNAME), and write critical sections in assembly. This gives code that is hard to create and maintain and is not portable.

That is one reason I suggested NewRPL. There is no SysRPL or assembly; NewRPL code is quite fast as-is. Local variables are just as fast as stack operations, which helps with readability. My reason for preferring RPL is that it was the language of high-end HP calculators from 1987 to 2013, and there is an immense amount of code written in it. If NewRPL could be integrated into the Prime ecosystem, that would provide a modern, high-performance platform for new and existing RPL programs.
10-13-2023, 05:36 AM (This post was last modified: 10-13-2023 05:37 AM by parisse.)
Post: #19
RE: Assembler, but not at all
If you have wasm, then you could compile newrpl for the Prime and have access to these programs. Or you could compile an hp48/49/40 simulator to wasm and run it on the Prime. Or a Numworks simulator (this already exists for the TI Nspire and Casio fxcg50).
10-13-2023, 01:16 PM
Post: #20
RE: Assembler, but not at all
(10-13-2023 05:36 AM)parisse Wrote:  If you have wasm, then you could compile newrpl for the Prime and have access to these programs. Or you could compile an hp48/49/40 simulator to wasm and run it on the Prime. Or a Numworks simulator (this already exists for the TI Nspire and Casio fxcg50).

Wasm3 is a great idea, and I would really love for it to be possible, but only on the condition that it can genuinely be done with relatively little effort. Unfortunately, experience with MicroPython on the HP Prime (where the source code was also readily available and was used) shows that this takes much more effort than one might initially think. The mere existence of the source code accounts for just 50% of the success. MicroPython still doesn't work well on the HP Prime because the cross-platform differences are so significant that a large portion of that code requires adaptation. That's why I mentioned that porting Wasm might take years (taking bug fixes into account): I foresee challenges similar to those faced with MicroPython.
For example, screen display management requires an independent implementation (the HP Prime doesn't have a graphics card, everything works in a non-standard way here, and graphics operations would need to be optimized or even rewritten from scratch), keyboard handling also requires a unique approach, and the file system probably does too. Additionally, a dedicated GUI would be required on either the HP Prime side or the PC side to manage this (changes to the HP Connectivity Kit might also be needed). So there's quite a lot of work, and the HP Prime Developer Team is small and already swamped with other tasks. That's why I wrote that this is a project that might take years.
Consider that MicroPython has been available on the HP Prime for 2.5 years (with work on it starting even earlier), but it is still so riddled with bugs that using it isn't feasible in many situations; this "ready source code" didn't help much. Hence my concern that implementing Wasm could go the same way, and hence my proposal (primitive, but effective and easy to implement in several days) to avoid such big projects (though ideologically grand).
However, if implementing wasm can indeed be accomplished within even a year (taking all factors into account), then I am fully in favor of this project.