Emulator vs simulator performance
06-05-2020, 07:27 PM
Post: #6
RE: Emulator vs simulator performance
I can only speak about Emu42; I don't know any details about the internals of Free42.

From early on, the focus of my emulator development has been on rebuilding the original hardware at the functional level as closely as possible. I also expect an emulation of a real machine to be faster than that machine, or at least equally fast.

The Saturn CPU emulation inside Emu42 was originally built by Sebastien Carlier for Emu48 and was improved by me during the Emu48 development over the years. This emulation core was never optimized for speed.

One anecdote concerns the Saturn opcode dispatcher. The current emulator code decodes a Saturn opcode through tables, one nibble at a time. This keeps the tables small, because 1 nibble = 4 bits -> 2^4 = 16 entries per table. Cyrille thought: why not decode the first 3 nibbles (12 bits) of the opcode in one step? So the first opcode dispatcher table had a size of 2^12 = 4096 entries. On a PC, the emulator with the 4096-entry dispatcher table was about 10% faster than the one with 16-entry tables. Mission accomplished? Not really: the optimization was meant for Pocket PCs running Windows CE or Pocket PC 2002. On these devices the new version was massively slower. Why? The 4096-entry dispatcher table did not fit into the first-level CPU cache, so all accesses to this table had to go through the slow main memory.
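The two dispatcher layouts from the anecdote can be sketched roughly as follows. This is only an illustration in Python, not the Emu42 code: the toy opcode set and the handler names (op_a, op_b, op_unknown) are invented. It shows the structure only, a tree of 16-entry tables walked nibble by nibble versus one flat table indexed by 3 nibbles at once.

```python
# Toy comparison of two opcode dispatcher layouts.
# Opcodes and handlers are invented for illustration only.

def op_a(): return "A"
def op_b(): return "B"
def op_unknown(): return "?"

# Hypothetical opcode set: 3-nibble sequence -> handler.
OPCODES = {
    (0x0, 0x1, 0x2): op_a,
    (0x3, 0x4, 0x5): op_b,
}

def build_nibble_tree(opcodes):
    """Nested 16-entry tables, one table level per nibble."""
    root = [None] * 16
    for nibbles, handler in opcodes.items():
        table = root
        for n in nibbles[:-1]:
            if table[n] is None:
                table[n] = [None] * 16
            table = table[n]
        table[nibbles[-1]] = handler
    return root

def build_flat_table(opcodes):
    """One table with 2**12 = 4096 entries, indexed by 3 nibbles at once."""
    flat = [op_unknown] * (1 << 12)
    for (n0, n1, n2), handler in opcodes.items():
        flat[(n0 << 8) | (n1 << 4) | n2] = handler
    return flat

def dispatch_tree(root, nibbles):
    """One small table lookup per nibble."""
    entry = root
    for n in nibbles:
        entry = entry[n]
        if entry is None:
            return op_unknown
    return entry

def dispatch_flat(flat, nibbles):
    """A single lookup in the large table."""
    n0, n1, n2 = nibbles
    return flat[(n0 << 8) | (n1 << 4) | n2]

tree = build_nibble_tree(OPCODES)
flat = build_flat_table(OPCODES)
print(dispatch_tree(tree, (0x0, 0x1, 0x2))(), dispatch_flat(flat, (0x3, 0x4, 0x5))())
```

In a C implementation each entry would typically be a pointer. Assuming 4-byte pointers, the flat table takes 4096 * 4 = 16 KB, while a single 16-entry table takes 64 bytes. That size difference is what matters for a small first-level cache.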

So what is important to me? The speed difference between the original calculator and the emulation. I think an emulation that is 170 to 340 times faster is fast enough. This is a benchmark list from 2011 comparing the emulation on different host systems with the real machine:
Code:

Emu42 benchmark results using Erik Ehrling's
"Miller-Rabin Primality Test for the HP-42S"

Prime number: 999,999,999,961

Real HP-42S

ROM REV A: Std clock 1MHz  5m 48s
ROM REV A: Dbl clock 2MHz  2m 52s

C2E6750/2.66GHz/333MHz/DDR2 / 2 GB / Windows XP SP2 / Emu42 v1.10

ROM REV C: Max 1s     Auth 5m 27s

2x E5507/2.26GHz/800MHz/DDR3 / 4 GB / Windows 7 SP1 (x86) / Emu42 v1.14

ROM REV C: Max 2s     Auth 5m 25s

A64X2/3800+/800MHz/DDR2 / 4GB / Windows 7 SP1 (x64) / Emu42 v1.14

ROM REV C: Max 2s     Auth 5m 26s

A64X2/3800+/800MHz/DDR2 / 4GB / Windows XP SP3 / Emu42 v1.14

ROM REV C: Max 2s     Auth 5m 26s

A64X2/3800+/533MHz/DDR2 / 1GB / Windows XP SP2 / Emu42 v1.09beta1

ROM REV C: Max 2s     Auth 5m 26s

P4HT/3.4GHz/400MHz/DDR / 2GB / Windows XP SP3 / Emu42 v1.14

ROM REV C: Max 2s     Auth 5m 27s

P4HT/3.4GHz/400MHz/DDR / 1GB / Windows XP SP2 / Emu42 v1.09beta1

ROM REV C: Max 2s     Auth 5m 27s

P4HT/3.2GHz/400MHz/DDR / 1GB / Windows 2000 SP4 / Emu42 v1.09beta1

ROM REV C: Max 2s     Auth 5m 27s

P4/2.4GHz/533MHz / 1GB / Windows XP SP3 / Emu42 v1.11

ROM REV C: Max 2s     Auth 5m 26s

P4/2.4GHz/533MHz / 256MB / Windows 2000 SP4 / Emu42 v0.98-5

ROM REV C: Max 3s     Auth 5m 28s

P3/1.0GHz/133MHz / 512MB / Windows 2000 SP4 / Emu42 v1.09beta1

ROM REV C: Max 4s     Auth 5m 27s

P3/850MHz/100MHz / 384MB / Windows XP SP1 / Emu42 v0.98-5

ROM REV C: Max 5s     Auth 5m 28s

P3/850MHz/100MHz / 384MB / Windows 2000 SP4 / Emu42 v0.98-5

ROM REV C: Max 5s     Auth 5m 28s

P3/500MHz/100MHz / 256MB / Windows 2000 SP4 / Emu42 v0.98-5

ROM REV C: Max 8s     Auth 5m 24s

P3/500MHz/100MHz / 256MB / Windows 98 / Emu42 v0.98-5

ROM REV C: Max 8s     Auth 5m 24s

P3/450MHz/100MHz / 320MB / Windows 2000 SP4 / Emu42 v0.98-5

ROM REV C: Max 10s    Auth 5m 27s

P3/450MHz/100MHz / 128MB / Windows NT4.0 SP4 / Emu42 v0.98-5

ROM REV C: Max 10s    Auth 5m 24s

P(MMX)/200MHz/66MHz / 96MB / Windows 98SE / Emu42 v0.98-4

ROM REV A: Max 43s    Auth 5m 13s
ROM REV B: Max 44s    Auth 5m 25s
ROM REV C: Max 45s    Auth 5m 25s

P/100MHz/66MHz / 32MB / Windows 95B (OSR2) / Emu42 v0.98-4

ROM REV A: Max 1m 00s Auth 5m 13s
ROM REV B: Max 1m 02s Auth 5m 26s
ROM REV C: Max 1m 01s Auth 5m 25s

ARM PXA310/640MHz / Win Mobile 6 Classic / Emu42PPC v1.10

ROM REV C: Max 17s    Auth 5m 26s

ARM PXA270/624MHz / Win Mobile 5.0 / Emu42PPC v1.09

ROM REV C: Max 17s    Auth 5m 26s

ARM PXA270/624MHz / Win Mobile 2003 SE / Emu42PPC v1.02beta5

ROM REV C: Max 17s    Auth 5m 24s

ARM PXA270/624MHz / Win Mobile 2003 SE / Emu42PPC v1.01

ROM REV C: Max 19s    Auth 5m 24s

ARM PXA270/520MHz / Win Mobile 5.0 / Emu42PPC v1.07beta1

ROM REV C: Max 20s    Auth 40s *1

ARM PXA270/520MHz / Win Mobile 2003 SE / Emu42PPC v1.01

ROM REV C: Max 23s    Auth 5m 24s

ARM PXA255/400MHz / Win Mobile 2003 / Emu42PPC v0.20

ROM REV C: Max 30s    Auth ?m ??s

ARM MSM7200/400MHz / Win Mobile 6 Professional / Emu42PPC v1.09

ROM REV C: Max 30s    Auth 5m 26s

ARM PXA270/312MHz / Win Mobile 2003 SE / Emu42PPC v1.02

ROM REV C: Max 29s    Auth 5m 20s

ARM S3C2410/266MHz / Win Mobile 2003 / Emu42PPC v1.02beta5

ROM REV C: Max 32s    Auth ?m ??s

ARM S3C2410/266MHz / Win Mobile 2003 / Emu42PPC v1.01

ROM REV C: Max 34s    Auth ?m ??s

ARM PXA270/208MHz / Win Mobile 2003 SE / Emu42PPC v1.01

ROM REV C: Max 1m 02s Auth 5m 24s

ARM OMAP850/195MHz / Win Mobile 5.0 / Emu42PPC v1.05beta1

ROM REV C: Max 1m 19s Auth 2m 42s

ARM SA1110/206MHz / Pocket PC 2002 / Emu42PPC v1.05beta1

ROM REV C: Max 1m 34s Auth 5m 26s

ARM SA1110/206MHz / Pocket PC 2000 / Emu42PPC v1.09

ROM REV C: Max 1m 18s Auth 5m 26s


*1 high performance counter runs only at 1000Hz; this causes trouble in
   connection with timer2 related routines and the "Authentic Speed" setting


Environment

Emu42 v0.98-4/5 and Emu42PPC v0.20-1.09 use the same engine
Since Emu42 v1.12 and Emu42PPC v1.11 Sacajawea hardware support is included,
so implementation got some Lewis/Sacajawea hardware specific switches.

Speed setting in Emu42.ini / registry:

LewisCycles=64

PRM? is the only program in memory. If there are more programs in
memory the position of PRM? has direct influence on the execution
time.

On the Pocket PC / Win Mobile Emu42PPC was the only visible process
running. Tools like Wisbar or running ActiveSync slow down emulation
speed. The Max values on Emu42PPC differ from run to run in a wide
range; the values listed are the fastest ever measured.

Emu42 v1.09beta1 03/05/07
Emu42 v0.98-5    11/12/03
Emu42 v0.98-4    10/30/03

Compiler:
Microsoft Visual C++ 6.0 SP1 <- Emu42 v1.12
Microsoft Visual C++ 6.0 SP5 -> Emu42 v1.13

Settings:
/nologo /Gr /MT /W3 /GX /O2 /Ob2 /D "NDEBUG" /D "WIN32" /D "_WINDOWS"
/D "STRICT" /Fp".\Release/EMU32.pch" /Yu"pch.h" /Fo".\Release/" /Fd".\Release/"
/FD /c

Emu42PPC v0.20 06/09/04
Emu42PPC v1.01

Compiler:
eMbedded Visual C++ 3.0 Edition 2002

Settings:
/nologo /W3 /O2 /Ob0 /D _WIN32_WCE=$(CEVersion) /D "$(CePlatform)" /D "ARM"
/D "_ARM_" /D UNDER_CE=$(CEVersion) /D "UNICODE" /D "_UNICODE" /D "NDEBUG"
/Fp"ARMRel/EMU42.pch" /Yu"pch.h" /Fo"ARMRel/" /Oxs /M$(CECrtMT) /c

Emu42PPC v1.02beta5 01/17/05
Emu42PPC v1.05beta1 01/23/06
Emu42PPC v1.07beta1 07/10/06

Compiler:
eMbedded Visual C++ 3.0 Edition 2002

Settings:
/nologo /W3 /O2 /Ob2 /D _WIN32_WCE=$(CEVersion) /D "$(CePlatform)" /D "ARM"
/D "_ARM_" /D UNDER_CE=$(CEVersion) /D "UNICODE" /D "_UNICODE" /D "NDEBUG"
/Yu"pch.h" /Oxs /M$(CECrtMT) /c


Thanks to Erik Ehrling for contributing the real calculator, P100 and P200
benchmark values.

10/20/11 (c) by Christoph Gießelink, c dot giesselink at gmx dot de
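
The speed factor quoted before the list can be recomputed from the listed times. A small sketch, using the real HP-42S at standard clock (5m 48s) as the reference and a few "Max" times taken from the list:

```python
def seconds(minutes, secs):
    """Convert a 'Xm YYs' benchmark time to seconds."""
    return minutes * 60 + secs

# Real HP-42S, ROM REV A, standard clock 1MHz: 5m 48s
real_1mhz = seconds(5, 48)

# A few "Max" results from the list, in seconds
for max_s in (1, 2, 10, 45):
    print(f"Max {max_s:>2}s -> about {real_1mhz / max_s:.0f} times faster")
```

With 348s on the real machine, a "Max" time of 2s gives a factor of 174 and a "Max" time of 1s a factor of 348. Since the list shows whole seconds only, these factors are rough.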

Making repeatable benchmarks on the HP42S is not as easy as it sounds. First of all, the FOCAL code can only search for global labels located before its current position; when the search reaches the top of memory and the label has not been found so far, the search continues at the .END. position. So if you have many programs with many global labels behind your program, this will slow down program execution. One more detail: there was a bug in the RAW file object loader until Emu42 v1.22. The FOCAL program object loader also allows importing HP41 FOCAL programs saved by the V41 emulator. Because of some internal differences in NULL byte handling, NULL bytes are removed (packing) or added (behind numbers) during the import, and so the distance between labels changes. The distance for global labels was fixed during the import, the distance for local labels was not. This caused execution errors when using HP41 programs. Therefore the import now clears the distance information in all local label jump and execute FOCAL opcodes. Don't worry, the HP42 restores these offsets on the first program run, and because of this the first run of a FOCAL program directly after importing is slower than the following runs.
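
Why the first run after an import is slower can be illustrated with a toy model. This is not the HP42S memory format: the step dictionaries, the field name "distance" and the forward-only search are simplifications chosen for this sketch. A jump whose cached distance was cleared has to search for its label once, and every later run uses the stored distance.

```python
# Toy model: a program is a list of steps. A GTO step caches the distance
# to its label. None stands for "distance cleared by the import".

def make_program():
    return [
        {"op": "GTO", "label": "01", "distance": None},
        {"op": "NOP"},
        {"op": "NOP"},
        {"op": "LBL", "label": "01"},
    ]

def resolve_gto(program, pc):
    """Return (target index, number of steps searched) for the GTO at pc."""
    step = program[pc]
    if step["distance"] is not None:
        # Fast path: the distance is already known, no search needed.
        return pc + step["distance"], 0
    # Slow path: search forward for the label, then store the distance.
    for searched, idx in enumerate(range(pc + 1, len(program)), start=1):
        target = program[idx]
        if target["op"] == "LBL" and target["label"] == step["label"]:
            step["distance"] = idx - pc
            return idx, searched
    raise LookupError("label not found")

prog = make_program()
first = resolve_gto(prog, 0)    # has to search for the label
second = resolve_gto(prog, 0)   # uses the stored distance
print(first, second)
```

For a benchmark this means the first run after an import should be discarded, or the program run once before timing.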

But now to a further difference between emulation and simulation. The Emu42 emulator has to handle some speed related issues when running the code of an original ROM. Just remember, the authors of the code never thought about this code running 100-400 times faster. So some parts simply execute code in a loop to create a delay or the frequency of the beeper. On the HP48 the backarrow key has an autorepeat function, so pressing and holding the backarrow key slowly removes character by character from the command line. When you have a machine that runs 100 times faster and you do the same thing, the input line is empty immediately. This is what happened with Emu48 running an HP48. But back to Emu42. It took me years to implement the Redeye sending and the beeper emulation. In both cases the timing is done by the CPU executing opcodes. Moreover, the CPU strobe frequency is not very accurate, so it is not usable for sending the Redeye printer protocol. So the ROM code performs a speed calibration of the CPU before printing.

For this calibration, a loop with a known number of CPU cycles is executed within a time frame given by a timer referenced to the 32768Hz crystal, and the number of loop passes is counted. The bad thing is that the register width of the loop counter is too small for a CPU running 150 times faster, so the count register overflowed many, many times. The result of the speed measurement was rubbish, which affected the emulation of the Redeye frame transmitter and the beep generation.
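
The overflow problem can be shown with a few lines. The counter width and the loop count of the real machine are not given above, so the 16 bit width and the value of 40000 loops per timer window below are pure assumptions for illustration; only the factor of 150 comes from the text.

```python
def calibrate(loops_per_window, counter_bits):
    """Loop count as seen in a fixed-width counter that wraps on overflow."""
    return loops_per_window % (1 << counter_bits)

COUNTER_BITS = 16       # assumed counter width, for illustration only
REAL_LOOPS = 40_000     # assumed loop count per timer window on real hardware
SPEEDUP = 150           # emulated CPU runs 150 times faster

real = calibrate(REAL_LOOPS, COUNTER_BITS)
emulated = calibrate(REAL_LOOPS * SPEEDUP, COUNTER_BITS)
wraps = (REAL_LOOPS * SPEEDUP) >> COUNTER_BITS

print("real hardware:", real)
print("fast emulation:", emulated, "after", wraps, "overflows")
```

On the assumed real hardware the counter holds the true loop count. In the fast emulation it holds only the remainder after many overflows, which says nothing about the actual CPU speed, so any timing derived from it is wrong.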

I think this is an important difference between emulation and simulation.

At the last Allschwil meeting I talked about a speed update for my HP-92198 simulation. I printed a large HP71 BASIC program with Emu71, which took 180s. I don't know how fast the original hardware is (HP71 and HP-92198 video output), but it is slower. Now I cheated: the current HP-92198 simulation does the same thing in 8s. Why do I say cheated? The program is compiled with the same C++ compiler and runs on the same machine. The difference is the display update. The prior version updated the display content after every new character, the current version only every 30ms. So I modified the test conditions: the result for the user is the same, but it's a huge difference for the CPU.
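
The effect of the changed display update can be sketched like this. It is not the HP-92198 simulation code: the function name and the character rate (one character per millisecond for 3 seconds) are assumptions for illustration; only the 30ms interval comes from the text.

```python
def count_redraws(char_times_ms, min_interval_ms=None):
    """Count display redraws for characters arriving at the given times.

    min_interval_ms=None redraws after every character. Otherwise a redraw
    happens only if at least min_interval_ms have passed since the last one.
    """
    redraws = 0
    last = None
    for t in char_times_ms:
        if min_interval_ms is None or last is None or t - last >= min_interval_ms:
            redraws += 1
            last = t
    return redraws

# Assumed input: one character per millisecond for 3 seconds
times = list(range(3000))

every_char = count_redraws(times)      # old behaviour: redraw per character
throttled = count_redraws(times, 30)   # new behaviour: at most every 30ms

print(every_char, throttled)
```

A real implementation also needs a final redraw after the last character, so that nothing received since the previous update stays invisible; the sketch leaves that out.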

So when we speak about calculations, we are discussing the numerical results of a mathematical problem. When I change the numerical algorithm for some reason, it's the same cheating as with the display output: I changed the test conditions.

So I think it's quite hard to compare Emu42 with Free42. Use the one which is more suitable for your problem.

BTW, how fast is Free42 compared to Wolfram Mathematica solving the same problem?
Messages In This Thread
RE: Emulator vs simulator performance - Christoph Giesselink - 06-05-2020 07:27 PM