 HP Calculator Wiki

Site Tools

benchmarks:middlesquare

Calculator benchmark: middle square method seed test

The idea

In September 2013 I started to contribute on wiki4hp, and my first self-chosen task was: mine information from discussion places of HP calculators's community to report links to best/useful discussions and organize these by topics. One of those topic is: a general calculator benchmark for programmable calculators. As September 2013, I think that a lot of calculator benchmarks were discussed by the community, but only two has earned enough popularity: the calculator add loop and the nqueens (where n is 8) problem.
Benchmarks are not the ultimate test to measure the absolute computational power of a calculator (or a computer), but they give a good approximation of relative speed between calculators; that is: if a calculator A is faster than a calculator B in a benchmark, for computations that involves similar operations as in the benchmark, we can expect that the calculator A will take less time to finish them.

Now, if we look at code of the two popular benchmark, we can see that the Calculator add loop is really simple: just add one to a previous number, starting from 0, in the smartest way that you can code. The second is more complex but it involves: memory, subtractions and additions, and the “n” is not variable, is fixed to 8. This is a great limitation for new calculators that are really fast. Unfortunately, with a greater n maybe some old calculators can't do anything (since they don't have enough memory), or maybe the community likes the number 8 since the problem is related to chess (that has a chessboard of 8×8). Anyway the benchmark can be extended (increasing the n value by fixed steps), but this can be tried later, hoping that the community will contribute (again).

So thinking about a benchmark that use at least the four (+,-,*,/) operations, I stumbled upon the middle square method to produce, easily, random numbers1). Since we don't care about a particular benchmark algorithm, but we use the benchmark only to compare devices, it should be ok to test almost all programmable calculators, since it doesn't require much memory and it involves the four operations.

So, if you agree (else, feel free to fork the idea on this wiki!) let's look the pseudocode.

Criticism
MoHPC . Message #6. In short, a benchmark should be helpful to compare calculators performance on the same set of the instructions. While the middle square benchmark is more a programming challenge, because some calculators have more instructions than others or have other data types that allows faster/slower/more precise/etc.. operations.

So actually this should be seen as challenge, and only after that as an approximative benchmark.

The Pseudocode
As the two popular benchmarks shows, we don't test only the computational power of the devices, but also the programming ability of the user to “trick” the game.

k:= a positive integer
n:= 100^k
For i:=(n/10) to (n-1) do {
old := i
sequenceIsConvergent := false
limitCyclicSequences := 0 //because we can end in cyclic sequences and we don't want to use a lot of memory!
while ( (NOT sequenceIsConvergent) AND (limitCyclicSequences < n) ) {
new = extract_middle_number from old^2 (see below)
if new == old {
sequenceIsConvergent = true
}
else {
old := new
limitCyclicSequences = limitCyclicSequences+1;
}
}//end while
}//end for

->Result: the time spent to finish the computation given the starting k value.

How does extract_middle_digits work?
Given n, that is a power of 10 of the type 10^d,
its square is equal to 10^(d*2) and it has (d*2+1) digits.

We consider only the last (d*2) digits.
That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1.

Then we 'consider' all the numbers lower than 10^(d*2) with (d*2) digits.
For example if d=4 and the number under process is 1542,
we consider 1542^2 as 02377764 instead of 2377764.

Why d=4 ? Because we use as seed, with n=10000, numbers from 1000 to 9999, so numbers having 4 digits.

After that we pick the d=4 middle digits, from 02377764 we pick 0264.
So the middle number extracted is 3777.

The method of extraction for the digits can be anything you want, a standard method involving modulo and divisions or some other method. You can use an iterative subtraction to get the modulo instead of using the calculator's built-in function to compute it. Here we will see the user ability to exploit all the possibilities of the calculator, hoping that even algorithms with “classic” operations will be reported for a comparison.

More details!

Let k=1 , so we have as reference n:= 100^k = 100
Do sequences for all the seeds from a:=n/10 to n-1.
So the seeds are {a:=10,11,12,...,99}

How do we compute a sequence?
a:=10
seed:=a
1. we first square the number
10*10 = 100
2. 100 squared is 10000, with 4 zeroes, then we consider the previous number with four digits (with a leading zero)
0100
3. Then we pick the middle digits of the number, as many as the zeroes of n (that is 100), the new seed is
00
4. The new seed is the same as the initial seed, then if we reapply the process we end up always with 10. Therefore the sequence is convergent.

Next step
a:=11
seed:=a
seed^2 -> 121
See it with 4 digits -> 0121
Pick middle digits -> 01 , seed:= 12
seed^2 -> 144
See it with 4 digits -> 0144
Pick middle digits -> 04, seed:=14
seed^2 -> 196
See it with 4 digits -> 0196
Pick middle digits -> 06, seed:=19
seed^2 -> 361
See it with 4 digits -> 0361
Pick middle digits -> 01, seed:=36
seed^2 -> 1296
See it with 4 digits -> 1296
Pick middle digits -> 16, seed:=29
seed^2 -> 841
See it with 4 digits -> 0841
Pick middle digits -> 01, seed:=84
seed^2 -> 7056
See it with 4 digits -> 7056
Pick middle digits -> 76, seed:=5
seed^2 -> 25
See it with 4 digits -> 0025
Pick middle digits -> 05, seed:=2
seed^2 -> 4
See it with 4 digits -> 0004
Pick middle digits -> 04, seed:=0
seed^2 -> 0
See it with 4 digits -> 0000
Pick middle digits -> 00, seed:=0
Here the new seed is the same of the previous, so the process stops

let's skip some iterations to show another problem
a:=24
seed:=a
seed^2 -> 576
See it with 4 digits -> 0576
Pick middle digits -> 06, seed:=57
seed^2 -> 3249
See it with 4 digits -> 3249
Pick middle digits -> 39, seed:=24

At this time you can do an infinite loop that is cyclic,
so a way is to limit any sequence to max 100^k iterations.

In this case one could consider the sentence "For a generator of n-digit numbers the period can be no longer than 8^n" on wikipedia, but to make it standard, let's limit it to 100^k iterations.

How to report a result

A result is composed by the following list
- the device used plus the language used, overclock if any, custom firmware if any and so on.
- time elapsed for a given k in seconds (see below)
- the code used.

if the calculator is too slow, or limited, to compute a given n, then report "for n=100^k the computation
takes too much time". Conversely, if the calculator is too fast to compute a given n, then report
"for n=100^k the computation takes too little time, I skipped it"

Physical calculators

Results for n:=100^k, k=1

1. HP 50g , 2.15 with HPGCC3 patch, ARM ASM
• Time: 0.0093 s @75mhz
• Notes: For the code, I obviously needed division. This time, I searched for proper implementations of division on the web and found this one in the official infocenter. It is pretty scary how close I got with my own modulo routine, so it was quite optimized after all. Not many processors allow the programmer to write such a compact version (CMP R7,R8; SUBNES R6,R6,1; BNE WHILEloop) of if(R7!=R8 && (–R6)!=0) goto WHILEloop;
• Code: 2)
2. HP 50g , 2.15 with HPGCC3 patch, SaturnASM
• Time: 0.0359 s @75mhz
• Notes: just with Saturn ASM, though I have to admit that I used some instructions the real Saturn didn't have - only the emulated Saturn can multiply and divide with a single instruction. So what will ARM ASM bring?
Another note on optimization in this code: I let the limitCyclicSequences counter run backwards from n to 0, this saves an instruction or two in the WHILE loop compared to the pseudocode version.
• Code: 3)
3. HP 50g , 2.15 with HPGCC3 patch, SysRPL
• Time: 1.1421 s @75mhz
• Notes: Due to the size of the numbers this test produces I can't use normal bints if I want to run it for k=2. As a consequence, my first try was to translate my UserRPL version to SysRPL using HXS numbers. That one got k=1 solved in about 4 seconds with hardcoded constants n and n² for k=1. When I replaced them by a piece of code that was supposed to calculate them from user input, it was for some reason slowed down to about 12 seconds (yes, slower than my previous UserRPL version). I could have done a version that is slightly faster than the UserRPL version by using real numbers as well, but as the highest number calculated for k=1 is still below the limit of 1048576, I can safely use bints for that one.
• Code: 4)
4. HP 50g , 2.15 HRAST BASIC
5. HP 49g, HRAST BASIC
6. HP 50g , 2.15 with HPGCC3 patch, UserRPL
7. HP 50g , 2.08, UserRPL 75mhz , #1
• Time: 24.4003 sec
• Code: 8)

Results for n:=100^k, k=2

1. HP 50g , 2.15 with HPGCC3 patch, ARM ASM
2. HP 50g , 2.15 with HPGCC3 patch, SaturnASM
3. HP 50g , 2.08, UserRPL 75mhz , #1
• Time: estimated 250'000 secs 9)
• Code: The same of the entry “HP 50g , 2.08, UserRPL 75mhz , #1” for the 100^k, k=1 input.

Emulators on mobile/handheld device OR mobile/handheld devices itself

Why? See the note 10) .
»Devices smaller than a tablet of 7 inches, 7 inches included«

Results for n:=100^k, k=1

1. Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
• Time: around 3 seconds (doing also other things, it is multitasking)
• def middleSquareTest(k):
import time;
print time.localtime();
n = 100**k;
nsqrt = n**(1/2.0);
temp = None;
new = None;
for a in range(n/10, n):
old = a;
for j in range(n):
#limit to max 'n' iterations because we don't want unlimited cycles
temp = int(old*old / nsqrt)/float(n) ;
new = (temp-int(temp))*n;
if new == old:
#exit
j = n;
else:
#print old;
old = new;

print time.localtime();


Results for n:=100^k, k=2

1. Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
• Time: 27'706 seconds (doing also other things, it is multitasking)
• See entry for k=1

Target devices of this test

As Marcel Samek on 11 Sept 2013 wrote on MoHPC's forum, this test is hard for small programmable calculators that can't play with integer of fractional part of a number. So the taget devices of this test are high end calculators with good mathematical framework.

1)
Even if the method has a lot of flaws from the point of view of randomness.
2)
CODE
A=0.W
GOSBVL POP#
GOSBVL SAVPTR
SKUB {
*start
!ARM
STMDB sp! {R4 R5 R6 R7 R8 R9 R10 LP }
LDR R2,[R1,#2316]
MOV R3,1
MOV R10,10
*ALOGloop
MUL R3,R3,R10
SUBS R2,R2,1
BNE ALOGloop
MUL R2,R3,R3
MOV R7,R2
BL divmod
MOV R4,R9
*FORloop
MOV R5,R4
MOV R6,R4
*WHILEloop
MUL R,R5,R5
MOV R10,R3
BL divmod
MOV R7,R9
MOV R10,R2
BL divmod
MOV R8,R5
MOV R5,R7
CMP R7,R8
SUBNES R6,R6,1
BNE WHILEloop
CMP R4,R2
BNE FORloop
LDMIA sp! {R4 R5 R6 R7 R8 R9 R10 PC}
*divmod
MOV R8,R10
MOV R9,0
CMP R7,R10
MOVLO PC,LR
*.LSloop
MOV R8,R8 LSL #1
CMP R8,R7
BLO .LSloop
*.RSloop
MOVHI R8,R8 LSR #1
CMP R7,R8
SUBHS R7,R7,R8
CMP R8,R10
BHI .RSloop
MOV PC,LR
!ASM
*end
}
C=RSTK
D0=C
D1=80100
LC(5)end-start
MOVEDN
LC 80100
ARMSAT
GOVLNG GETPTRLOOP
ENDCODE
3)
CODE
GOSBVL POP#
GOSBVL SAVPTR
B=A.A
C=0.W
A=0.W
C+1.W
A+10.W
*ALOGloop
C*A.W
B-1.W
?B#0.A
GOYES ALOGloop
R0=C.W
C*C.W
R1=C.W
C/A.W
R2=C.W
*FORloop
B=C.W
D=C.W
*WHILEloop
D-1.W
?D=0.W
GOYES endFORloop
C*C.W
A=R0.W
C/A.WA=R1.W
C%A.W
A=B.W
B=C.W
?A#C.W
GOYES WHILEloop
*endFORloop
C=R2.W
A=R1.W
C+1.W
R2=C.W
?C#A.W
GOYES FORloop
GOVLNG GETPTRLOOP
ENDCODE
4)
::
BINT10 BINT100
BINT100 BINT10 DO
DUPINDEX@ BEGIN
DUPDUP #* 5PICK #/ SWAPDROP 4PICK #/ DROP
SWAPOVER #=
ROT #1- DUP4UNROLL #0=
OR
UNTIL 2DROP
LOOP 2DROP
;
5)
WATCH INTEGER K=1,N=100^K,Q=SQR N
FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O<>R@ O=R NEXT ELSE LEAVE
NEXT ? TICKS
6)
WATCH INTEGER K=1,N=100^K,Q=SQR N
FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O=R LEAVE ELSE O=R NEXT
NEXT ? TICKS
7)
<<
ALOG DUP SQ
DUP 10. / OVER 1. - FOR a
DUP a DO
DUP SQ 5. PICK / IP 4. PICK MOD
UNTIL
SWAP OVER ==
ROT 1. - UNROT PICK3 NOT
OR
END DROP2
NEXT DROP2
>>
8)
\<<
@k
100 SWAP ^ @n:= 100^k
DUP \v/ @\v/n
0 @old
0 @new
0 @limit
\->
n
nsqr
old
new
limit
\<<
PUSH @saves system flags
-105 SF @approx mode

n 10 /
n 1 -
FOR a
a 'old' STO
1 CF @the sequence is not convergent
0 'limit' STO @to limit cyclic sequences
WHILE
1 FC?
limit n <
AND
REPEAT
@extracts middle digits
old old * @square
nsqr / IP @integer part of old^2/\v/n
n / FP @the previous result / n^2, picks the decimals
n * @restores the decimals as integers
'new' STO
IF
new old ==
THEN
1 SF
ELSE
new 'old' STO
'limit' 1 STO+
END
END
NEXT

POP @restores system flags
\>>
\>>

executed with TEVAL
9)
Because the complexity of the task for the input k=1 is around $O(100 \cdot 100=100^2)$ and it takes 24.4 secs. The complexity of the task with the input k=2 is around $O(100^2 \cdot 100^2 = 100^4)$ that is $100^2$ times the complexity with k=1. So, while the Hp50g is still crunching for a confirmation, the task will take around 250'000 seconds.

Today i interrupted the computation after almost 48 hours and it is still unfinished as expected.
To avoid battery consumption you must use the an usb charger!
10)
Because, a part from school, one can use an handheld device to do math, maybe simple math using built-in functions of the framework, as well. Sure, it is not comfortable as a physical calculator unless it is large enough to simulate a calculator keyboard. 