[Help Needed] Calculator Benchmark

add loop benchmark middle square method bench

66 replies to this topic

#1 pier4r

pier4r

    Casio Fan

  • Members
  • PipPip
  • 39 posts

Posted 11 September 2013 - 03:35 PM

Hi all!

I recently moved the data from the historical "Calculator add loop" benchmark (on the hpmuseum) to a wiki page, here: http://www.wiki4hp.c...chmarks:addloop .

Since the original benchmark was not updated after 2011, I searched for new results and found that no one has run it on Casio calculators.

So, is anyone willing to do this benchmark and report the results in this topic?

The format is:
- Calculator used and firmware/software
- The count after 60 seconds of execution
- The program code used.

The pseudocode is:
Do a summation as fast as you can for 60 seconds.
Like:
sum:=0; While (true) { sum++ }
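
For reference, the add loop can be sketched in Python as below. This is only an illustration of the pseudocode, not one of the submitted programs: the name `add_loop` and the `seconds` parameter are made up here, and the clock check inside the loop is overhead that a calculator program stopped by hand does not pay.

```python
import time


def add_loop(seconds=60.0):
    """Count upward for roughly `seconds` seconds and return the count."""
    total = 0
    deadline = time.perf_counter() + seconds
    # Reading the clock on every pass costs time; the calculator versions
    # avoid this because they are interrupted by hand after 60 seconds.
    while time.perf_counter() < deadline:
        total += 1
    return total
```

Calling `add_loop(60.0)` gives a count comparable in spirit to the calculator results, although a PC number says more about the interpreter than about the hardware.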

For further comparisons there is another benchmark (just designed): http://www.wiki4hp.c...ks:middlesquare . Any results for this one will be appreciated too.

Thanks a lot, and sorry if this is not the "right" section :)


edit: the community has just asked for a simpler benchmark (the middle square one is not so clear). Would you mind running this one as well?

The code is:
input: n
--
for k:=3 to n do {
  for j:=2 to k-1 do {
    if ( k mod j == 0 ) then {
      j:= k-1 //so we exit from the inner for
    }
  }
}
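
A possible reference version of this pseudocode in Python, useful for checking an implementation on a PC before timing it on a calculator. Two things here are additions and not part of the benchmark: the function name `ultra_naive_primes`, and the returned count of values of k for which no divisor was found (the pseudocode itself throws every result away). A `break` stands in for `j := k-1`, because assigning to the loop variable does not end a Python `for` loop.

```python
def ultra_naive_primes(n):
    """Run the benchmark loops for n; return how many k had no divisor."""
    primes_found = 0
    for k in range(3, n + 1):      # k := 3 to n, inclusive
        has_divisor = False
        for j in range(2, k):      # j := 2 to k-1, inclusive
            if k % j == 0:
                has_divisor = True
                break              # stands in for "j := k-1"
        if not has_divisor:
            primes_found += 1
    return primes_found
```

Comparing the returned count between two implementations is a quick way to see whether both really walk the same loops.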

The result format is:
A result consists of the following items:
- the device and the language used, plus any overclocking, custom firmware and so on.
- time elapsed for a given n in seconds (see below)
- the code used.

If the calculator is too slow, or too limited, to compute a given n, then report "for n the computation
takes too much time". Conversely, if the calculator computes a given n too quickly to time, then report
"for n the computation takes too little time, I skipped it"

The options are
n:= 100
n:= 1000
For very fast implementations:
n:= 10000
n:= 100000

Edited by pier4r, 12 September 2013 - 12:49 AM.

  • flyingfisch likes this

#2 flyingfisch

flyingfisch

    Casio Maniac

  • Deputy
  • PipPipPipPipPipPipPipPip
  • 1891 posts
  • Gender:Male
  • Location:OH,USA
  • Interests:Aviation, Skiing, Programming, Mountain Biking.

  • Calculators:
    fx-9860GII
    fx-CG10 PRIZM

Posted 11 September 2013 - 07:29 PM

Should I do it in casio-BASIC, LuaZM, C, or all three?

#3 pier4r

Posted 11 September 2013 - 08:22 PM

If you wish, all three. There is (almost) no limitation.

PS: http://casio.clrhome...rogramming.html doesn't work :/

Edited by pier4r, 11 September 2013 - 08:23 PM.


#4 flyingfisch

Posted 11 September 2013 - 08:49 PM

oh, here's the fixed link: http://casio.clrhome...th-the-money-2/

And I am developing the benchmark programs, I'll tell you when I have finished. :)

EDIT:

Hmm. this may take a little research. I am not quite sure how I am going to write this program as I do not understand the middle number extraction concept very well. But I am trying to figure it out :)
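
For anyone else puzzled by the middle-number extraction: the classic middle-square method squares a number, pads the square with leading zeros to twice the digit count, and keeps the middle digits as the next value. The sketch below shows only that generic step. The function names and the 4-digit default are assumptions made here, and the benchmark on the wiki page may define its details differently.

```python
def middle_square(seed, digits=4):
    """Return the middle `digits` digits of seed squared (zero-padded)."""
    squared = str(seed * seed).zfill(2 * digits)
    start = digits // 2
    return int(squared[start:start + digits])


def middle_square_sequence(seed, count, digits=4):
    """Apply middle_square repeatedly and collect the values produced."""
    values = []
    for _ in range(count):
        seed = middle_square(seed, digits)
        values.append(seed)
    return values
```

With a seed of 1234 the square is 1522756, which is padded to 01522756, so the middle four digits are 5227.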

EDIT2:

Oh, just realized that mid squares was optional. I will start with summation. :)

EDIT3:

Results for Casio-BASIC:


NOTE: Timing done by hand with a stopwatch, may be off by as much as half a second.

First attempt


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200
Count after execution: 22658
Code:
0->S
While 1
S+1->S
WhileEnd

Second attempt


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200
Count after execution: 32923
Code:
0->S
While 1
Isz S
WhileEnd

Third attempt


This one just for kicks to see how much time the 0->S takes up.
Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200
Count after execution: 33012
Code:
0->S run in Run-Mat before timing
While 1
Isz S
WhileEnd

Fourth attempt


Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Count after execution: 54006
Code:
0->S run in Run-Mat before timing
While 1
Isz S
WhileEnd

  • pier4r likes this

#5 pier4r

Posted 11 September 2013 - 11:06 PM

Oh, thanks, i'll start to copy the results.

Is there any classpad user that will do the summation test? Thanks!

#6 pier4r

Posted 12 September 2013 - 12:47 AM

Added and i also added another bench (very simple) on the first post.

Edited by pier4r, 12 September 2013 - 06:44 AM.


#7 3298

3298

    Casio Addict

  • Members
  • PipPipPip
  • 79 posts
  • Gender:Male
  • Location:Germany

  • Calculators:
    fx-9750G Plus
    Algebra FX 2.0 (ROM 1.03,broken)
    HP 50G

Posted 12 September 2013 - 09:29 AM

I dug out my old fx-9750G+ and ran flyingfisch's first and second attempts on it. There is only one ROM version for this calculator (1.00); its CPU is clocked at 4.3MHz and cannot be overclocked without hardware modifications (with modifications the maximum would be 8MHz). The counts are 7561 and 16526, also measured with a stopwatch. All calculators from the CFX series should perform equally badly because they differ only in minor ways, in places this code doesn't touch.


EDIT:
Results on my brother's AFX (ROM 1.05, but my broken one with ROM 1.03 felt very similar in terms of speed): 8115 for attempt number one, 12941 for attempt number two.
I can't give an exact clock speed because sources are contradicting each other. The source I believe the most says it can execute 8 million instructions per second, which gives a clock speed of at least 24MHz because the CPU is a NEC V30, which takes multiple cycles per instruction like any other early x86.


Also, I gave the other benchmark a try. This is what I made from the pseudocode above:
?->N
For 3->K To N
For 2->J To K-1
If K-JInt (K/J)=0
Then K-1->J
IfEnd
Next
Next
The 9750 turns off (which kills the program) before finishing for n=1000 already, so I will just give you the result for n=100, which is 43 seconds.
EDIT:
The AFX takes 31 seconds.

There are some optimisations, the first one is a replacement of the assignment of the If-body with a Break statement, which makes the code look like this:
?->N
For 3->K To N
For 2->J To K-1
If K-JInt (K/J)=0
Then Break
IfEnd
Next
Next
Result is the same: 43 seconds for n=100.
EDIT:
The AFX is slightly faster than in the first test, it takes about 30 seconds.

Next optimisation is the replacement of the If itself with the => operator.
?->N
For 3->K To N
For 2->J To K-1
K-JInt (K/J)=0=>Break
Next
Next
This one is slightly faster, it only takes 40 seconds.
EDIT:
The AFX took 28 seconds. Now why did Casio make it almost impossible to get the => symbol onto the AFX? I needed to use a hex editor. By the way, thanks to 2072 for TOUCHE, there is still someone using it (me, for example to get this symbol).


I can also give you code & test results from a HP 50G in UserRPL and SysRPL if you are interested.
EDIT:
I see you have those already. My SysRPL bint version for the add loop is faster, though. And I can give you a Saturn ASM version as well.
My SysRPL version gives 153421, probably because it only checks ON instead of the entire keyboard:
:: BINT0 BEGIN #1+ ATTN? UNTIL UNCOERCE ;
The Saturn ASM version looks like this:
CODE
GOSBVL SAVPTR
D0=80EAB ;ATTNFLG
A=0.W
*loop
A+1.W
C=DAT0.A
?C=0.A
GOYES *loop
GOSBVL GETPTR
P=15
GOVLNG PUSHhxsLoop
ENDCODE
After a minute it has counted up to 6469858. Oh, and my system details are: OS version 2.15 with HPGCC3 patch (but that patch shouldn't affect the results at all) on a calculator clocked at the default speed, i.e. 75MHz.

Edited by 3298, 12 September 2013 - 12:54 PM.

  • pier4r likes this

#8 pier4r

Posted 12 September 2013 - 10:26 AM

Thanks 3298! I have added your results!

For the HP 50g, do as you wish (I have one, and it is still crunching the "10000" test). I don't know how to program in SysRPL, so it will be appreciated!

#9 flyingfisch

Posted 12 September 2013 - 12:52 PM

OK, I just came up with some new code, almost twice as fast!

Fifth attempt


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200
Count after execution: 51844
Code:
For 1->I To 1000000000
Next

Sixth attempt


Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Count after execution: 85074
Code:
For 1->I To 1000000000
Next


EDIT:
Using 3298's code, here are my results for the ultra naive primes benchmark.


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200, Casio-BASIC
Time elapsed for n=100: 6.5 sec
Time elapsed for n=1000: 386 sec
Code:
?->N
For 3->K To N
For 2->J To K-1
K-JInt (K/J)=0=>Break
Next
Next

Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Time elapsed for n=100: 3.9 sec
Time elapsed for n=1000: 237 sec
Code: same as above


  • pier4r likes this

#10 3298

Posted 12 September 2013 - 01:02 PM

Hm, that's not really faster on the 9750 (stopped at 13449) and AFX (stopped at 9874). Casio definitely improved something since those old calcs were released.

#11 flyingfisch

Posted 12 September 2013 - 01:26 PM

you mean on the older calcs a For loop is not faster than Isz and +?

#12 pier4r

Posted 12 September 2013 - 02:07 PM

Added. Thanks! With the primz and Lua/C, is there any chance of doing better? I'm curious!

One question: I think that 237 sec is the result for the overclocked primz, and 386 is the result at the default clock.

#13 3298

Posted 12 September 2013 - 02:17 PM

ffish: Yes, exactly that. Just look at the numbers.

I wrote the ultra-naive prime benchmark in Saturn ASM. It is really fast despite running on a CPU emulator on the 75MHz ARM9: 0.0209 seconds for n=100, 0.1609 seconds for n=1000, 1.7902 seconds for n=10000 and 20.6464 seconds for n=100000, all of them measured with TEVAL. Here is the code:
CODE
GOSBVL POP#
GOSBVL SAVPTR
A+1.A
R0=A.A
A=0.A
A+3.A
*outer
R1=A.A
B=0.A
B+2.A
*inner
C=B.A
GOSBVL IntDiv
?A=0.A
GOYES outer_end
B+1.A
A=R1.A
?A>B.A
GOYES inner
*outer_end
A=R1.A
A+1.A
C=R0.A
?C>A
GOYES outer
GOVLNG GETPTRLOOP
ENDCODE
Actually, MASD (the compiler) was smart and replaced the call to the old implementation of division (GOSBVL IntDiv) with an instruction introduced with the Saturn emulator (DIV.A). Of course, that's a lot faster than the old implementation which all calculators with real Saturn chips (49G and older) had to use.
I hope these short times are not a result of an error in my code. :rolleyes:

Edited by 3298, 12 September 2013 - 03:19 PM.


#14 flyingfisch

Posted 12 September 2013 - 02:21 PM

pier4r wrote:
Added. Thanks! With the primz and Lua/C, is there any chance of doing better? I'm curious!

One question: I think that 237 sec is the result for the overclocked primz, and 386 is the result at the default clock.


Oh yes, sorry accidentally mixed those up :blush:

Sure working on LuaZM and C implementations right now :)

#15 pier4r

Posted 12 September 2013 - 02:25 PM

3298, the Saturn ASM is freaking fast! It's even faster than smartphones with a 600 MHz CPU (running Python, check the wiki page in the first post)!

Just for confirmation (even if the pseudocode is very simple), could you do a little test (for example, print the numbers crunched for k:=3 to 13) to confirm the correctness of the code? (1) Thanks!

PS: I don't know much about the Saturn assembler for the HP 50g. Is there a chance that it translates the Saturn code into ARM code?

(1) Because the Saturn ASM got a result in the "summation" bench similar to the HP Prime. The HP Prime should have computational power similar to the Nokia E5-00 with Python, but the E5 is much slower than your Saturn ASM results, and that seems a contradiction.
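
The kind of check asked for here could look like the Python sketch below, which records for each k the value of j at which the inner loop stops. The function name `trace_exits` and the tuple layout are made up for this illustration. An implementation under test should stop at the same j for every k.

```python
def trace_exits(first_k=3, last_k=13):
    """For each k, report where the inner loop stops and whether k is prime."""
    rows = []
    for k in range(first_k, last_k + 1):
        stop_j = k - 1             # value of j when no divisor is found
        is_prime = True
        for j in range(2, k):
            if k % j == 0:
                stop_j = j         # the first divisor ends the inner loop
                is_prime = False
                break
        rows.append((k, stop_j, is_prime))
    return rows
```

Printing `trace_exits()` next to the same trace from the calculator makes a skipped or repeated iteration easy to spot.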

Edited by pier4r, 12 September 2013 - 02:48 PM.


#16 pier4r

Posted 12 September 2013 - 02:42 PM

Added the results for add loop benchmark about hp50g here http://community.cas...ark/#entry58749 .

Again thanks!

#17 pier4r

Posted 12 September 2013 - 03:37 PM

I did some math about the "suspects" raised in http://community.cas...ark/#entry58757 .

The code has a time complexity of O(n*(n+1) / 2), so the times should be loosely proportional to that.

Then, with 10n we have the following proportion: (10n(10n+1)/2)/(n(n+1)/2) ~ t_10n/t_n, that is, t_10n ~ t_n * (10*(10n+1))/(n+1). In our case n=100, so we have 99 as a "coefficient".
Thus for the HP 50g:
8 sec * 99 = 792 sec (result of n:=100 times the coefficient) and the real result is 545
For the primz:
6.5*99 = 643 sec while the real is 386
3.9*99 = 386 sec while the real is 237

Note: the same order of magnitude!

Similarly the Saturn ASM should do:
0.16 * 99 = 15.84 sec (of course the real is less, but not as much less as 1.7!)
15.84 * 99 = 1568 (while 20 secs is way lower!)

So, sorry 3298, you spent your time and I thank you a lot, but could you check the correctness of the Saturn ASM? Thanks!
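
The estimate above treats every inner loop as running to the end. Because the loop exits at the first divisor, composite k cost far fewer steps, which may be why the measured BASIC ratios (386/6.5 and 237/3.9, both roughly 60) sit below 99. A sketch that counts the `k mod j` evaluations directly, with and without the early exit, is given below. The names `modulo_ops` and `scaling` are invented here, and operation counts are only a rough proxy for time.

```python
def modulo_ops(n, early_exit=True):
    """Count the k mod j evaluations the benchmark performs for a given n."""
    ops = 0
    for k in range(3, n + 1):
        for j in range(2, k):
            ops += 1
            if early_exit and k % j == 0:
                break
    return ops


def scaling(n_small, n_large, early_exit=True):
    """Ratio of operation counts between two problem sizes."""
    return modulo_ops(n_large, early_exit) / modulo_ops(n_small, early_exit)
```

Comparing `scaling(100, 1000)` with `scaling(100, 1000, early_exit=False)` shows how much the early exit changes the expected growth. A measured ratio far below both would still point to a bug.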

Edited by pier4r, 12 September 2013 - 03:39 PM.


#18 flyingfisch

Posted 12 September 2013 - 03:37 PM

Results for LuaZM



Summation


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200
Count after execution: 547223
Code:
NOTE: When running this code, note that it breaks when AC/on is pressed. I used the fastest Getkey routine for PRIZM, I may try a test directly polling the RTC in the future.
zmg.keyDirectPoll()
for i=1,100000000000 do
zmg.keyDirectPoll()
if zmg.keyDirect(10)>0 then print(i) break end
end

Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Count after execution: 893207
Code: same as last test.

Ultra-Naive Primes


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200, LuaZM
Time elapsed for n=100: 0.477 sec
Time elapsed for n=1000: 48.382 sec
Code:
NOTE: uses the RTC to get the times, should be more accurate than a stopwatch, and it doesn't slow down program execution because it is done outside the loop.
local start = zmg.ticks()
n=100
for k=3,n do
for j=2,k-1 do
if k%j==0 then j=k-1 end
end
end

print('n=100: ' .. (zmg.ticks()-start)/128 .. 'sec')

local start = zmg.ticks()
n=1000
for k=3,n do
for j=2,k-1 do
if k%j==0 then j=k-1 end
end
end

print('n=1000: ' .. (zmg.ticks()-start)/128 .. 'sec')

Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Time elapsed for n=100: 0.297 sec
Time elapsed for n=1000: 29.773 sec
Code: Same as last test.
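
One thing that may be worth checking in the Lua code above: in standard Lua the control variable of a numeric `for` is a local copy, and the reference manual advises against assigning to it, so `j=k-1` probably does not end the inner loop. Whether LuaZM behaves the same way is an assumption here. The measured ratio between n=1000 and n=100 (48.382 / 0.477, about 101) would fit loops that always run to the end. Python's `for` over a `range` ignores such an assignment too, so the sketch below uses it to compare the two variants; both function names are made up.

```python
def ops_with_assignment(n):
    """Inner loop tries to stop by reassigning j, which has no effect."""
    ops = 0
    for k in range(3, n + 1):
        for j in range(2, k):
            ops += 1
            if k % j == 0:
                j = k - 1          # overwritten on the next pass of the loop
    return ops


def ops_with_break(n):
    """Inner loop stops at the first divisor, as the pseudocode intends."""
    ops = 0
    for k in range(3, n + 1):
        for j in range(2, k):
            ops += 1
            if k % j == 0:
                break
    return ops
```

If the same happens in LuaZM, replacing `j=k-1` with `break` should bring the Lua timings in line with what the other implementations measure.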



EDIT:
I forgot to do this earlier:
Hello pier4r and welcome to UCF! You should introduce yourself.

  • pier4r likes this

#19 3298

Posted 12 September 2013 - 04:14 PM

Sorry, apparently IntDiv screwed up my B register and the documentation only said it would destroy D. And DIV.A faithfully reproduces that. Now who wrote that doc? :die:
I'm waiting on my calculator for more correct results now.

#20 pier4r

Posted 12 September 2013 - 04:20 PM

Added! Thanks!

Thanks 3298 for checking it!

Edited by pier4r, 12 September 2013 - 04:22 PM.


#21 3298

Posted 12 September 2013 - 04:40 PM

I just got some TEVAL results. This time I used R2 instead of B. Let's hope IntDiv doesn't touch any other registers apart from A, B, C and D, or I will slap the writer of that erroneous documentation. :rant:
Times are 0.0764 for n=100, 4.0665 for n=1000 and 261.1537 for n=10000. I didn't try n=100000 yet, it will probably take hours with the corrected code. And here is the code itself:
CODE
GOSBVL POP#
A+1.A
R0=A.A
A=0.A
A+3.A
*outer
R1=A.A
C=0.A
C+2.A
*inner
R2=C.A
GOSBVL IntDiv
?A=0.A
GOYES outer_end
C=R2.A
C+1.A
A=R1.A
?A>C.A
GOYES inner
*outer_end
A=R1.A
A+1.A
C=R0.A
?C>A
GOYES outer
GOVLNG Loop
ENDCODE

EDIT:
I just took a look at the IntDiv implementation. Yes, it uses B. No, it uses no other registers. So this one should actually be correct.

Edited by 3298, 12 September 2013 - 04:44 PM.

  • pier4r likes this

#22 pier4r

Posted 12 September 2013 - 04:48 PM

They seem consistent after a comparison with the primz Lua results. Thanks :). Now I'll add them.

Done!

Edited by pier4r, 12 September 2013 - 04:55 PM.


#23 flyingfisch

Posted 12 September 2013 - 05:36 PM

Results for C, compiled with PrizmSDK


NOTE: all results timed with a stopwatch.

Summation


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200
Count after execution: 4246899
Code:
NOTE: AC/on displays value of int i, MENU exits
#include <display_syscalls.h>
#include <keyboard_syscalls.h>
#include <keyboard.hpp>
#include <color.h>

// Getkey routine
const unsigned short* keyboard_register = (unsigned short*)0xA44B0000;
unsigned short lastkey[8];
unsigned short holdkey[8];

void keyupdate(void) {
memcpy(holdkey, lastkey, sizeof(unsigned short)*8);
memcpy(lastkey, keyboard_register, sizeof(unsigned short)*8);
}
int keydownlast(int basic_keycode) {
int row, col, word, bit;
row = basic_keycode%10;
col = basic_keycode/10-1;
word = row>>1;
bit = col + 8*(row&1);
return (0 != (lastkey[word] & 1<<bit));
}
int keydownhold(int basic_keycode) {
int row, col, word, bit;
row = basic_keycode%10;
col = basic_keycode/10-1;
word = row>>1;
bit = col + 8*(row&1);
return (0 != (holdkey[word] & 1<<bit));
}

int main() {
int i=0;
int key;
// clear screen
Bdisp_AllClr_VRAM();
while (1) {
keyupdate();
// increment i
i++;
if (keydownlast(KEY_PRGM_ACON)) {
char buffer[10];
strcpy(buffer," ");
itoa(i, buffer+2);
PrintXY(1,1,buffer,0,COLOR_BLACK);
Bdisp_PutDisp_DD();
}
// handle [menu]
if (keydownlast(KEY_PRGM_MENU)) {
GetKey(&key);
}
}

return 1;
}

Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Count after execution: 6921042
Code: same as last test.
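
The key test in the C program turns a two-digit key code into a position in the eight 16-bit words copied from the keyboard register. The Python sketch below mirrors only that arithmetic, so it can be tried without a calculator. The function names are invented, and the key code 10 for AC/on is taken from the `zmg.keyDirect(10)` call in the LuaZM post, not from the SDK headers.

```python
def decode_keycode(basic_keycode):
    """Map a two-digit key code to (word index, bit index)."""
    row = basic_keycode % 10
    col = basic_keycode // 10 - 1
    word = row >> 1                # two rows share one 16-bit word
    bit = col + 8 * (row & 1)      # odd rows use the upper byte
    return word, bit


def key_is_down(snapshot, basic_keycode):
    """Test one key in a snapshot of the eight keyboard words."""
    word, bit = decode_keycode(basic_keycode)
    return (snapshot[word] & (1 << bit)) != 0
```

Each pass of the counting loop copies all eight words twice and then decodes two keys, which is a lot of work next to a single `i++`.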

Ultra-Naive Primes


Calculator used: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200, C (PrizmSDK)
Time elapsed for n=100: too fast to calculate
Time elapsed for n=1000: too fast to calculate
Time elapsed for n=10000: 9.7 sec
Time elapsed for n=100000: 720 sec
Code:
#include <display_syscalls.h>
#include <keyboard_syscalls.h>
#include <keyboard.hpp>
#include <color.h>

// Getkey routine
const unsigned short* keyboard_register = (unsigned short*)0xA44B0000;
unsigned short lastkey[8];
unsigned short holdkey[8];

void keyupdate(void) {
memcpy(holdkey, lastkey, sizeof(unsigned short)*8);
memcpy(lastkey, keyboard_register, sizeof(unsigned short)*8);
}
int keydownlast(int basic_keycode) {
int row, col, word, bit;
row = basic_keycode%10;
col = basic_keycode/10-1;
word = row>>1;
bit = col + 8*(row&1);
return (0 != (lastkey[word] & 1<<bit));
}
int keydownhold(int basic_keycode) {
int row, col, word, bit;
row = basic_keycode%10;
col = basic_keycode/10-1;
word = row>>1;
bit = col + 8*(row&1);
return (0 != (holdkey[word] & 1<<bit));
}

int main() {
int n=100;
int key;
// clear screen
Bdisp_AllClr_VRAM();
PrintXY(1,1,"Wait...",0,COLOR_BLACK);
Bdisp_PutDisp_DD();

for (int k = 3; k < n; ++k) {
for (int j = 2; j < k-1; ++j) {
if(k%j==0) {
j = k-1;
}
}
}

PrintXY(1,1," done",0,COLOR_BLACK);
Bdisp_PutDisp_DD();
GetKey(&key);

// second time around...
n=1000;
// clear screen
Bdisp_AllClr_VRAM();
PrintXY(1,1," Wait...",0,COLOR_BLACK);
Bdisp_PutDisp_DD();

for (int k = 3; k < n; ++k) {
for (int j = 2; j < k-1; ++j) {
if(k%j==0) {
j = k-1;
}
}
}

PrintXY(1,1," done2",0,COLOR_BLACK);
Bdisp_PutDisp_DD();

GetKey(&key);

// third time around...
n=10000;
// clear screen
Bdisp_AllClr_VRAM();
PrintXY(1,1," Wait...",0,COLOR_BLACK);
Bdisp_PutDisp_DD();

for (int k = 3; k < n; ++k) {
for (int j = 2; j < k-1; ++j) {
if(k%j==0) {
j = k-1;
}
}
}

PrintXY(1,1," done3",0,COLOR_BLACK);
Bdisp_PutDisp_DD();

GetKey(&key);

// fourth time...
n=100000;
// clear screen
Bdisp_AllClr_VRAM();
PrintXY(1,1," Wait...",0,COLOR_BLACK);
Bdisp_PutDisp_DD();

for (int k = 3; k < n; ++k) {
for (int j = 2; j < k-1; ++j) {
if(k%j==0) {
j = k-1;
}
}
}

PrintXY(1,1," done4",0,COLOR_BLACK);
Bdisp_PutDisp_DD();


while(1) {
GetKey(&key);
}

return 1;
}

Calculator used: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Time elapsed for n=100: too fast to calculate
Time elapsed for n=1000: too fast to calculate
Time elapsed for n=10000: 6.1 sec
Time elapsed for n=100000: 455 sec
Code: Same as last test.


Just so you know, it's Prizm, not Primz ;)




EDIT:
Why don't you make a CSV file and graph the results?


EDIT:
I posted about your project on my blog ;)

  • pier4r likes this

#24 pier4r

Posted 12 September 2013 - 06:59 PM

I'm stunned. Are you really sure that the Prizm (1) + C did the task with n=10'000 within 10 seconds?
That's almost unbelievable! (2)
OK, I'll report it.

(1) Sorry about the name, I always remember it as "primes - primz".

(2) Since the add loop benchmark shows a result similar to the Saturn ASM performance measured by 3298, I expected a similar result for the ultra-naive primes too.
Edit: it's also true that your C code for the summation checks a lot of things, while the other one simply computes without any check!

PS: about the chart. I hope to gather more results (for example: hp50g with hpgcc, ti nspire, nspire cx, 9860G, ti-8X series and so on, with both the standard user language and C programs), and then I'll do it.

Edited by pier4r, 12 September 2013 - 07:21 PM.


#25 flyingfisch

Posted 12 September 2013 - 07:07 PM

I am absolutely sure it calculated 10,000 in less than 10 sec. You might want to check my code, but I am pretty sure it's good. ;)

#26 pier4r

Posted 12 September 2013 - 07:34 PM

No, it's OK (just look at the page: your Prizm + C has done better than good smartphones released in 2009-2010, even if those use a scripting language). It's just that, with n=10'000 in under ten seconds, I'm sure that in the addition loop you can almost catch the hp50g (with 161 million). But you need to avoid a lot of key checks.

Edited by pier4r, 12 September 2013 - 07:35 PM.


#27 3298

Posted 12 September 2013 - 09:37 PM

And I got something else wrong. To be precise, it was the part about DIV.A being an instruction. I threw it at the hex editing tools and found out that it is just an assembler macro. So the results I wrote are actually using the slow software division routine. I tracked down the real hardware division and modulo instructions and came up with this piece of code:
CODE
GOSBVL POP#
GOSBVL SAVPTR
A+1.A
B=0.A
B+3.A
*outer
C=0.A
C+2.A
*inner
D=B.A
D%C.A
?D=0.A
GOYES outer_end
C+1.A
?B>C.A
GOYES inner
*outer_end
B+1.A
?A>B.A
GOYES outer
GOVLNG GETPTRLOOP
ENDCODE
It is quite a bit faster: n=100 takes 0.0302 seconds, n=1000 takes 1.4668 seconds, n=10000 takes 107.6708 seconds. I don't really want to try n=100000 because I don't feel like waiting for days for this to finish, not to mention that the calculator will eventually turn off automatically.
So a piece of ASM code for a 4-bit processor emulated on a 75MHz machine now has the best results on a physical calc except for flyingfisch's C code. But I think I can outperform that one with a carefully written ARM ASM program. I'll take a look tomorrow (11:30 pm here).
  • pier4r likes this

#28 pier4r

Posted 12 September 2013 - 10:09 PM

Thanks for the correction, 3298. It's late here too (I'm about 1800km to the south... in detail, southern Italy).

I'd appreciate further analysis (with either C or ASM on ARM).

edit: about your note on the speed of the Saturn ASM, you are right, it is impressive. I can't imagine what would be possible without any emulation, maybe a x10 or x25 improvement?

That would also confirm that the Prizm C code can do way better on the summation test.

Edited by pier4r, 12 September 2013 - 10:18 PM.


#29 pier4r

Posted 13 September 2013 - 06:34 AM

A question for flyingfisch from the hpmuseum board:

"Also, out of curiosity you should ask over in the casio forum if they have any way to do a *native* numerical calculation instead of a pure C int calculation in the C program. It would be interesting to see the casio math library MOD function used instead with the user number object instead of a pure integer for comparison."

Also, is there someone with a 9860 willing to do the ultra-naive primes test? Where should I ask?

Edited by pier4r, 13 September 2013 - 07:31 AM.


#30 3298

Posted 13 September 2013 - 12:20 PM

Counting test with ARM ASM is finished. I didn't use HPGCC or HPGCC3, but plain ARM code with a small Saturn loader (this is necessary to interact with the stack, to move the code to a valid ARM code address (must be 32-bit aligned) and to break out of the emulator). I made a small optimisation: the ON key is only checked when the lowest 4 bits of the counter are 0. This avoids some memory accesses (ARM I/O is always memory-mapped) without losing too much precision - after all, I stopped it while looking at a normal clock. Also, I used two registers for the counter because I feared a single register would overflow. This was not the case, but as the result showed, 8 minutes would have made it overflow.
The counter showed 593615984. That is MASSIVELY more than flyingfisch's C code. Why? Both calcs have a 32-bit CPU, the clock frequency in flyingfisch's second test was even higher, both pieces of code are executed directly on the CPU ... and still I get about 100 times as high a count in the same 60 seconds. I don't think it was my optimisation, because the code itself must still be loaded from memory. (Disclaimer: I don't know how big the code cache is, if there is one at all.)
And this is the code:
CODE
GOSBVL SAVPTR
SKUB {
*start
!ARM
STMDB sp! {R4 R5 LP}
MOV R2,0
MOV R3,0
MOV R4,$7A00000 ;the lowest bit of $7A00054 is 1 if ON is pressed
ADD R4,R4,$54 ;the address is built in two steps because an ARM immediate operand only holds 8 significant bits (rotated)
*loop
ADD R2,R2,1
ADC R3,R3,0
TST R2,$F
BNE loop
LDRB R5,[R4]
TST R5,1
BEQ loop
STR R2,[R1,#2316] ;Saturn A register, lower half
STR R3,[R1,#2320] ;Saturn A register, upper half
LDMIA sp! {R4 R5 PC}
!ASM
*end
}
C=RSTK
D0=C
D1=80100
LC(5)end-start
MOVEDN
LC 80100
INTOFF
ARMSAT
INTON
GOSBVL GETPTR
P=15
GOVLNG PUSHhxsLoop
ENDCODE
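
The polling trick described in this post (test the ON key only when the lowest 4 bits of the counter are 0) can be sketched in Python as below. The `stop_requested` callback stands in for reading the key register and is an invention of this sketch, as is the function name. Python integers do not overflow, so the two-register carry from the ARM code is not modelled.

```python
def count_until_stopped(stop_requested, poll_mask=0xF):
    """Count upward, polling stop_requested() only when the low bits are 0."""
    counter = 0
    polls = 0
    while True:
        counter += 1
        if counter & poll_mask:
            continue               # low bits non-zero: skip the costly check
        polls += 1
        if stop_requested():
            return counter, polls
```

The count can only stop on a multiple of 16, so the result is off by at most 15, which is far less than the error from stopping by hand.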

Edited by 3298, 13 September 2013 - 12:23 PM.

  • pier4r likes this

#31 flyingfisch

Posted 13 September 2013 - 12:27 PM

pier4r wrote:
That would also confirm that the Prizm C code can do way better on the summation test.


Yeah, I will work on that code. I'm pretty sure it's the keystroke routine that's slowing it down.


pier4r wrote:
A question for flyingfisch from the hpmuseum board:

"Also, out of curiosity you should ask over in the casio forum if they have any way to do a *native* numerical calculation instead of a pure C int calculation in the C program. It would be interesting to see the casio math library MOD function used instead with the user number object instead of a pure integer for comparison."

Also, is there someone with a 9860 willing to do the ultra-naive primes test? Where should I ask?


Sorry, I do have a 9860, but I am out of town and I left it at home. Maybe Anonymouse can do it.

Also, about using a native routine, I don't think there is one available, but I am not an expert. I suppose I could ask SimonLothar if he knows the syscalls. ;)




EDIT:

3298 wrote:
The counter showed 593615984. That is MASSIVELY more than flyingfisch's C code. Why? Both calcs have a 32-bit CPU, the clock frequency in flyingfisch's second test was even higher, both pieces of code are executed directly on the CPU ... and still I get about 100 times as high a count in the same 60 seconds. I don't think it was my optimisation, because the code itself must still be loaded from memory. (Disclaimer: I don't know how big the code cache is, if there is one at all.)


I don't have enough programming knowledge yet, but maybe if someone tried writing it directly in ASM for prizm, it would be faster. Also, I could see if a for loop is faster as well.


#32 pier4r

Posted 13 September 2013 - 02:05 PM

593615984

What the heck. I'm now porting some results for the savage test http://www.wiki4hp.c...nchmarks:savage, after that I'll copy your result ASAP.

A huge thanks!

A hint for the C code on the Prizm from a GCC4TI dev: http://tiplanet.org/...=147885#p147885

Edited by pier4r, 13 September 2013 - 03:09 PM.


#33 flyingfisch

flyingfisch

    Casio Maniac

  • Deputy
  • PipPipPipPipPipPipPipPip
  • 1891 posts
  • Gender:Male
  • Location:OH,USA
  • Interests:Aviation, Skiing, Programming, Mountain Biking.

  • Calculators:
    fx-9860GII
    fx-CG10 PRIZM

Posted 13 September 2013 - 03:14 PM

OK, ran the savage benchmark on the PRIZM.

Device: Casio fx-CG 10 PRIZM, clocked at default 58MHz, OS version 01.04.3200, Casio-BASIC
Number of digits: 9
Time elapsed: 33 sec
Result printed: 2499.999981
Code used:
Rad
Fix 9
1->A
For 1->I To 2499
(tan tan^-1 e^ln sqrt(A*A))+1->A
Next
A

Device: Casio fx-CG 10 PRIZM, overclocked to 94.3MHz (max overclocking without freezing), OS version 01.04.3200
Number of digits: 9
Time elapsed: 19 sec
Result printed: 2499.999981
Code used: same as last test
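
For comparison on a PC, the same loop can be sketched in Python. This is only an illustration of the benchmark formula from the Casio-BASIC listing above, not a result for the wiki: the function name `savage` is made up for this sketch, and Python's `math` functions work in radians with double precision, so the rounding error will differ from the calculator's.

```python
import math

def savage(iterations=2499):
    """Run the savage loop: start at 1 and apply
    tan(atan(exp(ln(sqrt(a*a))))) + 1 the given number of times."""
    a = 1.0
    for _ in range(iterations):
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a

if __name__ == "__main__":
    result = savage()
    # The exact answer is 2500; the difference shows the accumulated rounding error.
    print(f"result = {result:.9f}")
    print(f"error  = {abs(result - 2500.0):.3e}")
```

The ideal result is 2500, so the distance from 2500 is the interesting figure, just like the 2499.999981 printed by the PRIZM.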

#34 3298

3298

    Casio Addict

  • Members
  • PipPipPip
  • 79 posts
  • Gender:Male
  • Location:Germany

  • Calculators:
    fx-9750G Plus
    Algebra FX 2.0 (ROM 1.03,broken)
    HP 50G

Posted 13 September 2013 - 03:27 PM

Well, it executes 4*16+3 instructions for every 16 numbers counted, one of them is a memory access instruction. According to my calculations that is 41.4M instructions per second, which is still within the theoretical limit of 75M instructions per second as the ARM takes one clock cycle for most instructions. The biggest exceptions are memory accesses (limited by the memory speed, can be enhanced by caches) and multiplication, which is not used in my code pieces. (Division would be another slow instruction, but the ARM doesn't have one at all. Usually it is implemented in software. When I talked about a hardware division earlier, I really meant the emulated Saturn hardware.)
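
The 41.4M figure can be re-derived from the numbers in this thread. A minimal sketch in Python; all constants are copied from the posts (the 60-second count, 4*16+3 instructions per 16 counts, the 75M limit), and the variable and function names are only illustrative.

```python
# Figures quoted in the thread; nothing here is measured by this script.
COUNT_AFTER_60_S = 593_615_984       # addloop counter after the run
RUN_SECONDS = 60
INSTRUCTIONS_PER_BLOCK = 4 * 16 + 3  # instructions executed per block
COUNTS_PER_BLOCK = 16                # numbers counted per block
PEAK_IPS = 75e6                      # theoretical limit stated in the post

def instructions_per_second(count, seconds, instr_per_block, counts_per_block):
    """Convert a final counter value into executed instructions per second."""
    counts_per_second = count / seconds
    return counts_per_second * instr_per_block / counts_per_block

ips = instructions_per_second(
    COUNT_AFTER_60_S, RUN_SECONDS, INSTRUCTIONS_PER_BLOCK, COUNTS_PER_BLOCK
)

if __name__ == "__main__":
    print(f"{ips / 1e6:.1f}M instructions per second")
    print(f"{100 * ips / PEAK_IPS:.0f}% of the theoretical limit")
```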

Speaking about division emulation, I just built one myself for the ultra-naive primes benchmark. Of course, it slows things down a lot because I wrote it myself instead of taking a prebuilt one off the internet, and I didn't really optimize it much. Still, the results are not bad: 0.0121 seconds for n=100 (the Saturn ASM loader which makes it possible to use ARM ASM at all probably takes a bit of that), 0.238 seconds for n=1000, 17.0284 seconds for n=10000.
The code:
CODE
A=0.W
GOSBVL POP#
GOSBVL SAVPTR
SKUB {
*start
!ARM
STMDB sp! {R4 R5 R6 LP}
LDR R2,[R1,#2316] ;Saturn A register
MOV R3,3
*outer
MOV R4,2
*inner
MOVS R5,R4 LSL #16
MOVCS R5,R4
MOVS R6,R5 LSL #8
MOVCS R6,R5
MOVS R5,R6 LSL #4
MOVCS R5,R6
MOVS R6,R5 LSL #2
MOVCSS R6,R5
MOVPL R6,R6 LSL #1
MOV R5,R3
*modloop
CMP R5,R6
BEQ outer_end
SUBHS R5,R5,R6
MOV R6,R6 LSR #1
CMP R6,R4
BHS modloop
ADD R4,R4,1
CMP R4,R3
BLO inner
*outer_end
ADD R3,R3,1
CMP R3,R2
BLS outer
LDMIA sp! {R4 R5 R6 PC}
!ASM
*end
}
C=RSTK
D0=C
D1=80100
LC(5)end-start
MOVEDN
LC 80100
ARMSAT
GOVLNG GETPTRLOOP
ENDCODE
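
For readers who don't read ARM ASM, the general idea of a division emulated with shifts and subtractions can be sketched in Python. This is NOT a line-by-line translation of the listing above: `soft_mod` and `naive_primes` are names invented for this sketch, and the alignment step is written in the simplest way rather than the way the ASM does it. The outer loops follow the ultra-naive primes pseudocode from the first post.

```python
def soft_mod(dividend, divisor):
    """Remainder by shift-and-subtract, for CPUs without a divide instruction."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    # Align: double the divisor while it still fits into the dividend.
    shifted = divisor
    while (shifted << 1) <= dividend:
        shifted <<= 1
    # Subtract the shifted divisor wherever it fits, halving it each step.
    remainder = dividend
    while shifted >= divisor:
        if remainder >= shifted:
            remainder -= shifted
        shifted >>= 1
    return remainder

def naive_primes(n, mod=soft_mod):
    """Ultra-naive primes loop; returns how many k in 3..n had no divisor."""
    found = 0
    for k in range(3, n + 1):
        for j in range(2, k):
            if mod(k, j) == 0:
                break  # same effect as j := k-1 in the pseudocode
        else:
            found += 1
    return found

if __name__ == "__main__":
    print(naive_primes(100))
```

Returning the number of primes found is an addition for checking correctness only; the benchmark itself just measures the elapsed time.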

If you don't trust me on the 593M result, try it yourself! I will gladly help you if you have trouble with the compiling tools. (Mostly just 256. ATTACH, then type the code into a string, add a newline and @ at the end, then do ASM EVAL and press ON after a minute. For the ultra-naive primes, you have to do this instead of EVAL: input n to the stack, and execute R~SB SWAP TEVAL. The ON key will not have any effect in this one.)

#35 pier4r

Posted 13 September 2013 - 05:35 PM

3298, I trust you! As long as the results are consistent (see the earlier ones with Saturn ASM) I have no doubt. The only possible problem would be an error in the code, and that's it :) but the summation is really simple, and moreover ARM ASM is, for me, more readable than Saturn ASM. (I have done some x86 ASM.)

edit: about "Still, the results are not bad: 0.0121 seconds for n=100 (the Saturn ASM loader which makes it possible to use ARM ASM at all probably takes a bit of that), 0.238 seconds for n=1000, 17.0284 seconds for n=10000."
If the ARM ASM is so fast, and the results for naive primes are consistent, then I have a doubt: could your hand-written division be slow enough to hold the code back? The Prizm, with C, took almost 10 seconds for n=10'000, but it is way slower in the summation test.

(I'm adding the results, please wait a bit!)

Edited by pier4r, 13 September 2013 - 05:40 PM.


#36 pier4r

Posted 13 September 2013 - 06:16 PM

Added! Thanks a lot. Now the last "fast" Casio left is the 9860. Obviously other results will be appreciated too, like ClassPads and earlier programmable calculators.

And never mind the 593M: the ARM ASM has even beaten the Nokia E5 and Palm Treo smartphones! :) (naiveprimes)

edit: no, it's not over! You can also do the savage test (maybe log/atan and similar functions are hard to do in ARM ASM), at least with Lua and the PRIZM C SDK.

Edited by pier4r, 13 September 2013 - 06:18 PM.


#37 flyingfisch

Posted 13 September 2013 - 07:10 PM

Do you plan on graphing the results some time?

#38 pier4r

Posted 13 September 2013 - 07:15 PM

Sure, but the graph should be logarithmic, I think. (Although, actually, I would rather gather the results and update the "best of discussions" page on the wiki.)

Of course you can do that if you want (the results are public); I'll just link your blog post on the wiki :)

Edited by pier4r, 13 September 2013 - 07:15 PM.


#39 flyingfisch

Posted 13 September 2013 - 07:33 PM

Sure, but the graph should be logarithmic, I think. (Although, actually, I would rather gather the results and update the "best of discussions" page on the wiki.)

Ok, i was just thinking it would be nice to visually see how fast each of the calcs runs the benchmarks.

Of course you can do that if you want (the results are public); I'll just link your blog post on the wiki :)


Oh thanks :)

#40 3298

Posted 13 September 2013 - 09:07 PM

I just realized my directions for confirming my result were incomplete: installing a library called extable (available on www.hpcalc.org, as usual) is also necessary.
After toying around with the addloop benchmark, I found out that my optimization really does help a lot. After throwing it out (delete the lines that say "TST R2,$F" and "BNE loop") I got just 158998232. And when I increased the number of additions between the ON key checks from 16 to 256 (by replacing $F with $FF), I even got 728608512. The keyboard was still responsive enough; it "only" checked the ON key about 47000 times per second. I can't increase it further due to the ARM's limitation on immediate operands: only a group of 8 bits rotated by 0, 2, ..., 30 positions can be encoded.
The modulo operation ... can be optimized, I have some ideas, just let me figure out a working solution.
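
The immediate limitation and the 47000 figure can both be checked with a few lines of Python. This is a sketch under assumptions: `is_arm_immediate` is a name invented here, it models only the classic 32-bit ARM data-processing immediate (an 8-bit value rotated by an even amount), and it assumes the next mask to try would have been $FFF.

```python
def is_arm_immediate(value):
    """True if value is an 8-bit number rotated by an even amount (0, 2, ..., 30)."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotate left to undo a rotate-right by the same amount.
        undone = ((value << rotation) | (value >> (32 - rotation))) & 0xFFFFFFFF
        if undone < 0x100:
            return True
    return False

def key_checks_per_second(count, seconds, additions_per_check):
    """How often the ON key was polled, given the final counter value."""
    return count / seconds / additions_per_check

if __name__ == "__main__":
    for mask in (0xF, 0xFF, 0xFFF):
        print(f"${mask:X}: encodable = {is_arm_immediate(mask)}")
    # Figures from the post: 728608512 counts in 60 s, one key check per 256 additions.
    print(f"{key_checks_per_second(728_608_512, 60, 256):.0f} key checks per second")
```

$F and $FF fit into 8 bits, while $FFF needs 12 contiguous bits and so cannot be written as a single immediate.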



