
Dhrystone

The Dhrystone benchmark was designed to test performance factors important in non-numeric systems programming (operating systems, compilers, word processors, etc.):

  • it contains no floating point operations;
  • a considerable percentage of time is spent in string functions, which makes the test very dependent on how such operations are implemented (e.g. by in-line code or routines written in assembly language) and susceptible to manufacturers 'tweaking' critical routines;
  • it contains hardly any tight loops, so with very small caches the majority of instruction accesses will be misses; however, the situation changes radically as soon as the cache reaches a critical size and can hold the main measurement loop;
  • only a small amount of global data is manipulated (as opposed to Whetstone).

There are two versions of the Dhrystone benchmark. The deprecated version 1.1 contained some 'dead code' which optimising compilers could remove. Version 2.1 corrected this and should be the version used in practice (it is the one in the uClinux distribution). Some manufacturers, however, still quote the (better) results of version 1.1, so when comparing Dhrystone performance figures, take care to check which version was used.

If Dhrystone data is used for comparison purposes, it is important that the conditions of the benchmark are well understood, including:

  1. Which Dhrystone version was used? (We use Dhrystone version 2.1)
  2. Which Dhrystone source code was used (ANSI, unmodified K&R)? (We use the unmodified K&R source)
  3. How many compilation modules were used (one merged or two separate modules)? (We use the original unmodified two separate modules)
  4. Which inlining settings are applied? (See results)
  5. Which C libraries are used? (We use uClibc included in the toolchain)
  6. Which tool chain (compiler version, options, linker) was used? (We use gcc 4.x)
  7. What is the CPU’s clock speed? (See results)

These questions are borrowed from a white paper by Richard York.

Some people still occasionally ask for Dhrystone MIPS data, and so we will continue to provide these numbers. However, we do not recommend using Dhrystone results as part of any embedded processor evaluation exercise, because of the benchmark's many known deficiencies. Where we provide a Dhrystone figure, it will be based on our standard tool suite under the conditions outlined above, and will therefore be 100% publicly and independently reproducible.

Results on a Blackfin

These results were taken on the processor, with Dhrystone compiled as a Linux application.

Compiler Version

rgetz@pinky:~/blackfin1/uClinux-dist/user/dhrystone> bfin-uclinux-gcc --version
bfin-uclinux-gcc (ADI-trunk/svn-3648) 4.3.4
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Commands to compile

rgetz@pinky:~/blackfin/uclinux-dist/user/dhrystone> bfin-linux-uclibc-gcc -O3 -DNO_PROTOTYPES=1 -c -o dhry_1.o dhry_1.c
rgetz@pinky:~/blackfin/uclinux-dist/user/dhrystone> bfin-linux-uclibc-gcc -O3 -DNO_PROTOTYPES=1 -c -o dhry_2.o dhry_2.c
rgetz@pinky:~/blackfin/uclinux-dist/user/dhrystone> bfin-linux-uclibc-gcc -O3 -DNO_PROTOTYPES=1  dhry_1.o dhry_2.o -o  dhrystone
rgetz@pinky:~/blackfin/uclinux-dist/user/dhrystone> rcp ./dhrystone root@192.168.0.8:/dhrystone

Processor Settings

root:/> cat /proc/cpuinfo
processor       : 0
vendor_id       : Analog Devices
cpu family      : 0x27c8000
model name      : ADSP-BF537 600(MHz CCLK) 120(MHz SCLK)
stepping        : 2
cpu MHz         : 600.000/120.000
bogomips        : 1196.03
Calibration     : 598016000 loops
cache size      : 16 KB(L1 icache) 32 KB(L1 dcache-wb) 0 KB(L2 cache)
dbank-A/B       : cache/cache
icache setup    : 4 Sub-banks/4 Ways, 32 Lines/Way
dcache setup    : 2 Super-banks/4 Sub-banks/2 Ways, 64 Lines/Way
board name      : ADDS-BF537-STAMP
board memory    : 65536 kB (0x00000000 -> 0x04000000)
kernel memory   : 57336 kB (0x00001000 -> 0x037ff000)

Typical Output

root:~> ./dhrystone_O0

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Please give the number of runs through the benchmark: 10000000

Execution starts, 10000000 runs through Dhrystone
Execution ends

Final values of the variables used in the benchmark:

Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    10000010
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          3707284
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          3707284
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone:    3.0
Dhrystones per Second:                      336021.5
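
The two timing lines above describe the same measurement in two ways: the time per run is the reciprocal of the runs per second. The sketch below is a minimal illustration of that relationship, assuming both figures come from one elapsed-time measurement; the function name is hypothetical and not part of the benchmark source. It also shows why the one-decimal "Microseconds" figure is too coarse for comparisons.

```python
def microseconds_per_run(dhrystones_per_second):
    """Convert a Dhrystones-per-second score into microseconds per run."""
    return 1_000_000.0 / dhrystones_per_second


if __name__ == "__main__":
    score = 336021.5  # "Dhrystones per Second" from the output above
    exact = microseconds_per_run(score)
    # The benchmark prints this value with a single decimal place.
    print(f"unrounded: {exact:.4f} us, as printed: {exact:.1f} us")
```

Because the printed microsecond value carries only one decimal place, the Dhrystones per Second line is the one to use when comparing builds.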

Summary

All these tests were run on the same processor (BF537 0.2, running at 500 MHz CCLK, 120 MHz SCLK), varying only the compiler optimization settings. 10000000 (ten million) iterations were used to obtain accurate results.

Flags 1) size (bytes) md5sum Loops Dhrystones per Second 2) Dhrystone MIPS 3) DMIPS/MHz
-Os 35364 af6bdc6aa9887ba3d06eabf41d48ab1a 10000000 398089.2 226.573 0.453146
-Os -funsafe-loop-optimizations 35364 ca63e189556a518daa56d392aa14cfc7 10000000 398089.2 226.573 0.453146
-Os -funroll-loops 35364 20488fe3ab43751bc61ff8b05d3dc703 10000000 398724.1 226.935 0.453869
-Os -funroll-loops -funsafe-loop-optimizations 35364 4cf225e93d70bccf611c3171a2fb216e 10000000 398724.1 226.935 0.453869
-Os -ffast-math 35364 b9215c8d6889de04f3650bb2b532340b 10000000 398089.2 226.573 0.453146
-Os -ffast-math -funsafe-loop-optimizations 35364 3b495a4b4c60f627f203085b647b040d 10000000 398089.2 226.573 0.453146
-Os -ffast-math -funroll-loops 35364 b6a963058c34ef89836f90fc0d154298 10000000 398724.1 226.935 0.453869
-Os -ffast-math -funroll-loops -funsafe-loop-optimizations 35364 84e07215483de58f61eff7378eb34bea 10000000 398724.1 226.935 0.453869
-Os -fomit-frame-pointer 35324 070dac6558ef149ab191999559dde16c 10000000 419639.1 238.838 0.477677
-Os -fomit-frame-pointer -funsafe-loop-optimizations 35324 ee50be758aa32d44fc08f2797e32e22e 10000000 419639.1 238.838 0.477677
-Os -fomit-frame-pointer -funroll-loops 35324 3a526a30f33ea456f28fcaa9de565f3d 10000000 420344.7 239.24 0.47848
-Os -fomit-frame-pointer -funroll-loops -funsafe-loop-optimizations 35324 4b26618cd384c38af8c4ae50f3486b63 10000000 420344.7 239.24 0.47848
-Os -fomit-frame-pointer -ffast-math 35324 383be4fcab72ef3be9b3f3af07e3f456 10000000 419463.1 238.738 0.477476
-Os -fomit-frame-pointer -ffast-math -funsafe-loop-optimizations 35324 f596b3dbd664f79727741476deeed312 10000000 419639.1 238.838 0.477677
-Os -fomit-frame-pointer -ffast-math -funroll-loops 35324 03522a57744a4a78c163948d5c8ded63 10000000 420344.7 239.24 0.47848
-Os -fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations 35324 09e7a669e85b9899cd8705e50b2aa1cc 10000000 420168.1 239.139 0.478279
-O0 36448 6e640730c45f3e5eb08a4b5f1e983718 10000000 358680.1 204.143 0.408287
-O0 -funsafe-loop-optimizations 36448 596a9b3d200f3047e7a06d8b1d684390 10000000 358680.1 204.143 0.408287
-O0 -funroll-loops 36448 4a06206b0bb3c416df301bdafea2c228 10000000 358680.1 204.143 0.408287
-O0 -funroll-loops -funsafe-loop-optimizations 36448 b3b0d9d9a6f944d58ed1572a950cc92e 10000000 358680.1 204.143 0.408287
-O0 -ffast-math 36448 6a2cfbc971f7ec453812eedde46976a4 10000000 358680.1 204.143 0.408287
-O0 -ffast-math -funsafe-loop-optimizations 36448 28a44f6847d621979bf24cfad01dcc36 10000000 358680.1 204.143 0.408287
-O0 -ffast-math -funroll-loops 36448 beb5cbb31f05b3cf0139620a65e849fe 10000000 358680.1 204.143 0.408287
-O0 -ffast-math -funroll-loops -funsafe-loop-optimizations 36448 2a455e7b1d6d0e81847fe54bd144a950 10000000 358680.1 204.143 0.408287
-O0 -fomit-frame-pointer 36544 1ae7b3a40f02aa8b8cb05cb1e11dc4f4 10000000 374953.1 213.405 0.426811
-O0 -fomit-frame-pointer -funsafe-loop-optimizations 36544 8bca489799d5ab25963a451ec06cb02b 10000000 374953.1 213.405 0.426811
-O0 -fomit-frame-pointer -funroll-loops 36544 2df7cb97d2af09a9559c15ced3e41d13 10000000 375093.8 213.485 0.426971
-O0 -fomit-frame-pointer -funroll-loops -funsafe-loop-optimizations 36544 1bee72b38041fd6569ffed03f626322e 10000000 374953.1 213.405 0.426811
-O0 -fomit-frame-pointer -ffast-math 36544 c5cc07a32e481df50dac7b0629af3dd9 10000000 374953.1 213.405 0.426811
-O0 -fomit-frame-pointer -ffast-math -funsafe-loop-optimizations 36544 3ea72f97af19412a9e9ae43247769556 10000000 374812.6 213.325 0.426651
-O0 -fomit-frame-pointer -ffast-math -funroll-loops 36544 d04d977b26c118b6b56959f6ee675616 10000000 374953.1 213.405 0.426811
-O0 -fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations 36544 057f93bbb54903604cbbbee318143dcf 10000000 374953.1 213.405 0.426811
-O1 35372 555aa7557093dd446ca5c334bd616222 10000000 527148.1 300.027 0.600055
-O1 -funsafe-loop-optimizations 35372 ceec41ae53ae6c37be754282ec83f162 10000000 535618.6 304.848 0.609697
-O1 -funroll-loops 35372 d4b4de7851396208a948ce6ec3ad0e51 10000000 519210.8 295.51 0.59102
-O1 -funroll-loops -funsafe-loop-optimizations 35372 86db4200d3176d4a95dc0169c6612aeb 10000000 528820.8 300.979 0.601959
-O1 -ffast-math 35372 b65b58c929754c113386fa49f8c90fe4 10000000 527148.1 300.027 0.600055
-O1 -ffast-math -funsafe-loop-optimizations 35372 d06a19db3c50e8b28e06286502ac014c 10000000 535905.7 305.012 0.610024
-O1 -ffast-math -funroll-loops 35372 233b5e4afd7b5ae659769561c8df20ce 10000000 519480.5 295.663 0.591327
-O1 -ffast-math -funroll-loops -funsafe-loop-optimizations 35372 4fc45e7d91855cfe897a88ff1e6ea37f 10000000 528820.8 300.979 0.601959
-O1 -fomit-frame-pointer 35356 72f8b7ef8a88753f175f8e52a648c6e4 10000000 579038.8 329.561 0.659122
-O1 -fomit-frame-pointer -funsafe-loop-optimizations 35356 9ff9634782c6d74ae7e21c23295834b6 10000000 592066.3 336.976 0.673951
-O1 -fomit-frame-pointer -funroll-loops 35356 ffae4167b98dacbd1cacef2dab5bde6a 10000000 574712.6 327.099 0.654198
-O1 -fomit-frame-pointer -funroll-loops -funsafe-loop-optimizations 35356 70ebd3f98429e7b327fdf6820dff7c3d 10000000 583771.2 332.255 0.664509
-O1 -fomit-frame-pointer -ffast-math 35356 4774fcd7727a9f0c287f1ff861ccb6bc 10000000 579038.8 329.561 0.659122
-O1 -fomit-frame-pointer -ffast-math -funsafe-loop-optimizations 35356 246e46ea2c8b6b0ad6f62c2c00a40df2 10000000 592066.3 336.976 0.673951
-O1 -fomit-frame-pointer -ffast-math -funroll-loops 35356 eafea6b0f13285e3fb4f3c81c3f1081b 10000000 574712.6 327.099 0.654198
-O1 -fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations 35356 e592fca7c8c0ae58bc506e7255508b9f 10000000 583430.6 332.061 0.664121
-O2 35852 eacb2e445e955baee80ba0f31e376eca 10000000 697350.1 396.898 0.793796
-O2 -funsafe-loop-optimizations 35852 81958b4fc25675f57d6e9680308f0c33 10000000 697836.7 397.175 0.79435
-O2 -funroll-loops 35852 201f30a46cb2cb8407a30a682e901352 10000000 700770.9 398.845 0.79769
-O2 -funroll-loops -funsafe-loop-optimizations 35852 fc379475d6dc6c428a74b690f39e39d9 10000000 700770.9 398.845 0.79769
-O2 -ffast-math 35852 d0abae396fa85853deacf830528f0b1c 10000000 697350.1 396.898 0.793796
-O2 -ffast-math -funsafe-loop-optimizations 35852 b9dfbbe3d047acc972ae05b87454d50d 10000000 697350.1 396.898 0.793796
-O2 -ffast-math -funroll-loops 35852 3d3fd64f295d1f08a861d0b5f98e54d6 10000000 700770.9 398.845 0.79769
-O2 -ffast-math -funroll-loops -funsafe-loop-optimizations 35852 f62ce4fa41e9caab7850f708e4ff2a1c 10000000 700770.9 398.845 0.79769
-O2 -fomit-frame-pointer 35820 23efc0484eb1412c8d03e36da9d58d7d 10000000 758725.3 431.83 0.86366
-O2 -fomit-frame-pointer -funsafe-loop-optimizations 35820 dcd5ffba3cec11d52a6ef38a6c680e4f 10000000 758725.3 431.83 0.86366
-O2 -fomit-frame-pointer -funroll-loops 35820 eae900d8703ab793b50fb10ae8f55393 10000000 761035.0 433.145 0.866289
-O2 -fomit-frame-pointer -funroll-loops -funsafe-loop-optimizations 35820 75044b285c6460a558b82c2aa8eba1a4 10000000 761035.0 433.145 0.866289
-O2 -fomit-frame-pointer -ffast-math 35820 e16a62e362fcee6bab43da2145e86012 10000000 758150.1 431.503 0.863005
-O2 -fomit-frame-pointer -ffast-math -funsafe-loop-optimizations 35820 fab2d89108e47660b594a2c0707ae9da 10000000 758725.3 431.83 0.86366
-O2 -fomit-frame-pointer -ffast-math -funroll-loops 35820 0449cc215b65e44717cfd404bb3be632 10000000 761035.0 433.145 0.866289
-O2 -fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations 35820 9cd77ce165a92d6aa20c9d993efdd61e 10000000 760456.3 432.815 0.86563
-O3 35852 fdb28a2b7ba7143587455c275fde7b66 10000000 697836.7 397.175 0.79435
-O3 -funsafe-loop-optimizations 35852 cf616e1bfaa54d658f19720d5e947619 10000000 697350.1 396.898 0.793796
-O3 -funroll-loops 35852 c5c4c50ae1306ee7cf78461f30129c4d 10000000 700280.1 398.566 0.797132
-O3 -funroll-loops -funsafe-loop-optimizations 35852 8f5a156351237ed7b0d0ab0d1c0729d7 10000000 700280.1 398.566 0.797132
-O3 -ffast-math 35852 61353a84e596e70926ede018e70302b2 10000000 697836.7 397.175 0.79435
-O3 -ffast-math -funsafe-loop-optimizations 35852 ddae039f1838554f7211ce265c6f2953 10000000 697350.1 396.898 0.793796
-O3 -ffast-math -funroll-loops 35852 f4b04e0eb021e12d6942ae56234cf11f 10000000 700770.9 398.845 0.79769
-O3 -ffast-math -funroll-loops -funsafe-loop-optimizations 35852 c335c472b522736234ccf304e5f43e24 10000000 700280.1 398.566 0.797132
-O3 -fomit-frame-pointer 35820 8e7ec6fa10cc4d2dfb11595c1237296c 10000000 758150.1 431.503 0.863005
-O3 -fomit-frame-pointer -funsafe-loop-optimizations 35820 41737019865627c2af5c084fc986b2ae 10000000 758150.1 431.503 0.863005
-O3 -fomit-frame-pointer -funroll-loops 35820 843026d94f073928fa1793d168520c6d 10000000 761035.0 433.145 0.866289
-O3 -fomit-frame-pointer -funroll-loops -funsafe-loop-optimizations 35820 b871965048f5dc59e0e409f179fc9c9e 10000000 760456.3 432.815 0.86563
-O3 -fomit-frame-pointer -ffast-math 35820 9df7519ad3b51e9b595b86e0e4977a50 10000000 758725.3 431.83 0.86366
-O3 -fomit-frame-pointer -ffast-math -funsafe-loop-optimizations 35820 0b789139f787960636c604b27ae79ac0 10000000 758725.3 431.83 0.86366
-O3 -fomit-frame-pointer -ffast-math -funroll-loops 35820 869afcd1055dfa34e076707238769168 10000000 761035.0 433.145 0.866289
-O3 -fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations 35820 362ba1f9f25411fa8afc8d1e35078f7a 10000000 760456.3 432.815 0.86563
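
The 80 rows above are every combination of five optimization levels with four optional flags switched on or off. The sketch below regenerates that list of flag strings in the same order as the table. It is only an illustration of how such a sweep could be enumerated; the names are hypothetical and this is not the script that produced the results.

```python
from itertools import product

# Optimization levels and optional flags, in the order they appear in the table.
OPT_LEVELS = ["-Os", "-O0", "-O1", "-O2", "-O3"]
OPTIONAL_FLAGS = [
    "-fomit-frame-pointer",
    "-ffast-math",
    "-funroll-loops",
    "-funsafe-loop-optimizations",
]


def flag_combinations():
    """Return one flag string per table row: 5 levels x 2^4 on/off choices."""
    combos = []
    for level in OPT_LEVELS:
        # product() varies the last flag fastest, matching the table order.
        for mask in product([False, True], repeat=len(OPTIONAL_FLAGS)):
            extras = [flag for flag, on in zip(OPTIONAL_FLAGS, mask) if on]
            combos.append(" ".join([level] + extras))
    return combos


if __name__ == "__main__":
    for flags in flag_combinations():
        print(flags)
```

Each string would be appended to the standard CFLAGS when compiling dhry_1.c and dhry_2.c.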

The -O3 option includes function inlining, which is considered “illegal” with the Dhrystone benchmark. There is a specific set of rules for Dhrystone, defined by EEMBC, so the -O3 results should be taken as information about what advantage -O3 can give, not as published DMIPS results.

Analysis

“Benchmarking without analysis is as useless as analysis without benchmarking.” - Richard P. Gabriel, Performance and Evaluation of Lisp Systems, 1985

We can see that applications like Dhrystone, which fit completely in cache, are only slightly affected by changes to the SCLK rate.

Before we can look at detailed analysis, some basic gcc flags must be understood:

-static
On systems that support dynamic linking, this prevents linking with the shared libraries, and includes the library functions in the application.
-O0
Do not optimize.
-Os
Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
-O1
Turns on the following optimization flags: -fdefer-pop -fdelayed-branch -fguess-branch-probability -fcprop-registers -floop-optimize -fif-conversion -fif-conversion2 -ftree-ccp -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-ter -ftree-lrs -ftree-sra -ftree-copyrename -ftree-fre -ftree-ch -funit-at-a-time -fmerge-constants
-O2
Turns on all optimization flags specified by -O1. It also turns on the following optimization flags: -fthread-jumps -fcrossjumping -foptimize-sibling-calls -fcse-follow-jumps -fcse-skip-blocks -fgcse -fgcse-lm -fexpensive-optimizations -fstrength-reduce -frerun-cse-after-loop -frerun-loop-opt -fcaller-saves -fpeephole2 -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec -fregmove -fstrict-aliasing -fdelete-null-pointer-checks -freorder-blocks -freorder-functions -falign-functions -falign-jumps -falign-loops -falign-labels -ftree-vrp -ftree-pre
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.
-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions.
-ffunction-sections
-fdata-sections
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file.
-gc-sections
Linker flag to enable garbage collection of unused input sections. Used in combination with -ffunction-sections and -fdata-sections, it can make applications smaller.
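
Of the flags described above, -fomit-frame-pointer has the most consistent effect in the Summary table. The sketch below quantifies it using the "Dhrystones per Second" values of the rows with no other optional flag. The scores are copied from the table; the dictionary and function names are illustrative only.

```python
# Dhrystones per second from the Summary table (no other optional flags).
BASELINE = {"-Os": 398089.2, "-O0": 358680.1, "-O1": 527148.1,
            "-O2": 697350.1, "-O3": 697836.7}
OMIT_FP = {"-Os": 419639.1, "-O0": 374953.1, "-O1": 579038.8,
           "-O2": 758725.3, "-O3": 758150.1}


def percent_gain(before, after):
    """Relative improvement of 'after' over 'before', in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for level in BASELINE:
        gain = percent_gain(BASELINE[level], OMIT_FP[level])
        print(f"{level}: {gain:.2f}% faster with -fomit-frame-pointer")
```

On these figures the gain ranges from about 4.5% at -O0 to about 9.8% at -O1.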

Profile

Over 99.7% of processor time is spent running dhrystone.

Total CPU Time Application Function
46.44% dhrystone _main
12.98% libm-0.9.29.so ___udivsi3
9.08% libuClibc-0.9.29.so _strcmp
8.24% dhrystone _Proc_8
7.51% dhrystone _Func_2
6.23% dhrystone _Proc_7
3.43% dhrystone _Func_1
2.25% dhrystone ___divsi3
2.21% dhrystone _Proc_6
1.40% dhrystone __init
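
The profile rows can be grouped by the module each function is attributed to, which shows how much of the time is spent in library routines rather than in the benchmark's own code. The sketch below does that with the figures copied from the profile; the helper name is hypothetical.

```python
from collections import defaultdict

# (percent of total CPU time, module, function), copied from the profile.
PROFILE = [
    (46.44, "dhrystone", "_main"),
    (12.98, "libm-0.9.29.so", "___udivsi3"),
    (9.08, "libuClibc-0.9.29.so", "_strcmp"),
    (8.24, "dhrystone", "_Proc_8"),
    (7.51, "dhrystone", "_Func_2"),
    (6.23, "dhrystone", "_Proc_7"),
    (3.43, "dhrystone", "_Func_1"),
    (2.25, "dhrystone", "___divsi3"),
    (2.21, "dhrystone", "_Proc_6"),
    (1.40, "dhrystone", "__init"),
]


def time_by_module(rows):
    """Sum the CPU-time percentages of all functions in each module."""
    totals = defaultdict(float)
    for percent, module, _function in rows:
        totals[module] += percent
    return dict(totals)


if __name__ == "__main__":
    for module, percent in time_by_module(PROFILE).items():
        print(f"{module}: {percent:.2f}%")
```

The listed rows add up to 99.77%, in line with the "over 99.7%" figure, and roughly 22% of the time is attributed to the two shared libraries.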

1) Standard CFLAGS include -pipe -Wall -g -mcpu=bf537-0.2 -DNO_PROTOTYPES=1
2) Based on bfin-uclinux-gcc
3) DMIPS is obtained when the Dhrystone score is divided by 1,757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine)
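
As a worked example of footnote 3, the sketch below converts a Dhrystones-per-second score into DMIPS and DMIPS/MHz. The function names are illustrative and not part of the benchmark; the 500 MHz core clock is the CCLK stated in the Summary.

```python
# Dhrystones per second of the VAX 11/780, nominally a 1 MIPS machine.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystones_per_second):
    """Dhrystone MIPS: the score relative to the VAX 11/780."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second, cclk_mhz):
    """Dhrystone MIPS normalised by the core clock in MHz."""
    return dmips(dhrystones_per_second) / cclk_mhz


if __name__ == "__main__":
    score = 398089.2  # the plain -Os row of the Summary table
    print(f"{dmips(score):.3f} DMIPS")
    print(f"{dmips_per_mhz(score, 500):.6f} DMIPS/MHz")
```

Normalising by clock rate allows comparison across clock speeds, but it does not remove the dependence on compiler, flags and library described earlier.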