This page contains dco's optimization results for the Livermore loops benchmark when optimizing code generated by gcc version 4.2.2 on x86-64 and IA-32 systems. See this for the optimization results of the previous version of dco (version 1.0.1) on the same benchmark, for code generated by gcc version 4.1.2.
Preparing the benchmarks
We used the C version of the Livermore loops benchmark. The code was modified to eliminate calibration, ensuring that every run executes the same number of iterations on the same input data. This makes it possible to compare the execution times of the program (and not the estimated MFlops, as in the original implementation).
For every kernel of the benchmark, dco was invoked twice: first without any options (default mode) and then with the -no-packing option; the better of the two execution times is reported. Note that the x86 assembly source that dco optimized is the one gcc generated when it was invoked.
Read this
to understand how the benchmarks were executed and code
optimization results were calculated.
Timing of the Livermore loops kernels and results of optimization
The following tables present the execution data collected while benchmarking the Livermore loops.
The two columns under the gcc and gcc+dco headers present the execution times (in seconds) achieved by the compiler generated code and the dco optimized code, respectively. The column under the gcc+dco/gcc header lists the improvement achieved by using dco over the compiler generated code. For example, the compiler generated code executed kernel #1 in 3.06 seconds; after optimization by dco the resulting code ran for 2.22 seconds, which is a 27.45% improvement.
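The improvement figure in the example above is consistent with the relative reduction in execution time, (gcc - gcc+dco) / gcc. The following minimal sketch reproduces it; the function name improvement_percent is an assumption made for illustration and is not part of dco or the benchmark.

```python
def improvement_percent(baseline_seconds, optimized_seconds):
    """Relative reduction in execution time, as a percentage of the baseline.

    A negative result means the optimized code ran slower than the baseline.
    """
    return (baseline_seconds - optimized_seconds) / baseline_seconds * 100.0


# Kernel #1 of the 64-bit run: 3.06 s with gcc alone, 2.22 s after dco.
print(f"{improvement_percent(3.06, 2.22):.2f}%")  # prints 27.45%
```

With this definition a slowdown shows up as a negative percentage, as in the rows of the tables where the dco optimized code ran longer than the compiler generated code.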
Results for x86-64 64-bit code
The following are the results of optimizations achieved on a 64-bit Linux operating system running on a 2.66GHz Core2 computer.
The gcc version 4.2.2 compiler, used to process the benchmarks, was invoked with the following command line options:
-S -O3 -fomit-frame-pointer -funroll-all-loops -ffast-math -march=nocona -mfpmath=sse -msse3
dco version 1.1.0 was used to optimize the compiler generated code.
Kernel# | gcc 4.2.2 | gcc+dco | gcc+dco/gcc |
1 | 3.06 | 2.22 | 27.45% |
2 | 2.21 | 2.01 | 9.05% |
3 | 3.82 | 1.46 | 61.78% |
4 | 3.15 | 2.04 | 35.24% |
5 | 2.52 | 1.45 | 42.46% |
6 | 5.78 | 2.66 | 53.98% |
7 | 2.60 | 1.71 | 34.23% |
8 | 1.72 | 1.41 | 18.02% |
9 | 2.21 | 1.89 | 14.48% |
10 | 1.80 | 1.41 | 21.67% |
11 | 1.89 | 0.64 | 66.14% |
12 | 2.99 | 3.00 | -0.33% |
13 | 1.46 | 1.47 | -0.68% |
14 | 1.53 | 1.46 | 4.58% |
15 | 1.98 | 1.99 | -0.51% |
16 | 3.11 | 2.91 | 6.43% |
17 | 2.80 | 2.50 | 10.71% |
18 | 2.80 | 2.38 | 15.00% |
19 | 3.90 | 3.21 | 17.69% |
20 | 2.78 | 2.68 | 3.60% |
21 | 2.20 | 1.88 | 14.55% |
22 | 2.58 | 2.52 | 2.33% |
23 | 2.18 | 2.17 | 0.46% |
24 | 2.05 | 0.63 | 69.27% |
Geometric Mean | 2.50 | 1.85 | 25.96% |
On average, dco achieved an improvement of 26% over the 64-bit code generated and optimized by gcc version 4.2.2.
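The summary row of the 64-bit table appears consistent with taking the geometric mean of each timing column and then computing the improvement from those two means, rather than averaging the per-kernel percentages. The sketch below illustrates that calculation under this assumption; the helper geometric_mean and the variable names are illustrative only, and the times are copied from the 64-bit table.

```python
import math


def geometric_mean(values):
    """n-th root of the product of the values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Execution times in seconds for kernels 1-24, from the 64-bit table.
gcc_seconds = [3.06, 2.21, 3.82, 3.15, 2.52, 5.78, 2.60, 1.72,
               2.21, 1.80, 1.89, 2.99, 1.46, 1.53, 1.98, 3.11,
               2.80, 2.80, 3.90, 2.78, 2.20, 2.58, 2.18, 2.05]
dco_seconds = [2.22, 2.01, 1.46, 2.04, 1.45, 2.66, 1.71, 1.41,
               1.89, 1.41, 0.64, 3.00, 1.47, 1.46, 1.99, 2.91,
               2.50, 2.38, 3.21, 2.68, 1.88, 2.52, 2.17, 0.63]

gm_gcc = geometric_mean(gcc_seconds)
gm_dco = geometric_mean(dco_seconds)

# Improvement derived from the unrounded means, not from the rounded ones.
improvement = (1.0 - gm_dco / gm_gcc) * 100.0

print(f"gcc: {gm_gcc:.2f} s, gcc+dco: {gm_dco:.2f} s, improvement: {improvement:.2f}%")
```

The values printed should land close to the Geometric Mean row of the table. Using the unrounded means matters here: the rounded means 2.50 and 1.85 alone would give exactly 26%, not the 25.96% shown in the table.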
Results for IA-32 32-bit code
The following are the results of optimizations achieved on a 32-bit Linux operating system running on a 2.8GHz Pentium4 computer.
The gcc version 4.2.2 compiler, used to process the benchmarks, was invoked with the following command line options:
-S -O3 -fomit-frame-pointer -funroll-all-loops -ffast-math -march=pentium4 -mfpmath=sse -msse2
dco version 1.1.1 was used to optimize the compiler generated code. Note that dco's -32 command line option was used during optimization.
Kernel# | gcc 4.2.2 | gcc+dco | gcc+dco/gcc |
1 | 5.03 | 3.24 | 35.59% |
2 | 2.46 | 2.36 | 4.07% |
3 | 4.99 | 2.49 | 50.10% |
4 | 5.04 | 3.84 | 23.81% |
5 | 5.33 | 1.76 | 66.98% |
6 | 16.24 | 5.06 | 68.84% |
7 | 5.16 | 4.16 | 19.38% |
8 | 3.87 | 3.91 | -1.03% |
9 | 4.95 | 4.01 | 18.99% |
10 | 4.94 | 3.38 | 31.58% |
11 | 4.93 | 0.85 | 82.76% |
12 | 5.02 | 5.20 | -3.59% |
13 | 4.63 | 4.66 | -0.65% |
14 | 4.41 | 4.23 | 4.08% |
15 | 5.44 | 4.47 | 17.83% |
16 | 4.86 | 4.52 | 7.00% |
17 | 4.87 | 4.17 | 14.37% |
18 | 4.57 | 3.61 | 21.01% |
19 | 5.82 | 4.10 | 29.55% |
20 | 4.53 | 4.43 | 2.21% |
21 | 7.16 | 8.85 | -23.60% |
22 | 4.79 | 4.80 | -0.21% |
23 | 3.67 | 2.82 | 23.16% |
24 | 4.85 | 0.84 | 82.68% |
Geometric Mean | 5.01 | 3.43 | 31.59% |
On average, dco achieved an improvement of 32% over the 32-bit code generated and optimized by gcc version 4.2.2.