Optimizing Livermore loops

This page presents the results achieved by dco version 1.0.1 on the Livermore loops benchmark when optimizing code generated by gcc version 4.1.2.

Preparing the benchmarks

We used the C version of the Livermore loops benchmark. The code was modified to eliminate calibration, so that every run executes the same number of iterations on the same input data. This makes it possible to compare the execution times of the program (rather than the estimated MFlops rate reported by the original implementation).
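The idea behind removing calibration can be sketched in Python. This is a minimal illustration, not the modified C harness that was actually used; the function name `time_fixed_iterations` and its parameters are hypothetical. The iteration count is fixed before timing starts and the same input is reused on every pass, so two builds of the same kernel perform identical work and their elapsed times can be compared directly.

```python
import time
from typing import Callable, Sequence


def time_fixed_iterations(
    kernel: Callable[[Sequence[float]], object],
    data: Sequence[float],
    iterations: int,
) -> float:
    """Hypothetical harness: run `kernel` exactly `iterations` times on the
    same `data` and return the elapsed wall-clock time in seconds.

    No calibration step adjusts `iterations` at run time, so every run
    does the same amount of work on the same input.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        kernel(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Toy stand-in for a Livermore kernel: sum of squares over the input.
    xs = [float(i) for i in range(1000)]
    elapsed = time_fixed_iterations(lambda v: sum(a * a for a in v), xs, 200)
    print(f"{elapsed:.4f} s")
```

Because the work per run is constant, a shorter elapsed time translates directly into a proportionally higher operation rate.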

Read this to understand how the benchmarks were executed.

Results of optimization

The following table lists the improvement of the dco-optimized code over the code generated by gcc. The complete data collected during benchmarking is presented here. Some cases have links to an in-depth explanation of the benchmark and of the results of its optimization.

For convenience, the cases are classified into the following groups:

no improvement: the execution time difference is in the range from -5% to 5%
slower: dco-generated code is slower than compiler-generated code by more than 5%
moderately faster: dco-generated code is faster than compiler-generated code by 5% to 10%
faster: dco-generated code is faster than compiler-generated code by 10% to 20%
much faster: dco-generated code is faster than compiler-generated code by 20% or more
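The grouping rules can be written as a small classifier. This is a sketch; the function name `classify` is hypothetical, and because the ranges on this page share their endpoints, the handling of values that fall exactly on a boundary is an assumption, documented in the code.

```python
def classify(improvement_pct: float) -> str:
    """Map a percentage improvement over gcc to one of the groups above.

    Boundary handling is an assumption: the ranges share their endpoints,
    so here exactly +/-5% counts as "no improvement", exactly 10% as
    "faster" and exactly 20% as "much faster".
    """
    if improvement_pct < -5.0:
        return "slower"
    if improvement_pct <= 5.0:
        return "no improvement"
    if improvement_pct < 10.0:
        return "moderately faster"
    if improvement_pct < 20.0:
        return "faster"
    return "much faster"


# Improvements for kernels 1, 13 and 14 as given in the timing table.
for kernel, pct in [(1, 33.06), (13, -0.22), (14, 9.55)]:
    print(kernel, classify(pct))
```

Using the unrounded percentages matters near the edges: kernel 14 appears as 10% after rounding but is 9.55%, which places it in the "moderately faster" group.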

Livermore loops

Kernel	Improvement over gcc
Kernel 1	33%
Kernel 2	3%
Kernel 3	40%
Kernel 4	18%
Kernel 5	60%
Kernel 6	20%
Kernel 7	36%
Kernel 8	22%
Kernel 9	14%
Kernel 10	32%
Kernel 11	74%
Kernel 12	15%
Kernel 13	0%
Kernel 14	10%
Kernel 15	0%
Kernel 16	6%
Kernel 17	0%
Kernel 18	16%
Kernel 19	29%
Kernel 20	2%
Kernel 21	6%
Kernel 22	0%
Kernel 23	2%
Kernel 24	84%

Treating results in the range from -5% to 5% as "the same" and results outside this range as better or worse, the data listed above can be summarized as follows:

dco improved the gcc-generated code	17 out of 24 times (71% of cases)
dco did not affect the performance of the gcc-generated code	7 out of 24 times (29% of cases)

with an average (geometric mean) improvement of 27%.
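The counts above can be reproduced from the per-kernel percentages in the Livermore loops table. A minimal sketch, assuming a symmetric ±5% band for "the same" as stated in the text; the names `IMPROVEMENTS` and `summarize` are hypothetical.

```python
# Per-kernel improvements over gcc (percent), from the Livermore loops table.
IMPROVEMENTS = {
    1: 33, 2: 3, 3: 40, 4: 18, 5: 60, 6: 20, 7: 36, 8: 22,
    9: 14, 10: 32, 11: 74, 12: 15, 13: 0, 14: 10, 15: 0, 16: 6,
    17: 0, 18: 16, 19: 29, 20: 2, 21: 6, 22: 0, 23: 2, 24: 84,
}


def summarize(improvements, threshold=5.0):
    """Return (improved, unchanged, slower) counts for a +/-threshold band."""
    improved = sum(1 for v in improvements.values() if v > threshold)
    slower = sum(1 for v in improvements.values() if v < -threshold)
    unchanged = len(improvements) - improved - slower
    return improved, unchanged, slower


improved, unchanged, slower = summarize(IMPROVEMENTS)
total = len(IMPROVEMENTS)
print(f"improved:  {improved}/{total} ({improved / total:.0%})")   # 17/24 (71%)
print(f"unchanged: {unchanged}/{total} ({unchanged / total:.0%})")  # 7/24 (29%)
```

No kernel falls below -5%, so the "slower" count is zero.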

Timing of the Livermore loops kernels

The following table presents the execution data collected while benchmarking the Livermore loops; see this for a description of the code that was benchmarked.

The two columns under the gcc and gcc+dco headers give the execution times (in seconds) and execution speeds (in MFlops) achieved by the compiler-generated code and the dco-optimized code, respectively. The gcc+dco/gcc column lists the improvement achieved by applying dco to the compiler-generated code, that is, the reduction in execution time relative to gcc. For example, the compiler-generated code executed kernel #1 in 4.96 seconds at 1707.57 MFlops; after optimization by dco, the resulting code ran for 3.32 seconds, delivering 2551.83 MFlops, which is a (4.96 - 3.32) / 4.96 = 33.06% improvement.
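The improvement figure can be computed either from execution times or from MFlops rates. A sketch follows; the function names are hypothetical. The rate-based version assumes that both builds execute the same number of floating-point operations per kernel, which the fixed-iteration setup is meant to guarantee, so it agrees with the time-based figure only up to rounding of the published numbers.

```python
def improvement(t_gcc: float, t_dco: float) -> float:
    """Fractional reduction in execution time relative to gcc."""
    if t_gcc <= 0:
        raise ValueError("t_gcc must be positive")
    return (t_gcc - t_dco) / t_gcc


def improvement_from_rates(mflops_gcc: float, mflops_dco: float) -> float:
    """Same quantity from MFlops rates, assuming equal flop counts
    (rate = flops / time, so t_dco / t_gcc = mflops_gcc / mflops_dco)."""
    if mflops_dco <= 0:
        raise ValueError("mflops_dco must be positive")
    return 1.0 - mflops_gcc / mflops_dco


# Kernel 1: 4.96 s / 1707.57 MFlops with gcc, 3.32 s / 2551.83 MFlops after dco.
print(f"{improvement(4.96, 3.32):.2%}")  # 33.06%
print(f"{improvement_from_rates(1707.57, 2551.83):.2%}")
```

A negative result, as for kernel 13, means the dco-optimized code ran slightly longer than the gcc code.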

Kernel	gcc 4.1.2 time (s)	gcc 4.1.2 MFlops	gcc+dco time (s)	gcc+dco MFlops	gcc+dco/gcc improvement
1	4.96	1707.57	3.32	2551.83	33.06%
2	2.38	1462.97	2.32	1500.79	2.52%
3	5.93	1136.8	3.55	1898.05	40.13%
4	4.66	1239.76	3.8	1519.63	18.45%
5	5.2	183.98	2.07	462.66	60.19%
6	4.53	718.38	3.63	896.31	19.87%
7	4.87	1677.7	3.12	2619.9	35.93%
8	5	991.29	3.88	1277.1	22.40%
9	4.6	1540.48	3.95	1794.09	14.13%
10	4.94	353.19	3.38	516.43	31.58%
11	5.78	96.75	1.52	368.67	73.70%
12	5.18	681.16	4.39	803.66	15.25%
13	4.57	198.85	4.58	198.42	-0.22%
14	4.71	256.27	4.26	283.34	9.55%
15	3.72	586.53	3.72	586.53	0.00%
16	5.61	698.71	5.29	740.95	5.70%
17	5.01	593.81	4.99	596.19	0.40%
18	4.7	826.95	3.95	984.03	15.96%
19	5.81	372.45	4.1	527.69	29.43%
20	4.53	374.26	4.43	382.71	2.21%
21	4.88	362.2	4.61	383.41	5.53%
22	4.88	191.36	4.86	192.14	0.41%
23	4.17	681.74	4.09	695.08	1.92%
24	4.85	132.3	0.77	837.82	84.12%
Geometric Mean	4.75		3.45		27.28%
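The Geometric Mean row is consistent with taking the geometric mean of each time column and computing the improvement as 1 - GM(gcc+dco) / GM(gcc), which equals one minus the geometric mean of the per-kernel time ratios. The sketch below recomputes the row from the rounded times in the table; the names `TIMES` and `geometric_mean` are hypothetical, and the last digit may differ slightly from the published figure because the inputs are rounded.

```python
import math

# Execution times in seconds, per kernel: (gcc 4.1.2, gcc+dco).
TIMES = [
    (4.96, 3.32), (2.38, 2.32), (5.93, 3.55), (4.66, 3.80),
    (5.20, 2.07), (4.53, 3.63), (4.87, 3.12), (5.00, 3.88),
    (4.60, 3.95), (4.94, 3.38), (5.78, 1.52), (5.18, 4.39),
    (4.57, 4.58), (4.71, 4.26), (3.72, 3.72), (5.61, 5.29),
    (5.01, 4.99), (4.70, 3.95), (5.81, 4.10), (4.53, 4.43),
    (4.88, 4.61), (4.88, 4.86), (4.17, 4.09), (4.85, 0.77),
]


def geometric_mean(values) -> float:
    """exp(mean(log v)); summing logs avoids overflow from a long product."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("need a non-empty sequence of positive numbers")
    return math.exp(math.fsum(math.log(v) for v in vals) / len(vals))


gm_gcc = geometric_mean(t for t, _ in TIMES)
gm_dco = geometric_mean(t for _, t in TIMES)
overall = 1.0 - gm_dco / gm_gcc
print(f"gcc: {gm_gcc:.2f} s, gcc+dco: {gm_dco:.2f} s, improvement: {overall:.2%}")
```

A geometric mean suits ratios such as speedups because it is not dominated by a few large values; the plain arithmetic mean of the per-kernel percentages would give about 21.8%, pulled toward the many kernels with small gains.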