Four and a half months ago I posted an article about the mixbench benchmark. This benchmark assesses the performance of an artificial kernel that mixes compute and memory operations in configurable proportions, corresponding to various operational intensities (Flops/byte ratios). The implementation was based on CUDA, so only NVIDIA GPUs could be used.
Now I've ported the CUDA implementation to OpenCL, and here I provide some performance numbers on an AMD R7 260X. Here is the output when using a 128 MB memory buffer:
mixbench-ocl (compute & memory balancing GPU microbenchmark)
Use "-h" argument to see available options
------------------------ Device specifications ------------------------
Device:             Bonaire
Driver version:     1800.11 (VM)
GPU clock rate:     1175 MHz
Total global mem:   1871 MB
Max allowed buffer: 1336 MB
OpenCL version:     OpenCL 2.0 AMD-APP (1800.11)
Total CUs:          14
-----------------------------------------------------------------------
Buffer size:        128MB
Workgroup size:     256
Workitem stride:    NDRange
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]
--------------------------------------------------- CSV data --------------------------------------------------
Single Precision ops,,,,              Double precision ops,,,,             Integer operations,,,
Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time,  GIOPS, GB/sec
     0.000,  273.95,    0.00,  62.71,      0.000,  519.39,   0.00,  66.15,     0.000,  258.30,   0.00,  66.51
     0.065,  252.12,    4.26,  66.01,      0.032,  506.86,   2.12,  65.67,     0.065,  252.08,   4.26,  66.02
     0.133,  241.49,    8.89,  66.69,      0.067,  487.11,   4.41,  66.13,     0.133,  241.59,   8.89,  66.67
     0.207,  235.72,   13.67,  66.05,      0.103,  474.25,   6.79,  65.66,     0.207,  236.35,  13.63,  65.87
     0.286,  225.46,   19.05,  66.67,      0.143,  453.92,   9.46,  66.23,     0.286,  225.05,  19.08,  66.80
     0.370,  219.59,   24.45,  66.01,      0.185,  442.80,  12.12,  65.47,     0.370,  220.15,  24.39,  65.84
     0.462,  209.03,   30.82,  66.78,      0.231,  421.14,  15.30,  66.29,     0.462,  209.10,  30.81,  66.76
     0.560,  203.60,   36.92,  65.92,      0.280,  409.07,  18.37,  65.62,     0.560,  203.99,  36.85,  65.80
     0.667,  192.80,   44.55,  66.83,      0.333,  388.95,  22.09,  66.26,     0.667,  193.27,  44.44,  66.67
     0.783,  187.81,   51.46,  65.75,      0.391,  378.34,  25.54,  65.27,     0.783,  187.86,  51.44,  65.73
     0.909,  177.09,   60.63,  66.70,      0.455,  357.29,  30.05,  66.12,     0.909,  177.18,  60.60,  66.66
     1.048,  171.62,   68.82,  65.69,      0.524,  345.04,  34.23,  65.35,     1.048,  171.59,  68.83,  65.70
     1.200,  160.76,   80.15,  66.79,      0.600,  325.75,  39.55,  65.92,     1.200,  160.57,  80.24,  66.87
     1.368,  155.33,   89.86,  65.67,      0.684,  313.23,  44.56,  65.13,     1.368,  155.30,  89.88,  65.68
     1.556,  144.48,  104.05,  66.89,      0.778,  293.56,  51.21,  65.84,     1.556,  144.62, 103.95,  66.82
     1.765,  139.33,  115.60,  65.51,      0.882,  281.60,  57.20,  64.82,     1.765,  139.33, 115.60,  65.50
     2.000,  128.79,  133.40,  66.70,      1.000,  261.47,  65.70,  65.70,     2.000,  128.86, 133.32,  66.66
     2.267,  117.57,  155.26,  68.50,      1.133,  235.53,  77.50,  68.38,     2.267,  117.49, 155.36,  68.54
     2.571,  112.96,  171.10,  66.54,      1.286,  246.34,  78.46,  61.02,     2.571,  112.65, 171.57,  66.72
     2.923,  101.62,  200.77,  68.68,      1.462,  257.16,  79.33,  54.28,     2.923,  101.13, 201.72,  69.01
     3.333,   96.64,  222.22,  66.67,      1.667,  268.00,  80.13,  48.08,     3.333,   95.65, 224.51,  67.35
     3.818,   83.93,  268.65,  70.36,      1.909,  278.84,  80.86,  42.36,     3.818,   72.92, 309.24,  80.99
     4.400,   80.58,  293.16,  66.63,      2.200,  289.68,  81.55,  37.07,     4.400,   73.59, 321.00,  72.95
     5.111,   67.67,  364.96,  71.41,      2.556,  300.58,  82.16,  32.15,     5.111,   74.28, 332.49,  65.05
     6.000,   64.45,  399.83,  66.64,      3.000,  311.43,  82.75,  27.58,     6.000,   75.29, 342.26,  57.04
     7.143,   50.01,  536.76,  75.15,      3.571,  322.26,  83.30,  23.32,     7.143,   76.25, 352.04,  49.29
     8.667,   48.34,  577.52,  66.64,      4.333,  333.09,  83.81,  19.34,     8.667,   77.26, 361.33,  41.69
    10.800,   33.47,  866.12,  80.20,      5.400,  343.93,  84.29,  15.61,    10.800,   78.25, 370.48,  34.30
    14.000,   32.22,  932.99,  66.64,      7.000,  354.77,  84.74,  12.11,    14.000,   79.26, 379.32,  27.09
    19.333,   20.68, 1505.69,  77.88,      9.667,  376.91,  82.62,   8.55,    19.333,   80.27, 387.93,  20.07
    30.000,   19.37, 1663.32,  55.44,     15.000,  378.17,  85.18,   5.68,    30.000,   81.26, 396.41,  13.21
    62.000,   18.46, 1802.66,  29.08,     31.000,  389.93,  85.36,   2.75,    62.000,   33.57, 991.64,  15.99
       inf,   16.68, 2059.77,   0.00,        inf,  397.94,  86.34,   0.00,       inf,   33.54, 1024.43,  0.00
---------------------------------------------------------------------------------------------------------------
And here is the "memory bandwidth" versus "compute throughput" plot for the single-precision floating-point experiment results:
The source code of mixbench is freely available in a GitHub repository at https://github.com/ekondis/mixbench. I would be happy to include results from other GPUs as well, so please try this tool and let me know about your results and thoughts.