Sunday, November 22, 2015

mixbench benchmark OpenCL implementation

Four and a half months ago I posted an article about the mixbench benchmark. This benchmark assesses the performance of an artificial kernel that mixes compute and memory operations, corresponding to various operational intensities (Flops/byte ratios). The original implementation was based on CUDA, so only NVIDIA GPUs could be used.

Now I've ported the CUDA implementation to OpenCL, and here I provide some performance numbers on an AMD Radeon R7 260X. Here is the output when using a 128MB memory buffer:

mixbench-ocl (compute & memory balancing GPU microbenchmark)
Use "-h" argument to see available options
------------------------ Device specifications ------------------------
Device:              Bonaire
Driver version:      1800.11 (VM)
GPU clock rate:      1175 MHz
Total global mem:    1871 MB
Max allowed buffer:  1336 MB
OpenCL version:      OpenCL 2.0 AMD-APP (1800.11)
Total CUs:           14
-----------------------------------------------------------------------
Buffer size: 128MB
Workgroup size: 256
Workitem stride: NDRange
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]
--------------------------------------------------- CSV data --------------------------------------------------
Single Precision ops,,,,              Double precision ops,,,,              Integer operations,,,
Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
     0.000,  273.95,    0.00,  62.71,      0.000,  519.39,    0.00,  66.15,     0.000,  258.30,    0.00,  66.51
     0.065,  252.12,    4.26,  66.01,      0.032,  506.86,    2.12,  65.67,     0.065,  252.08,    4.26,  66.02
     0.133,  241.49,    8.89,  66.69,      0.067,  487.11,    4.41,  66.13,     0.133,  241.59,    8.89,  66.67
     0.207,  235.72,   13.67,  66.05,      0.103,  474.25,    6.79,  65.66,     0.207,  236.35,   13.63,  65.87
     0.286,  225.46,   19.05,  66.67,      0.143,  453.92,    9.46,  66.23,     0.286,  225.05,   19.08,  66.80
     0.370,  219.59,   24.45,  66.01,      0.185,  442.80,   12.12,  65.47,     0.370,  220.15,   24.39,  65.84
     0.462,  209.03,   30.82,  66.78,      0.231,  421.14,   15.30,  66.29,     0.462,  209.10,   30.81,  66.76
     0.560,  203.60,   36.92,  65.92,      0.280,  409.07,   18.37,  65.62,     0.560,  203.99,   36.85,  65.80
     0.667,  192.80,   44.55,  66.83,      0.333,  388.95,   22.09,  66.26,     0.667,  193.27,   44.44,  66.67
     0.783,  187.81,   51.46,  65.75,      0.391,  378.34,   25.54,  65.27,     0.783,  187.86,   51.44,  65.73
     0.909,  177.09,   60.63,  66.70,      0.455,  357.29,   30.05,  66.12,     0.909,  177.18,   60.60,  66.66
     1.048,  171.62,   68.82,  65.69,      0.524,  345.04,   34.23,  65.35,     1.048,  171.59,   68.83,  65.70
     1.200,  160.76,   80.15,  66.79,      0.600,  325.75,   39.55,  65.92,     1.200,  160.57,   80.24,  66.87
     1.368,  155.33,   89.86,  65.67,      0.684,  313.23,   44.56,  65.13,     1.368,  155.30,   89.88,  65.68
     1.556,  144.48,  104.05,  66.89,      0.778,  293.56,   51.21,  65.84,     1.556,  144.62,  103.95,  66.82
     1.765,  139.33,  115.60,  65.51,      0.882,  281.60,   57.20,  64.82,     1.765,  139.33,  115.60,  65.50
     2.000,  128.79,  133.40,  66.70,      1.000,  261.47,   65.70,  65.70,     2.000,  128.86,  133.32,  66.66
     2.267,  117.57,  155.26,  68.50,      1.133,  235.53,   77.50,  68.38,     2.267,  117.49,  155.36,  68.54
     2.571,  112.96,  171.10,  66.54,      1.286,  246.34,   78.46,  61.02,     2.571,  112.65,  171.57,  66.72
     2.923,  101.62,  200.77,  68.68,      1.462,  257.16,   79.33,  54.28,     2.923,  101.13,  201.72,  69.01
     3.333,   96.64,  222.22,  66.67,      1.667,  268.00,   80.13,  48.08,     3.333,   95.65,  224.51,  67.35
     3.818,   83.93,  268.65,  70.36,      1.909,  278.84,   80.86,  42.36,     3.818,   72.92,  309.24,  80.99
     4.400,   80.58,  293.16,  66.63,      2.200,  289.68,   81.55,  37.07,     4.400,   73.59,  321.00,  72.95
     5.111,   67.67,  364.96,  71.41,      2.556,  300.58,   82.16,  32.15,     5.111,   74.28,  332.49,  65.05
     6.000,   64.45,  399.83,  66.64,      3.000,  311.43,   82.75,  27.58,     6.000,   75.29,  342.26,  57.04
     7.143,   50.01,  536.76,  75.15,      3.571,  322.26,   83.30,  23.32,     7.143,   76.25,  352.04,  49.29
     8.667,   48.34,  577.52,  66.64,      4.333,  333.09,   83.81,  19.34,     8.667,   77.26,  361.33,  41.69
    10.800,   33.47,  866.12,  80.20,      5.400,  343.93,   84.29,  15.61,    10.800,   78.25,  370.48,  34.30
    14.000,   32.22,  932.99,  66.64,      7.000,  354.77,   84.74,  12.11,    14.000,   79.26,  379.32,  27.09
    19.333,   20.68, 1505.69,  77.88,      9.667,  376.91,   82.62,   8.55,    19.333,   80.27,  387.93,  20.07
    30.000,   19.37, 1663.32,  55.44,     15.000,  378.17,   85.18,   5.68,    30.000,   81.26,  396.41,  13.21
    62.000,   18.46, 1802.66,  29.08,     31.000,  389.93,   85.36,   2.75,    62.000,   33.57,  991.64,  15.99
       inf,   16.68, 2059.77,   0.00,        inf,  397.94,   86.34,   0.00,       inf,   33.54, 1024.43,   0.00
---------------------------------------------------------------------------------------------------------------

And here is the memory bandwidth versus compute throughput plot for the single-precision floating-point experiment results:

The source code of mixbench is freely available in a GitHub repository at https://github.com/ekondis/mixbench. I would be happy to include results from other GPUs as well, so please try this tool and let me know about your results and thoughts.

Monday, November 16, 2015

OpenCL 2.1 and SPIR-V standards released!

I've just noticed that the OpenCL 2.1 and SPIR-V standards were released today!

I just hope that vendors will not take too long to introduce up-to-date SDKs and drivers.

OpenCL 2.1
SPIR-V