I have just released mixbench on GitHub. It is a benchmark tool that assesses the performance bounds of GPUs (compute bound or memory bound) under mixed workloads. Unfortunately, it is currently implemented in CUDA, so it runs only on NVIDIA GPUs. The compute part can consist of single precision (SP) Flops, double precision (DP) Flops or integer operations, and the memory part is global memory traffic. Running the experiments across a wide range of operational intensity values makes it possible to examine GPU performance under different kernel characteristics.
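To make "compute bound or memory bound" concrete, here is a minimal sketch of a roofline-style bound: at a given operational intensity (flops per byte of memory traffic), throughput is capped either by the compute peak or by intensity times memory bandwidth. This is a generic textbook formulation and not necessarily the model mixbench or the related publication uses. The function names are hypothetical, and the two peak values are the theoretical figures reported for the GTX 480 in the device specifications of the sample output.

```python
# Roofline-style upper bound: a sketch, not mixbench's own model.
# Peaks are the theoretical GTX 480 values from the device specifications.
PEAK_GFLOPS = 1344.96  # single precision FMAs
PEAK_GBS = 177.41      # global memory bandwidth, GB/sec


def attainable_gflops(intensity, peak_gflops=PEAK_GFLOPS, peak_gbs=PEAK_GBS):
    """Upper bound on GFLOPS at a given operational intensity (flops/byte)."""
    return min(peak_gflops, intensity * peak_gbs)


def bound_type(intensity, peak_gflops=PEAK_GFLOPS, peak_gbs=PEAK_GBS):
    """Classify a kernel by comparing its intensity with the ridge point."""
    ridge = peak_gflops / peak_gbs  # intensity where both limits meet
    return "memory bound" if intensity < ridge else "compute bound"


if __name__ == "__main__":
    for intensity in (0.5, 2.0, 8.0, 16.0):
        print(f"{intensity:5.1f} flops/byte -> "
              f"{attainable_gflops(intensity):8.2f} GFLOPS max, "
              f"{bound_type(intensity)}")
```

Kernels to the left of the ridge point cannot use the full compute throughput no matter how well they are tuned, which is why sweeping the intensity is informative.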
Running the program on a GTX 480 gives the following output:
mixbench (compute & memory balancing GPU microbenchmark)
------------------------ Device specifications ------------------------
Device: GeForce GTX 480
CUDA driver version: 5.50
GPU clock rate: 1401 MHz
Memory clock rate: 924 MHz
Memory bus width: 384 bits
WarpSize: 32
L2 cache size: 768 KB
Total global mem: 1535 MB
ECC enabled: No
Compute Capability: 2.0
Total SPs: 480 (15 MPs x 32 SPs/MP)
Compute throughput: 1344.96 GFlops (theoretical single precision FMAs)
Memory bandwidth: 177.41 GB/sec
-----------------------------------------------------------------------
Total GPU memory 1610285056, free 1195106304
Buffer size: 256MB
Trade-off type:compute with global memory (block strided)
---- EXCEL data ----
Operations ratio ; Single Precision ops ;;; Double precision ops ;;; Integer operations
compute/memory ; Time; GFLOPS; GB/sec; Time; GFLOPS; GB/sec; Time; GIOPS; GB/sec
0/32 ; 240.531; 0.00; 142.85; 475.150; 0.00; 144.63; 240.205; 0.00; 143.04
1/31 ; 233.548; 9.20; 142.52; 460.193; 4.67; 144.66; 233.484; 9.20; 142.56
2/30 ; 225.249; 19.07; 143.01; 445.144; 9.65; 144.73; 225.235; 19.07; 143.02
3/29 ; 218.552; 29.48; 142.48; 430.575; 14.96; 144.64; 218.745; 29.45; 142.35
4/28 ; 210.345; 40.84; 142.93; 415.425; 20.68; 144.74; 210.091; 40.89; 143.10
5/27 ; 203.132; 52.86; 142.72; 400.472; 26.81; 144.78; 203.275; 52.82; 142.62
6/26 ; 194.468; 66.26; 143.56; 385.434; 33.43; 144.86; 194.314; 66.31; 143.67
7/25 ; 187.470; 80.19; 143.19; 370.915; 40.53; 144.74; 187.475; 80.18; 143.18
8/24 ; 175.115; 98.11; 147.16; 355.723; 48.30; 144.89; 175.132; 98.10; 147.14
9/23 ; 171.760; 112.53; 143.78; 341.353; 56.62; 144.70; 171.920; 112.42; 143.65
10/22 ; 163.397; 131.43; 144.57; 326.007; 65.87; 144.92; 163.252; 131.54; 144.70
11/21 ; 155.797; 151.62; 144.73; 311.655; 75.80; 144.70; 155.814; 151.61; 144.71
12/20 ; 146.573; 175.82; 146.51; 296.386; 86.95; 144.91; 146.662; 175.71; 146.42
13/19 ; 138.853; 201.06; 146.93; 281.757; 99.08; 144.81; 138.941; 200.93; 146.83
14/18 ; 129.727; 231.75; 148.98; 266.401; 112.86; 145.10; 129.744; 231.72; 148.97
15/17 ; 121.228; 265.72; 150.57; 251.283; 128.19; 145.28; 121.339; 265.47; 150.43
16/16 ; 120.065; 286.18; 143.09; 235.740; 145.75; 145.75; 120.122; 286.04; 143.02
17/15 ; 111.357; 327.84; 144.64; 219.472; 166.34; 146.77; 111.528; 327.34; 144.41
18/14 ; 106.430; 363.19; 141.24; 231.498; 166.98; 129.87; 106.541; 362.82; 141.10
19/13 ; 96.118; 424.50; 145.22; 243.534; 167.54; 114.63; 96.494; 422.85; 144.66
20/12 ; 89.602; 479.34; 143.80; 256.247; 167.61; 100.57; 89.642; 479.13; 143.74
21/11 ; 81.976; 550.13; 144.08; 269.055; 167.61; 87.80; 83.091; 542.74; 142.15
22/10 ; 76.066; 621.10; 141.16; 282.898; 167.00; 75.91; 76.068; 621.08; 141.15
23/ 9 ; 65.631; 752.57; 147.24; 295.743; 167.01; 65.35; 76.895; 642.33; 125.67
24/ 8 ; 60.809; 847.57; 141.26; 307.479; 167.62; 55.87; 80.099; 643.45; 107.24
25/ 7 ; 52.032; 1031.82; 144.45; 321.449; 167.02; 46.76; 83.296; 644.53; 90.23
26/ 6 ; 48.321; 1155.49; 133.33; 334.305; 167.02; 38.54; 86.519; 645.35; 74.46
27/ 5 ; 49.519; 1170.90; 108.42; 347.157; 167.02; 30.93; 89.729; 646.19; 59.83
28/ 4 ; 50.704; 1185.90; 84.71; 360.013; 167.02; 23.86; 92.891; 647.31; 46.24
29/ 3 ; 52.024; 1197.09; 61.92; 372.867; 167.02; 17.28; 96.115; 647.94; 33.51
30/ 2 ; 53.377; 1206.97; 40.23; 385.722; 167.02; 11.13; 99.328; 648.61; 21.62
31/ 1 ; 53.437; 1245.80; 20.09; 397.203; 167.60; 5.41; 101.247; 657.52; 10.61
32/ 0 ; 53.558; 1283.08; 0.00; 410.012; 167.60; 0.00; 102.494; 670.47; 0.00
--------------------
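As a quick sanity check on these numbers, the sketch below recomputes the theoretical single precision peak from the listed device specifications (SP count times clock rate times 2 flops per FMA) and derives the measured operational intensity of a few single precision rows as GFLOPS divided by GB/sec. The variable and function names are hypothetical. In the rows used here the measured intensity comes out at about twice the compute/memory ratio; that is an observation from the printed figures, not a statement about how mixbench counts operations internally.

```python
# Figures copied from the GTX 480 output; names are illustrative only.
SP_COUNT = 480
GPU_CLOCK_MHZ = 1401
FLOPS_PER_FMA = 2  # one fused multiply-add counts as two flops

# Theoretical single precision peak in GFLOPS.
peak_sp_gflops = SP_COUNT * GPU_CLOCK_MHZ * FLOPS_PER_FMA / 1000.0

# (compute ops, memory ops, measured GFLOPS, measured GB/sec), SP columns.
sp_rows = [
    (8, 24, 98.11, 147.16),
    (16, 16, 286.18, 143.09),
    (26, 6, 1155.49, 133.33),
    (31, 1, 1245.80, 20.09),
]


def measured_intensity(gflops, gbs):
    """Operational intensity in flops per byte, from two measured rates."""
    return gflops / gbs


if __name__ == "__main__":
    print(f"theoretical SP peak: {peak_sp_gflops:.2f} GFLOPS")
    for compute, memory, gflops, gbs in sp_rows:
        print(f"{compute:2d}/{memory:<2d} -> "
              f"{measured_intensity(gflops, gbs):6.2f} flops/byte")
```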
Figure: % of peak SP Flops and memory bandwidth as a function of operational intensity
Figure: % of peak DP Flops and memory bandwidth as a function of operational intensity
Figure: Compute throughput (SP Flops) vs memory bandwidth
Figure: Compute throughput (DP Flops) vs memory bandwidth
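For readers who want to reproduce the "% of peak" view from the raw output, a minimal sketch follows. It assumes the peaks are the theoretical values printed in the device specifications (1344.96 GFlops and 177.41 GB/sec); the normalization used for the original plots may differ. Only single precision rows are shown because the output lists no theoretical DP peak, and the helper name is hypothetical.

```python
# Assumed peaks: theoretical values from the device specifications.
PEAK_SP_GFLOPS = 1344.96
PEAK_GBS = 177.41

# (ratio label, measured SP GFLOPS, measured GB/sec) from the output table.
sp_rows = [
    ("0/32", 0.00, 142.85),
    ("16/16", 286.18, 143.09),
    ("26/ 6", 1155.49, 133.33),
    ("32/ 0", 1283.08, 0.00),
]


def percent_of_peak(value, peak):
    """Express a measured rate as a percentage of the given peak."""
    return 100.0 * value / peak


if __name__ == "__main__":
    for label, gflops, gbs in sp_rows:
        print(f"{label}: {percent_of_peak(gflops, PEAK_SP_GFLOPS):5.1f}% of SP peak, "
              f"{percent_of_peak(gbs, PEAK_GBS):5.1f}% of bandwidth peak")
```

The 26/6 row is the interesting one: it is close to both limits at once, which is where the balance between compute and memory shifts on this card.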
Publication:
Since this work was initially part of published research, please cite the following publication where applicable: Konstantinidis, E.; Cotronis, Y., "A Practical Performance Model for Compute and Memory Bound GPU Kernels," 2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 651-658, 4-6 March 2015
doi: 10.1109/PDP.2015.51
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7092788&isnumber=7092002