Parallel++: Maxwell for the masses (GM206)

Saturday, February 28, 2015

As you probably know, the mainstream version of the Maxwell GPU has been released in the form of the GM206 chip, which powers the GTX-960. The card appears to be quite efficient and a significant improvement over Kepler, especially in compute applications, which is the aspect I'm particularly interested in. There has been some controversy, of course, over its narrow 128-bit memory bus, which limits peak memory bandwidth to 112GB/sec. However, the larger cache should help alleviate this bottleneck.
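The 112GB/sec figure follows from the bus width and the memory data rate. Here is a minimal sketch of the arithmetic, assuming an effective GDDR5 data rate of 7 Gbit/sec per pin (a value not stated above, so treat it as an assumption); the function name is purely illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbit_s):
    """Peak memory bandwidth in GB/sec.

    bus_width_bits:   width of the memory bus in bits (128 for GM206)
    data_rate_gbit_s: effective data rate per pin in Gbit/sec
                      (7.0 is an assumption, not taken from the post)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_gbit_s


print(peak_bandwidth_gb_s(128, 7.0))  # 112.0
```

By the same formula, a 256-bit bus at the same data rate would give twice that bandwidth, which is why the bus width draws so much attention.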

The Zotac GTX-960 AMP! edition

To give you a taste of Maxwell's compute capabilities, I ran the OpenCL NBody example (16384 bodies) from the NVidia SDK 4.2 (the last one with OpenCL support). The GTX-960 delivers well above 1 TFLOPS, which is impressive. I also ran the example on three other GPUs. All results are shown in the chart that follows.

The red bars represent measured performance in GFLOPS, and the green bars represent efficiency, defined as the ratio of measured to peak GFLOPS.
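For readers who want to reproduce the efficiency metric, the following is a rough sketch of how the two quantities can be computed. It rests on several assumptions that are not part of the measurements above: peak single-precision throughput is taken as 2 FLOPs (one fused multiply-add) per core per cycle, the n-body throughput uses the convention of 20 FLOPs per body-body interaction that NVidia's n-body samples use as far as I know, and the core count and clock are the reference GTX-960 figures (a factory-overclocked card such as the AMP! edition has a higher peak). The timing value is a made-up placeholder, not a measurement.

```python
def peak_sp_gflops(cuda_cores, clock_ghz):
    """Theoretical single-precision peak: 2 FLOPs (one FMA) per core per cycle."""
    return 2 * cuda_cores * clock_ghz


def nbody_gflops(num_bodies, iterations, seconds, flops_per_interaction=20):
    """Achieved GFLOPS of an all-pairs n-body run.

    flops_per_interaction=20 is an assumed counting convention.
    """
    interactions = num_bodies ** 2 * iterations
    return interactions * flops_per_interaction / seconds / 1e9


def efficiency(measured_gflops, peak_gflops):
    """Ratio of measured to peak performance (the green bars)."""
    return measured_gflops / peak_gflops


# Reference GTX-960 figures (assumed): 1024 cores at a 1.127 GHz base clock.
peak = peak_sp_gflops(1024, 1.127)

# Placeholder timing for illustration only, NOT a measured result.
measured = nbody_gflops(num_bodies=16384, iterations=100, seconds=0.5)

print(f"peak: {peak:.1f} GFLOPS, efficiency: {efficiency(measured, peak):.2f}")
```

Note that the efficiency depends heavily on which clock is used for the peak (base, boost, or the factory overclock), so the ratios are only comparable across cards when the same convention is applied to all of them.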

The Maxwell architecture seems to address many of its predecessor's compute efficiency issues. However, there are two drawbacks: first, the low memory bandwidth mentioned above, and second, the rather low double-precision compute performance, which is now set at 1/32 of the single-precision rate.
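To put the 1/32 ratio in perspective, here is a small sketch that derives the theoretical double-precision peak from a single-precision peak. The single-precision value in the example is an assumed figure for a reference-clocked GTX-960 (1024 cores, 1.127 GHz, 2 FLOPs per core per cycle), not a number measured here.

```python
def peak_dp_gflops(peak_sp_gflops, dp_ratio=1 / 32):
    """Theoretical double-precision peak given the SP peak and the DP:SP ratio."""
    return peak_sp_gflops * dp_ratio


# Assumed SP peak for a reference GTX-960: 2 * 1024 cores * 1.127 GHz.
sp_peak = 2 * 1024 * 1.127

print(f"{peak_dp_gflops(sp_peak):.1f} GFLOPS in double precision")
```

Under these assumptions the double-precision peak lands in the tens of GFLOPS, so codes that depend on double precision will see little benefit from this chip.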

One last observation is the quite good performance of the AMD GPU, even though the example application was developed by NVidia and is presumably optimized for its own GPUs. This could be one of the main reasons NVidia stopped supporting the OpenCL paradigm.
