My MacBook Pro has three OpenCL devices: a CPU, an integrated GPU, and a discrete GPU. I was interested in the performance I could get with my OpenCL GPULib prototype on the various devices, so I ran the benchmark routine on each of them.
CL_BENCHMARK simply computes the gamma function for an array of values; see the results for various array sizes below.
There are several interesting points to these results:
- the discrete GPU did not have the best performance
- the CPU OpenCL device performed better than the host (i.e., the same CPU used directly, without OpenCL) for arrays of more than a few million elements
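For readers who want a feel for the host side of this kind of measurement, here is a minimal sketch in Python. It is not CL_BENCHMARK and does not use GPULib or OpenCL; the function names `time_gamma` and `run_benchmark`, the input range, and the array sizes are all assumptions made for illustration. It only times a gamma function evaluation over arrays of increasing size on the CPU.

```python
import math
import time


def time_gamma(n, repeats=3):
    """Return (best wall-clock seconds, results) for applying gamma to n values.

    Hypothetical helper for illustration; not part of GPULib or CL_BENCHMARK.
    """
    # Inputs spread over [1, 11) so math.gamma stays well within float range.
    values = [1.0 + 10.0 * i / n for i in range(n)]
    best = float("inf")
    result = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = [math.gamma(x) for x in values]
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, time.perf_counter() - start)
    return best, result


def run_benchmark(sizes=(1_000, 10_000, 100_000)):
    """Time the gamma computation for each array size; returns {size: seconds}."""
    return {n: time_gamma(n)[0] for n in sizes}


if __name__ == "__main__":
    for n, seconds in run_benchmark().items():
        print(f"{n:>9d} elements: {seconds * 1e3:8.2f} ms")
```

A device comparison would repeat the same timing loop once per OpenCL device, including the time to transfer the array to and from the device if that is part of what you want to measure.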
Contact me if you are interested in the GPULib OpenCL prototype (still very rough).
Here are the details on the various OpenCL devices on my laptop:
IDL> cl_report
Platform 0
  Name: Apple
  Version: OpenCL 1.2 (Apr 25 2014 22:04:25)
  Device 0
    Name: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
    Global memory size: 17179869184 bytes (16384 MB)
    Double capable: yes
    Available: yes
    Compiler available: yes
    Device version: OpenCL 1.2
    Driver version: 1.1
  Device 1
    Name: Iris Pro
    Global memory size: 1610612736 bytes (1536 MB)
    Double capable: no
    Available: yes
    Compiler available: yes
    Device version: OpenCL 1.2
    Driver version: 1.2(May 5 2014 20:39:23)
  Device 2
    Name: GeForce GT 750M
    Global memory size: 2147483648 bytes (2048 MB)
    Double capable: yes
    Available: yes
    Compiler available: yes
    Device version: OpenCL 1.2
    Driver version: 8.26.21 310.40.35f08