My MacBook Pro has three OpenCL devices: a CPU, an integrated GPU, and a discrete GPU. I was interested in the performance I could get with my OpenCL GPULib prototype on the various devices, so I ran the benchmark routine on each of them. CL_BENCHMARK simply computes the gamma function for an array of values; see the results for various array sizes below.
There are several interesting points to these results:
- the discrete GPU did not have the best performance
- the CPU OpenCL device outperformed the host, i.e., the same CPU used without OpenCL, for arrays of more than a few million elements
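To make the host side of that comparison concrete, here is a minimal sketch of timing a gamma computation over arrays of increasing size. It is plain Python, not the GPULib prototype or CL_BENCHMARK itself; the names `time_gamma` and `run_benchmark`, the input range, and the array sizes are all assumptions chosen for illustration.

```python
import math
import time


def time_gamma(n, repeats=3):
    """Return the best wall-clock time in seconds to compute gamma over n values."""
    # Hypothetical input range: values stay in [1, 11) so math.gamma cannot overflow.
    values = [1.0 + 10.0 * i / n for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = [math.gamma(v) for v in values]
        best = min(best, time.perf_counter() - start)
    assert len(result) == n
    return best


def run_benchmark(sizes=(1_000, 10_000, 100_000)):
    """Time the gamma computation once per array size; returns (size, seconds) pairs."""
    return [(n, time_gamma(n)) for n in sizes]


if __name__ == "__main__":
    for n, seconds in run_benchmark():
        print(f"{n:>9d} elements: {seconds:.6f} s")
```

Taking the best of a few repeats is one common way to reduce timing noise; an OpenCL run would additionally need to account for transfer time to and from the device.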
Contact me if you are interested in the GPULib OpenCL prototype (still very rough).
Here are the details on the various OpenCL devices on my laptop:
IDL> cl_report
Platform 0
  Name: Apple
  Version: OpenCL 1.2 (Apr 25 2014 22:04:25)
  Device 0
    Name: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
    Global memory size: 17179869184 bytes (16384 MB)
    Double capable: yes
    Available: yes
    Compiler available: yes
    Device version: OpenCL 1.2
    Driver version: 1.1
  Device 1
    Name: Iris Pro
    Global memory size: 1610612736 bytes (1536 MB)
    Double capable: no
    Available: yes
    Compiler available: yes
    Device version: OpenCL 1.2
    Driver version: 1.2(May 5 2014 20:39:23)
  Device 2
    Name: GeForce GT 750M
    Global memory size: 2147483648 bytes (2048 MB)
    Double capable: yes
    Available: yes
    Compiler available: yes
    Device version: OpenCL 1.2
    Driver version: 8.26.21 310.40.35f08
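If you want to work with a report like this programmatically, a small parser is enough. The sketch below is Python rather than IDL, and it only assumes the "Key: value" layout shown above; `parse_devices` and `memory_mb` are hypothetical helper names, and the embedded text is an abbreviated copy of the report.

```python
# Abbreviated copy of the cl_report output above (only the fields used here).
REPORT = """\
Device 0
Name: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Global memory size: 17179869184 bytes (16384 MB)
Double capable: yes
Device 1
Name: Iris Pro
Global memory size: 1610612736 bytes (1536 MB)
Double capable: no
Device 2
Name: GeForce GT 750M
Global memory size: 2147483648 bytes (2048 MB)
Double capable: yes
"""


def parse_devices(report):
    """Split the report into one dict of fields per 'Device N' section."""
    devices = []
    for line in report.splitlines():
        line = line.strip()
        # A section header is 'Device N' with no colon ('Device version: ...' is a field).
        if line.startswith("Device ") and ":" not in line:
            devices.append({"index": int(line.split()[1])})
        elif devices and ":" in line:
            key, _, value = line.partition(":")
            devices[-1][key.strip()] = value.strip()
    return devices


def memory_mb(device):
    """Convert the reported byte count to MB (1 MB = 2**20 bytes)."""
    return int(device["Global memory size"].split()[0]) // 2**20


if __name__ == "__main__":
    for device in parse_devices(REPORT):
        print(device["index"], device["Name"], memory_mb(device), "MB",
              "double:", device["Double capable"])
```

Filtering on the "Double capable" field this way picks out the CPU and the GeForce GT 750M; the Iris Pro reports no double-precision support.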