My MacBook Pro has three OpenCL devices: a CPU, an integrated GPU, and a discrete GPU. I wanted to know how my OpenCL GPULib prototype performs on each of them, so I ran its benchmark routine on all three. `CL_BENCHMARK` simply computes the gamma function for an array of values; the results for various array sizes are below.

Figure: Gamma computation performance on host and various OpenCL devices

Two points in these results stand out:

1. The discrete GPU (GeForce GT 750M) did not have the best performance.
2. For arrays of more than a few million elements, the CPU OpenCL device performed better than the host, i.e., IDL computing directly on the same CPU.
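For readers without GPULib, a rough host-only analogue of this kind of measurement might look like the sketch below. It is written in Python rather than IDL and assumes nothing about how `CL_BENCHMARK` is actually implemented: the helper names `time_gamma` and `benchmark`, the input range, and the array sizes are all hypothetical, and Python's `math.gamma` stands in for the host gamma computation.

```python
import math
import random
import time


def time_gamma(n, seed=0):
    """Time the gamma function over n random inputs (hypothetical helper).

    Inputs are drawn from [0.1, 10.1], an assumed range that keeps
    gamma(x) positive and finite. Only the gamma computation is timed,
    not the generation of the input array.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.1, 10.1) for _ in range(n)]
    start = time.perf_counter()
    ys = [math.gamma(x) for x in xs]
    elapsed = time.perf_counter() - start
    return elapsed, ys


def benchmark(sizes):
    """Return a dict mapping each array size to its gamma timing in seconds."""
    results = {}
    for n in sizes:
        elapsed, _ = time_gamma(n)
        results[n] = elapsed
    return results


if __name__ == "__main__":
    # Assumed sizes; they grow by powers of ten to show how timing scales.
    for n, t in benchmark([10**3, 10**4, 10**5, 10**6]).items():
        print(f"{n:>9d} elements: {t:.4f} s")
```

Timing only the gamma call keeps the comparison focused on compute time per array size. A benchmark across OpenCL devices would also have to decide whether host-to-device transfer time is included, which can matter a great deal for a discrete GPU.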

Contact me if you are interested in the GPULib OpenCL prototype (still very rough).

Here are the details on the OpenCL devices in my laptop:

IDL> cl_report
Platform 0
Name: Apple
Version: OpenCL 1.2 (Apr 25 2014 22:04:25)

Device 0
Name: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Global memory size: 17179869184 bytes (16384 MB)
Double capable: yes
Available: yes
Compiler available: yes
Device version: OpenCL 1.2
Driver version: 1.1

Device 1
Name: Iris Pro
Global memory size: 1610612736 bytes (1536 MB)
Double capable: no
Available: yes
Compiler available: yes
Device version: OpenCL 1.2
Driver version: 1.2(May 5 2014 20:39:23)

Device 2
Name: GeForce GT 750M
Global memory size: 2147483648 bytes (2048 MB)
Double capable: yes
Available: yes
Compiler available: yes
Device version: OpenCL 1.2
Driver version: 8.26.21 310.40.35f08