My MacBook Pro has three OpenCL devices: a CPU, an integrated GPU, and a discrete GPU. I was interested in the performance I could get with my OpenCL GPULib prototype on the various devices, so I ran the benchmark routine on each of them. CL_BENCHMARK simply computes the gamma function for an array of values; see the results for various array sizes below.

Figure: Gamma computation performance on host and various OpenCL devices

There are a couple of interesting points in these results:

  1. the discrete GPU did not have the best performance
  2. the CPU OpenCL device performed better than the host (i.e., the same CPU used without OpenCL) for arrays of more than a few million elements
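
For anyone who wants a rough host-side baseline to compare against, here is a minimal sketch of how the host timing could be measured in plain IDL. It is not the actual CL_BENCHMARK routine, whose source isn't shown here; the procedure name HOST_GAMMA_BENCHMARK and the choice of input range are my own assumptions for illustration, and it only uses the built-in GAMMA, RANDOMU, and SYSTIME routines.

```idl
; Hypothetical helper (not part of GPULib): times IDL's built-in GAMMA
; on the host for each array size given in SIZES.
pro host_gamma_benchmark, sizes
  compile_opt strictarr

  foreach n, sizes do begin
    ; arguments in [1, 10) keep single-precision GAMMA well away from overflow
    x = 1.0 + 9.0 * randomu(seed, n)

    ; wall-clock time for a single GAMMA call over the whole array
    t0 = systime(/seconds)
    y = gamma(x)
    elapsed = systime(/seconds) - t0

    print, n, elapsed, format='(%"%d elements: %0.4f seconds")'
  endforeach
end
```

Calling it as `host_gamma_benchmark, [100000L, 1000000L, 10000000L]` prints one timing line per array size. A single run per size is noisy, so repeating each measurement and taking the minimum or median would give steadier numbers.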

Contact me if you are interested in the GPULib OpenCL prototype (still very rough).

Here are the details on the various OpenCL devices on my laptop:

IDL> cl_report
Platform 0
Name: Apple
Version: OpenCL 1.2 (Apr 25 2014 22:04:25)

  Device 0
  Name: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  Global memory size: 17179869184 bytes (16384 MB)
  Double capable: yes
  Available: yes
  Compiler available: yes
  Device version: OpenCL 1.2
  Driver version: 1.1

  Device 1
  Name: Iris Pro
  Global memory size: 1610612736 bytes (1536 MB)
  Double capable: no
  Available: yes
  Compiler available: yes
  Device version: OpenCL 1.2
  Driver version: 1.2(May  5 2014 20:39:23)

  Device 2
  Name: GeForce GT 750M
  Global memory size: 2147483648 bytes (2048 MB)
  Double capable: yes
  Available: yes
  Compiler available: yes
  Device version: OpenCL 1.2
  Driver version: 8.26.21 310.40.35f08