GPULib 1.2 was released last week and is available on the Tech-X website. This release focuses on improved MATLAB bindings, along with a few important bug fixes for the IDL bindings and a few new kernels. Full release notes are after the break.

Changes/new features in GPULib version 1.2

== General ==

The main focus of this release is the improved MATLAB bindings. 
Several new kernels have also been added since the release of version 1.0.8.

== GPULib kernels ==

gpuAtan2, gpuFmod, gpuPow
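
All three new kernels are binary, element-wise operations. As a hedged illustration, assuming they follow the semantics of the C math functions atan2, fmod, and pow applied to matching elements of two arrays, a CPU reference for checking results might look like the sketch below. The ref_* names are hypothetical and not part of GPULib, and since Python floats are doubles, the single-precision (float) path is not modeled.

```python
import math

# Hypothetical CPU reference for the element-wise kernels. Assumption (not
# confirmed by these notes): gpuAtan2, gpuFmod and gpuPow apply atan2, fmod
# and pow to corresponding elements of two equal-length arrays.


def _elementwise(fn, a, b):
    """Apply a binary function to corresponding elements of a and b."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of elements")
    return [fn(x, y) for x, y in zip(a, b)]


def ref_atan2(y, x):
    # Four-quadrant arctangent of y/x, in radians.
    return _elementwise(math.atan2, y, x)


def ref_fmod(a, b):
    # Remainder of a/b carrying the sign of a (C fmod semantics).
    return _elementwise(math.fmod, a, b)


def ref_pow(a, b):
    # a raised to the power b.
    return _elementwise(math.pow, a, b)
```

One difference to keep in mind when comparing: Python's math.fmod and math.pow raise ValueError on domain errors (for example math.fmod(1.0, 0.0)), whereas the C functions return NaN.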

== IDL bindings ==

- Support for the new kernels. For the time being, these functions 
  only support float and double (no complex types) and do not accept 
  affine transform arguments. 
- Added an example showing the use of page-locked variables 
  for fast CPU/GPU data transfer.
- Added a finite-difference time-domain (FDTD) example demonstrating 
  the use of views for efficient array sub-selection.
- Added a spectral angle mapper example.
- Bug fixes for the decon_hubble example.
- Improved documentation.
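
The spectral angle mapper compares each pixel spectrum t with a reference spectrum r through the angle theta = acos(t.r / (|t| |r|)), and assigns the pixel to the reference with the smallest angle. A minimal CPU sketch of that computation follows; it is not the code from the GPULib example (which runs on the GPU), and the function names are hypothetical.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    if len(pixel) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    if norm == 0.0:
        raise ValueError("spectra must be non-zero")
    # Clamp so rounding error cannot push the cosine outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_theta)


def classify(pixels, references):
    """For each pixel, the index of the reference with the smallest angle."""
    return [
        min(range(len(references)), key=lambda k: spectral_angle(px, references[k]))
        for px in pixels
    ]
```

Because the angle depends only on the direction of the spectra, scaling a spectrum (for example by overall illumination) leaves its angle unchanged.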

== MATLAB bindings ==

MATLAB GPULib version 1.2 has many major changes from the previous 
release. READ the README!

First and foremost, there are two distinct and completely separate 
interfaces to the library. They should NEVER be intermingled.

1.) The accArray class replaces the old gpuArray class from the 
    previous release. It requires MATLAB R2008a or higher and 
    provides automatic garbage collection, overloaded operators, 
    overloaded versions of native MATLAB functions, and more.
2.) The "gpu" interface can be used with older versions of MATLAB, 
    although it is not clear how far back it works.

The MATLAB interface was redesigned for speed. The accArray class is 
about 2.5X faster than the old gpuArray interface for many of the 
functions tested, and some of the "gpu"-prefixed functions are up to 
10X faster than gpuArray. 

MATLAB GPULib has many new functions, including:
1.) fft, ifft, fft2, and ifft2
2.) Reduction operations, including sum, cumsum, prod, cumprod, ... 
    These functions currently support 1D vectors and 2D matrices.
3.) Single, single complex, double, and double complex precision 
    versions of matrix multiplication, transpose, and complex 
    conjugate transpose.
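
For a 2D matrix, MATLAB's sum and cumsum operate along the first dimension by default, so sum(A) returns one value per column. Assuming the GPU reductions follow the same convention, a plain CPU reference like the sketch below can be used to check results. A GPU implementation would use a parallel reduction or scan rather than these sequential loops, and the function names are hypothetical.

```python
def col_sum(a):
    """Sum of each column of a matrix given as a list of rows (like sum(A)).

    Note: MATLAB reduces a 1-by-n row vector along its non-singleton
    dimension; this sketch only models the column-wise matrix case.
    """
    return [sum(col) for col in zip(*a)]


def col_cumsum(a):
    """Running sum down each column (like cumsum(A)); same shape as a."""
    out = []
    running = [0] * len(a[0]) if a else []
    for row in a:
        # Build a new list each step so earlier output rows are not mutated.
        running = [r + x for r, x in zip(running, row)]
        out.append(running)
    return out
```
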

The accArray class does not yet support subscripted reference via 
subsref.m (e.g. b=A(i)), subscripted assignment via subsasgn.m 
(e.g. A(i)=b), or array concatenation such as A=[B; C; D]. These will 
be supported in future releases.

The "gpu" interface supports subscripting through the gpuSubsref, 
gpuSubsasgn, and gpuSub2ind functions.
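
MATLAB stores matrices in column-major order and uses 1-based subscripts, so its sub2ind maps (row, col) in an nrows-by-ncols matrix to row + (col - 1) * nrows. Assuming gpuSub2ind follows the same convention (these notes do not spell out its signature), the index arithmetic can be sketched as follows; sub2ind and to_column_major here are illustrative helpers, not GPULib calls.

```python
def sub2ind(shape, row, col):
    """1-based, column-major linear index of (row, col), as in MATLAB."""
    nrows, ncols = shape
    if not (1 <= row <= nrows and 1 <= col <= ncols):
        raise IndexError("subscript out of range")
    return row + (col - 1) * nrows


def to_column_major(a):
    """Flatten a matrix given as a list of rows into column-major order."""
    return [x for col in zip(*a) for x in col]
```

For example, in a 3-by-4 matrix, A(2,3) has linear index 2 + 2*3 = 8, i.e. element 7 (zero-based) of the column-major storage.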

Both interfaces support page-locked host memory allocation via 
cudaMallocHost, which can make transfers from CPU memory to GPU 
memory and back much faster.

Both interfaces include more comprehensive native MATLAB-like 

New examples include bench, bwtest, fdtd, and fftExample.

Full disclosure: I work for Tech-X Corporation and worked on the IDL bindings and examples for GPULib.