One of the features I’m most excited about in GPULib 1.6 is the ability to write your own CUDA kernel and call it from within the GPULib framework. Details and an example are on the GPULib blog. This feature requires CUDA programming knowledge but makes it much easier to integrate custom CUDA code into IDL.
You can download a free version of GPULib, which has the basic features and also includes a short-term license for the full feature set. Calling custom CUDA kernels requires the full version of GPULib.
Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.