One of the features I was most excited about in [CUDA 6] is the drop-in library support for BLAS via [nvBLAS]. The idea is to use `LD_PRELOAD` when launching IDL so that BLAS calls resolve to nvBLAS instead of the BLAS implementation that would normally be found, which in IDL's case is the BLAS library distributed with IDL. I've had problems getting it to work so far, though.

Right now, the way I'm starting IDL is something like the following:

    $ export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
    $ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64
    $ LD_PRELOAD=/usr/local/cuda-6.0/lib64/ idl

nvBLAS does seem to be picked up, because it crashes if I don't set `LD_LIBRARY_PATH` and `NVBLAS_CONFIG_FILE` (I'm using a default configuration file). But I have not seen any difference in `MATRIX_MULTIPLY` performance with and without nvBLAS. I will keep testing, because this is too interesting to pass up, and there are a couple of possibilities that I haven't explored yet:

1. I'm not using big enough matrices at 5000 x 5000 elements.
2. I'm not setting something in the configuration file specified by `NVBLAS_CONFIG_FILE`.
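
For the second item, here is a minimal sketch of a configuration file with logging turned on, so that there is at least a record of whether nvBLAS intercepts anything. The keywords are the ones I understand the nvBLAS documentation to describe; `CONF_DIR` and the CPU BLAS path are placeholders I made up for illustration, not values from my setup.

```shell
#!/bin/sh
# Sketch: write a minimal nvblas.conf with logging enabled.
# CONF_DIR and CPU_BLAS_LIB are hypothetical placeholders; point them at
# a real directory and a real CPU BLAS shared library before using this.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CPU_BLAS_LIB="${CPU_BLAS_LIB:-/path/to/cpu/libblas.so}"

cat > "$CONF_DIR/nvblas.conf" <<EOF
# Log file written by nvBLAS
NVBLAS_LOGFILE $CONF_DIR/nvblas.log
# Log every intercepted BLAS call (verbose, for debugging only)
NVBLAS_TRACE_LOG_ENABLED
# CPU BLAS library that nvBLAS falls back to (required)
NVBLAS_CPU_BLAS_LIB $CPU_BLAS_LIB
# Use every visible GPU
NVBLAS_GPU_LIST ALL
# Tile dimension used to split matrices
NVBLAS_TILE_DIM 2048
EOF

export NVBLAS_CONFIG_FILE="$CONF_DIR/nvblas.conf"
echo "wrote $NVBLAS_CONFIG_FILE"
```

nvBLAS only intercepts Level 3 BLAS routines such as `dgemm`, so with tracing enabled an empty log after a `MATRIX_MULTIPLY` call would suggest that IDL never reaches one of those routines through a dynamically resolved symbol.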

I don't imagine the speedup from nvBLAS is going to be amazing, because memory transfer between the host and the GPU will eat into the performance, but you can't beat not having to change any code at all. I am also worried that the way IDL loads dynamic libraries might prevent this from working.
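
One way to probe the dynamic loading worry is to check which shared libraries are actually mapped into the running process. This is a Linux-only sketch that reads `/proc/PID/maps`; `lib_loaded` is a helper name I made up, and you have to supply the PID of the IDL process yourself.

```shell
#!/bin/sh
# lib_loaded PID PATTERN
# Succeeds if any line of /proc/PID/maps matches PATTERN, i.e. a file
# whose path contains PATTERN is mapped into that process.
# Linux-only: relies on the /proc filesystem.
lib_loaded() {
    grep -q -- "$2" "/proc/$1/maps"
}

# Hypothetical usage, with IDL_PID set to the PID of a running IDL session:
#   lib_loaded "$IDL_PID" nvblas && echo "nvBLAS is mapped"
#   lib_loaded "$IDL_PID" cublas && echo "cuBLAS is mapped"
```

If nvBLAS shows up in the map but timings don't change, the library is loaded and the problem is more likely that the BLAS calls are not being routed through it.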

[CUDA 6]: "CUDA Toolkit Documentation"
[nvBLAS]: "NVBLAS :: CUDA Toolkit Documentation"