One of the features I was most excited about in CUDA 6 is nvBLAS, a drop-in replacement library for BLAS. The idea is to use LD_PRELOAD when launching IDL to indicate that BLAS routines should come from nvBLAS instead of the BLAS implementation that would normally be found, i.e., in IDL's case, the BLAS library distributed with IDL. I've had problems getting it to work so far, though.

Right now, the way I’m starting IDL is something like the following:

$ export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
$ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64
$ LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so idl
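
For reference, a minimal nvblas.conf might look something like the following. The keyword names are from the nvBLAS documentation; the CPU BLAS path is a placeholder for whichever BLAS library the host would otherwise use (nvBLAS requires a fallback CPU BLAS for the routines it doesn't accelerate):

    # where nvBLAS writes status messages
    NVBLAS_LOGFILE nvblas.log

    # required fallback CPU BLAS; path is a placeholder
    NVBLAS_CPU_BLAS_LIB /path/to/libblas.so

    # use all visible GPUs
    NVBLAS_GPU_LIST ALL

    # log every intercepted BLAS call, useful for confirming
    # that nvBLAS is actually handling the GEMM
    NVBLAS_TRACE_LOG_ENABLED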

This seems to be picking up nvBLAS, because IDL crashes if I don't set LD_LIBRARY_PATH and NVBLAS_CONFIG_FILE (I'm using a default configuration file). But I have not been able to measure any performance difference in MATRIX_MULTIPLY with nvBLAS versus without it. I will continue to test this, because it's too interesting to pass up, and there are a couple of possibilities I haven't explored yet:

  1. My test matrices, at 5000 x 5000 elements, might not be big enough.
  2. There might be something I need to set in the configuration file specified by NVBLAS_CONFIG_FILE.
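
To check the first possibility, a timing loop over a range of matrix sizes is more informative than a single size. Here's a sketch in IDL (the sizes are arbitrary; FOREACH requires IDL 8 or later), to be run once normally and once under LD_PRELOAD:

    ; Time MATRIX_MULTIPLY at several sizes; a speedup that grows
    ; with n would suggest nvBLAS is intercepting the GEMM call.
    foreach n, [2000L, 5000L, 10000L] do begin
      a = randomu(seed, n, n, /double)
      b = randomu(seed, n, n, /double)
      t0 = systime(/seconds)
      c = matrix_multiply(a, b)
      print, n, systime(/seconds) - t0, format='(%"n = %d: %0.2f s")'
    endforeach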

I don't imagine the speedup from nvBLAS will be amazing, since host-to-device memory transfer will eat into the performance, but you can't beat not having to change any code at all. I am worried that the way IDL loads dynamic libraries might get in the way of this working.
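
One way to separate "nvBLAS isn't helping" from "IDL's library loading is interfering" is to time the same multiply in a different BLAS-backed host under the same LD_PRELOAD. As a sketch, assuming a Python with NumPy linked against a shared BLAS, a small timing harness could be run once plainly and once under the preload:

```python
import time
import numpy as np

def time_matmul(n, repeats=3):
    """Best-of-`repeats` wall time for an n x n double-precision
    matrix multiply, which NumPy dispatches to BLAS dgemm."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.dot(a, b)  # the BLAS call nvBLAS would intercept
        times.append(time.perf_counter() - t0)
    return min(times)

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print("n = %d: %.3f s" % (n, time_matmul(n)))
```

If the preloaded run shows a growing speedup here but not in IDL, that would point at IDL's dynamic-library handling rather than at nvBLAS itself.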