One of the features I was most excited about in CUDA 6 is the drop-in library (nvBLAS) support for BLAS. The idea is to use LD_PRELOAD when launching IDL to indicate that BLAS should be coming from the nvBLAS instead of the BLAS implementation that would normally be found, e.g., in IDL’s case, the idl_lapack.so distributed with IDL. I’ve had problems getting it to work so far, though.

Right now, the way I’m starting IDL is something like the following:

$ export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
$ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64
$ LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so idl

This seems to be recognizing nvBLAS because it will crash if I don’t set LD_LIBRARY_PATH and NVBLAS_CONFIG_FILE (I’m using a default configuration file). But I have not been able to get any different results testing performance of MATRIX_MULTIPLY between using nvBLAS or not. I will continue to test this, because it’s too interesting to pass up and there are a couple of items that I haven’t explored yet:

  1. I’m not using big enough matrices at 5000 x 5000 elements.
  2. I’m not setting something in the configuration file specified by NVBLAS_CONFIG_FILE.

I don’t imagine the speedup from nvBLAS is going to be amazing because memory transfer will eat into the performance, but you can’t beat not having to change any code at all. I am worried that the way that IDL loads dynamic libraries might get in the way of this working.