One of the features I was most excited about in CUDA 6 is the drop-in library support for BLAS (nvBLAS). The idea is to use LD_PRELOAD when launching IDL so that BLAS calls resolve to nvBLAS instead of the BLAS implementation that would normally be found, in IDL’s case the idl_lapack.so distributed with IDL. I’ve had problems getting it to work so far, though.
Right now, the way I’m starting IDL is something like the following:
$ export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
$ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64
$ LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so idl
This seems to be recognizing nvBLAS, because it crashes if I don’t set LD_LIBRARY_PATH and NVBLAS_CONFIG_FILE (I’m using a default configuration file). But I have not seen any difference in the performance of MATRIX_MULTIPLY with and without nvBLAS. I will continue to test this, because it’s too interesting to pass up, and there are a couple of possible explanations that I haven’t explored yet:
- I’m not using big enough matrices at 5000 x 5000 elements.
- I’m not setting something in the configuration file specified by NVBLAS_CONFIG_FILE.
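For the configuration file item, here is a minimal sketch of what a hand-written nvblas.conf might look like. The keyword names are the ones I understand NVIDIA's NVBLAS documentation to describe, so they are worth checking against the nvblas.conf shipped with the toolkit. The scratch directory and the CPU BLAS path are placeholders, not real locations.

```shell
# Write a minimal nvblas.conf into a scratch directory (hypothetical location).
conf_dir=$(mktemp -d)
cat > "$conf_dir/nvblas.conf" <<'EOF'
# Log file for nvBLAS messages.
NVBLAS_LOGFILE nvblas.log
# CPU BLAS used as a fallback; nvBLAS requires this keyword.
# The path below is a placeholder, not a real location.
NVBLAS_CPU_BLAS_LIB /path/to/libopenblas.so
# Use every compute-capable GPU that nvBLAS finds.
NVBLAS_GPU_LIST ALL
# Log each intercepted BLAS call to the log file above.
NVBLAS_TRACE_LOG_ENABLED
EOF
export NVBLAS_CONFIG_FILE="$conf_dir/nvblas.conf"
```

With tracing enabled, an empty log after running MATRIX_MULTIPLY would suggest that no BLAS call ever reached nvBLAS, which is a different problem from a call being intercepted but not sped up.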
I don’t imagine the speedup from nvBLAS is going to be amazing because memory transfer will eat into the performance, but you can’t beat not having to change any code at all. I am worried that the way that IDL loads dynamic libraries might get in the way of this working.
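To put rough numbers on the transfer cost, here is a back-of-the-envelope sketch. It assumes single-precision data (so the call would be sgemm) and that each of the three matrices crosses the bus exactly once. nvBLAS tiles the work, so the real traffic may differ.

```shell
# Rough cost of one n x n single-precision matrix multiply (assumed sizes).
n=5000
bytes_per_matrix=$(( n * n * 4 ))           # 4 bytes per float element
transfer_bytes=$(( 3 * bytes_per_matrix ))  # A and B to the GPU, C back to the host
flops=$(( 2 * n * n * n ))                  # one multiply and one add per inner-product term

echo "bytes per matrix: $bytes_per_matrix"
echo "bytes transferred: $transfer_bytes"
echo "floating-point operations: $flops"
```

The operation count grows as n cubed while the bytes moved grow as n squared, so larger matrices should tilt the balance toward the GPU. That is one reason to retry with something bigger than 5000 x 5000.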
May 11th, 2014 at 12:27 pm
Look at the symbols defined in libnvblas.so and
at those defined in idl_lapack.so and see if the names
match, e.g., if the names have the same number of
underscores.
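A sketch of how that comparison could be done with nm from binutils. The helper name list_gemm_symbols is made up for illustration, and the library paths passed to it depend on the local CUDA and IDL installs.

```shell
# Print the dynamic GEMM symbols of a shared library.
list_gemm_symbols() {
    lib=$1
    if [ ! -r "$lib" ]; then
        echo "cannot read $lib" >&2
        return 1
    fi
    # -D reads the dynamic symbol table, which is what the runtime linker uses.
    nm -D "$lib" | grep -i 'gemm'
}
```

Running list_gemm_symbols on /usr/local/cuda-6.0/lib64/libnvblas.so and then on the idl_lapack.so in the IDL bin directory would show whether the names line up. In the nm output, a T marks a symbol defined in the library's code and a U marks one the library expects to import from elsewhere.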
May 12th, 2014 at 3:03 pm
Good idea, but nvBLAS seems to have thought of that already. It defines a plain and a trailing-underscore version, i.e., sgemm and sgemm_. The idl_lapack.so from IDL has the underscore version, i.e., sgemm_.
I’m currently thinking that the way IDL loads DLMs must interfere with the LD_PRELOAD mechanism. I assume IDL does a dlopen to a specific path to load the DLM, which means the LD_PRELOAD would be bypassed (I assume).
May 12th, 2014 at 3:17 pm
Does this command show that dlopen is used by idl:
LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so \
ltrace -f -e dlopen idl \
May 13th, 2014 at 4:24 pm
I had to launch the IDL binary directly instead of calling the IDL script (and add the IDL bin directory to the LD_LIBRARY_PATH), but I didn’t see anything in my output:
May 13th, 2014 at 4:25 pm
But this points to another problem: I should see a DLM loaded message about the LAPACK library, but it’s not present. The symbols are in there, but maybe they are not used and some other version is used instead?
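One way to check which copy of a symbol actually gets used is to ask the dynamic linker. This is a sketch assuming a glibc system, where LD_DEBUG=bindings logs each symbol binding to stderr. The helper name show_bindings is made up for illustration.

```shell
# Show which shared object the dynamic linker binds a symbol to.
# $1: symbol name; remaining arguments: the command to run.
show_bindings() {
    sym=$1
    shift
    # LD_DEBUG output goes to stderr, so merge it into stdout for grep.
    LD_DEBUG=bindings "$@" 2>&1 | grep 'binding file' | grep -F "$sym"
}
```

A call might look like show_bindings sgemm_ env LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so followed by the path to the IDL binary. With lazy binding, a binding is typically logged only when the symbol is first resolved, so MATRIX_MULTIPLY would have to run in that session before exiting. If sgemm_ never appears in the output, that would suggest the call is resolved inside a library without going through the dynamic symbol lookup at all.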
May 16th, 2014 at 12:08 pm
After some emails with Exelis VIS, it doesn’t look like this will work due to the way BLAS is linked.