CUDA 6 nvBLAS in IDL

One of the CUDA 6 features I was most excited about is nvBLAS, a drop-in replacement library for BLAS. The idea is to use LD_PRELOAD when launching IDL so that BLAS routines come from nvBLAS instead of the BLAS implementation that would normally be found (in IDL’s case, the idl_lapack.so distributed with IDL). I’ve had problems getting it to work so far, though.

Right now, the way I’m starting IDL is something like the following:

$ export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
$ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64
$ LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so idl

IDL does seem to be picking up nvBLAS, because it crashes if I don’t set LD_LIBRARY_PATH and NVBLAS_CONFIG_FILE (I’m using a default configuration file). But I have not been able to measure any difference in MATRIX_MULTIPLY performance with and without nvBLAS. I will continue to test this, because it’s too interesting to pass up, and there are a couple of possibilities I haven’t explored yet:

Maybe 5000 x 5000 element matrices are not big enough.
Maybe I’m not setting something I need in the configuration file specified by NVBLAS_CONFIG_FILE.
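For the configuration-file question, here is a minimal nvblas.conf sketch written out with a shell heredoc. It follows my reading of NVIDIA’s nvBLAS documentation. NVBLAS_CPU_BLAS_LIB names the CPU BLAS that nvBLAS falls back to for calls it does not send to the GPU, and it must be set. NVBLAS_TRACE_LOG_ENABLED logs every intercepted BLAS call to NVBLAS_LOGFILE. An empty log after a MATRIX_MULTIPLY would therefore suggest that IDL’s calls never reach nvBLAS. The CPU BLAS path below is a placeholder, not a real location.

```shell
#!/bin/sh
# Sketch of an nvBLAS configuration file, written to the current directory.
# The NVBLAS_CPU_BLAS_LIB path is a placeholder: replace it with a real CPU
# BLAS shared library on your system.
conf="$PWD/nvblas.conf"

cat > "$conf" <<'EOF'
# File that nvBLAS writes its messages to
NVBLAS_LOGFILE nvblas.log

# CPU BLAS used for calls nvBLAS does not run on the GPU (placeholder path)
NVBLAS_CPU_BLAS_LIB /path/to/cpu/libblas.so

# Log every intercepted BLAS call, to see whether IDL's calls reach nvBLAS
NVBLAS_TRACE_LOG_ENABLED
EOF

# Point nvBLAS at this file before launching IDL with LD_PRELOAD
export NVBLAS_CONFIG_FILE="$conf"
```

After running IDL with this configuration, grep nvblas.log for gemm to see whether any matrix multiplies were intercepted.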

I don’t imagine the speedup from nvBLAS is going to be amazing because memory transfer will eat into the performance, but you can’t beat not having to change any code at all. I am worried that the way that IDL loads dynamic libraries might get in the way of this working.
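A rough count shows why matrix size matters for the transfer overhead. Assume single precision, since FINDGEN returns floats, and assume A, B, and C each cross the bus once. Under those assumptions, the bytes moved grow like n² while the multiply does about 2n³ floating-point operations. The work per byte transferred therefore grows linearly with n. The helper name gemm_cost is hypothetical, and this is a crude model that ignores tiling and overlap of transfers with computation.

```shell
#!/bin/sh
# Hypothetical helper: crude cost model for an n x n single-precision
# matrix multiply C = A B, assuming A, B, and C each cross the bus once.
gemm_cost() {
    n=$1
    bytes=$((3 * 4 * n * n))   # three matrices, 4 bytes per float
    flops=$((2 * n * n * n))   # n^3 multiply-add pairs
    echo "n=$n bytes=$bytes flops=$flops flops_per_byte=$((flops / bytes))"
}

gemm_cost 5000
gemm_cost 20000
```

For n = 5000 this prints about 833 flops per byte (300 MB moved for 2.5e11 flops). For n = 20000 it prints about 3333. Larger matrices give the GPU proportionally more work to amortize each transfer.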

$ nvblas_launch.sh
(0, 0, 0x687400, -1, 0x1f25bc2) = 0x3bf9621160
(0x3bf9621160, 0, 0, 0x3bf9621160, 0) = 395262
(0, 0, 0, 8, 0x3bfa800e70) = 0x3bf9621160
(0x3bf9621160, 0, 0, 0x3bf9621160, 0) = 0x23a4da8
(0, 0, 0, 4, 0x7f6fdf86c7e8) = 0x3bf9621160
IDL Version 8.3 (linux x86_64 m64). (c) 2013, Exelis Visual Information Solutions, Inc.
[pid 476] (0x3bf9621160, 0, 0, 0x3bf9621160, 0) = 0x256ff72
[pid 476] (0, 0, 0, 15, 0x398be01348) = 0x3bf9621160
[pid 476] (0, 0, 0, 2, 2) = 0x3bf9621160
[pid 476] (0, 0, 0, 3, 3) = 0x3bf9621160
[pid 476] (0x3bf9621160, 0, 0, 0x3bf9621160, 0) = 131134
[pid 476] (0, 0, 0, 3, 0x963cf85) = 0x3bf9621160
Installation number: 209577.
Licensed for use by: Tech-X Corporation
IDL> a = findgen(10, 10)
IDL> b = findgen(10, 10)
IDL> c = matrix_multiply(a, b)
IDL> exit
[pid 482] +++ exited (status 0) +++
[pid 481] +++ exited (status 0) +++
[pid 480] +++ exited (status 0) +++
[pid 479] +++ exited (status 0) +++
[pid 478] +++ exited (status 0) +++
[pid 477] +++ exited (status 0) +++
[pid 483] +++ exited (status 0) +++
+++ exited (status 0) +++

michaelgalloy.com

Resources for IDL developers

Buy Modern IDL now!

Modern IDL offers IDL programmers one place to look for explanation, techniques, and reference material, for beginners and advanced users alike.

"... But I've always wanted a thorough, concise, up-to-date overview of the IDL language and its vast capabilities. This is exactly what Mike's book provides in 464 very informative pages... Highly recommended!"
—Mort Canty

About me

I'm a software developer focusing on high-performance computing and visualization in scientific programming. I work mostly in IDL, but occasionally use C, CUDA, and Python.

I currently work for the National Center for Atmospheric Research (NCAR) at the Mauna Loa Solar Observatory. Previously, I worked for Tech-X Corporation, where I was the main developer for GPULib, a library of IDL bindings for GPU-accelerated computation routines.

I am the creator and main developer for the open source projects IDLdoc, mgunit, and rIDL.

Contact me on Mastodon or via email at mgalloy at gmail dot com. For more details about me, see my CV/resume.

Need consulting/instruction? Contact me.

Gabriel M. says:
May 11th, 2014 at 12:27 pm

Look at the symbols defined in libnvblas.so and at those defined in idl_lapack.so and see if the names match, e.g., if the names have the same number of underscores.
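This comparison can be done with nm from GNU binutils, since nm -D lists a shared library’s dynamic symbols. The filter below keeps only the defined GEMM entry points so the two libraries can be compared side by side. The name gemm_symbols is hypothetical. The location of idl_lapack.so inside an IDL installation is left as a placeholder.

```shell
#!/bin/sh
# Hypothetical filter: read `nm -D --defined-only` output on stdin and print
# the single, double, and complex GEMM symbol names (with or without a
# trailing underscore), dropping any "@VERSION" suffix, sorted and unique.
gemm_symbols() {
    awk '{ name = $NF; sub(/@.*/, "", name); print name }' |
        grep -E '^[sdcz]gemm_?$' |
        LC_ALL=C sort -u
}

# Usage (the idl_lapack.so path is a placeholder for your IDL installation):
# nm -D --defined-only /usr/local/cuda-6.0/lib64/libnvblas.so | gemm_symbols
# nm -D --defined-only /path/to/idl_lapack.so | gemm_symbols
```

Names that appear in the idl_lapack.so list but not the libnvblas.so list could not be interposed by the preload.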

Michael Galloy says:
May 12th, 2014 at 3:03 pm

Good idea, but nvBLAS seems to have thought of that already: it defines both a plain and a trailing-underscore version, i.e., sgemm and sgemm_. The idl_lapack.so from IDL has the underscore version, sgemm_.

I’m currently thinking that the way IDL loads DLMs must interfere with the LD_PRELOAD mechanism. I assume IDL calls dlopen on a specific path to load the DLM, which would bypass LD_PRELOAD.
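One way to test this theory directly is glibc’s dynamic-linker debugging. With LD_DEBUG=bindings, ld.so logs each symbol binding it performs, including bindings for libraries loaded later with dlopen. LD_DEBUG_OUTPUT sends that log to files, one per process, suffixed with the PID. With lazy binding, a symbol is logged only when it is first called, so the multiply actually has to run. If sgemm_ binds to libnvblas.so, the preload is working. If it binds to idl_lapack.so, or no GEMM binding shows up at all, IDL is getting BLAS some other way. The helper name gemm_bindings is hypothetical. The log lines it matches have the form "binding file A [0] to B [0]: normal symbol `sgemm_'", which is glibc’s format as far as I know.

```shell
#!/bin/sh
# Hypothetical helper: print GEMM symbol bindings from glibc
# LD_DEBUG=bindings log files given as arguments.
gemm_bindings() {
    grep -h -E "symbol \`[sdcz]gemm_?'" "$@"
}

# Usage: run IDL with nvBLAS preloaded and binding debug output enabled,
# call MATRIX_MULTIPLY at the IDL prompt, exit, then inspect the logs.
# glibc writes one file per process, named /tmp/idl_bindings.<pid>.
# LD_DEBUG=bindings LD_DEBUG_OUTPUT=/tmp/idl_bindings \
#     LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so idl
# gemm_bindings /tmp/idl_bindings.*
```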

Gabriel M. says:
May 12th, 2014 at 3:17 pm

Does this command show whether IDL uses dlopen?

LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so \
ltrace -f -e dlopen idl

Michael Galloy says:
May 13th, 2014 at 4:24 pm

I had to launch the IDL binary directly instead of calling the IDL script (and add the IDL bin directory to the LD_LIBRARY_PATH), but I didn’t see anything in my output:

Michael Galloy says:
May 13th, 2014 at 4:25 pm

But here is another problem: I should see a DLM loaded message for the LAPACK library, but it’s not present. The symbols are in there, but maybe they are not used and some other version is used instead?
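On Linux, you can check from outside IDL whether idl_lapack.so and libnvblas.so are loaded at all. Every file mapped into a process is listed in /proc/PID/maps. Below is a minimal sketch with a hypothetical helper name, lib_is_mapped. Using pgrep -x idl assumes the IDL binary’s process is named exactly idl.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a mapped file whose path contains the given
# fixed string appears in the memory map of process PID (Linux only).
lib_is_mapped() {
    grep -qF -- "$2" "/proc/$1/maps"
}

# Usage, from a second terminal while IDL is running. Assumes the newest
# process named exactly "idl" is the IDL binary started with LD_PRELOAD.
# pid=$(pgrep -n -x idl)
# for lib in libnvblas.so idl_lapack.so; do
#     if lib_is_mapped "$pid" "$lib"; then
#         echo "$lib is mapped into process $pid"
#     else
#         echo "$lib is not mapped into process $pid"
#     fi
# done
```

A library that is mapped but never bound to for sgemm_ would point toward the "some other version is used" explanation.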

Michael Galloy says:
May 16th, 2014 at 12:08 pm

After exchanging some emails with Exelis VIS, I don’t think this will work, due to the way BLAS is linked.

6 Responses to “CUDA 6 nvBLAS in IDL”
