Optimization


A pre-release of GPULib was made available today. GPULib is a library of common scientific computational routines that use modern graphics processing units (GPUs) to improve performance. I saw speedups of 5-50 times faster than normal IDL code for example code I was writing. Vectorized code that is still not fast enough can be improved with GPULib.

Bindings for Java, Python, MATLAB, and IDL are available. The IDL bindings are fairly simple to use and have a compatibility mode that allows running the code even without a GPU (performance was approximately equivalent to normal IDL code for the examples I wrote).

The documentation lists all the routines available (IDLdoc 3.0b3 spotted in the wild!).

Full disclosure: I work for Tech-X Corporation and worked on the IDL bindings and examples for GPULib.

FastDL 2.4 was released today. FastDL provides a way to do cluster computing from within IDL in two different flavors: mpiDL and TaskDL. For problems which require communication between nodes, mpiDL provides an interface to the MPI standard. If you are familiar with MPI, then mpiDL should be very easy to use. It can now be used with MPICH, MPICH2, or OpenMPI implementations of MPI. For problems where nodes do not need to communicate with each other, TaskDL is a task farming solution for IDL.

Full disclosure: I work for Tech-X and did some work on this project.

Recently I had to find the neighbors in a 3 by 3 by 3 grid around a given point in 3D space. J.D. Smith’s Dimensional Juggling Tutorial has a lot of similar use of the reform. and rebin routines, but mostly for the purpose of “inflating” a dimension of an array. This problem called for cyclically repeating the same values in different patterns.

Continue reading “Local grid points.”

For a recent project, I needed to compute a local mean and standard deviation for an array. Given a window size, the local mean for a given pixel location is the mean of all the elements around that pixel in the window. So the local mean is an array the same size as the original image. The local mean is simple to calculate:

kernel = fltarr(width, width) + 1.0
local_mean = convol(image, kernel) / width^2

for a window size of width. But I wasn't sure if I could calculate the local standard deviation or other moments beyond the mean (without a loop) since they have a power of xj - meani term. It turns out that you can with some tedious algebra; it's fairly straightforward to calculate.

The IDL routine moment can also calculate mean absolute deviation. I haven't been able to come up with a way to compute a local mean absolute deviation without a loop.

Here is mg_local_moment.pro (doc).

Find the n smallest elements of a data setFinding the n smallest elements of an array is a common problem with an informative solution. It is not necessary to sort the entire array to solve this problem. To solve it, I'm going to use histogram in a slightly non-intuitive manner (or maybe it is intuitive if you've read J.D. Smith's Histogram Tutorial).

Continue reading "Finding the n smallest elements in an array."

The TEMPORARY routine is a simple way to use memory more efficiently. Statements of the form

a = a + 5

create a temporary variable to hold the intermediate result a + 5. This temporary variable effectively doubles the amount of memory needed to store a (but only for the time necessary to do the operation). This is not necessary if we know that the expression a + 5 will be reassigned to a. In this case, a itself can be used as the temporary variable. Use TEMPORARY to indicate to IDL which variable can be used as the temporary variable:

a = temporary(a) + 5

Now, adding 5 will happen "in place" in the memory that holds a. Incidently, for operations like the above, an even more efficient way to perform this calculation is:

a += 5

There are composite assignment forms like the above for all of IDL's binary operators.

On the other hand, TEMPORARY does not help for expressions where the variable to be reassigned to is used multiple times like

a = a^3 + a^2

Neither a on the right-hand side of this statement can be marked temporary because marking a variable as temporary makes the variable undefined (until it is reassigned to again). In this case, if the a in a^3 was marked as temporary, then the a in a^2 would be undefined. (And vice versa.)

The argument of TEMPORARY must be a named variable to have an effect. So

a = temporary(a + 5)

will perform its arithmetic correctly, but will create a temporary variable in doing it.