Category "Optimization"


Atle Borsholm recently posted a clever solution for finding the n-th smallest element in an array on the IDL Data Point. He compares this to a naive solution which simply sorts all the elements and grabs the n-th element:

IDL> tic & x = ordinal_1(a, 123456) & toc
% Time elapsed: 3.0336552 seconds.

His solution performs much better:

IDL> tic & x = ordinal_2(a, 123456) & toc
% Time elapsed: 0.46286297 seconds.

I have a HISTOGRAM-based solution called MG_N_SMALLEST in mglib that can do even better:

IDL> tic & x = mg_n_smallest(a, 123456) & toc
% Time elapsed: 0.18394303 seconds.

Note: MG_N_SMALLEST does not return the n-th smallest element directly, but returns indices to the smallest n elements.

I have a more detailed description of what MG_N_SMALLEST is doing in an older article. I like this routine as a good example of using HISTOGRAM and its REVERSE_INDICES keyword. It also a nice example of when using a FOR loop in IDL isn’t so bad.

I include even more detail on this routine in the “Performance” chapter of my book.

Arrays of non-negative integers are encountered frequently in IDL, e.g., for index arrays and lists of ENVI file IDs to name a couple examples. Since valid data elements are non-negative integers, -1 is used to indicate an empty set. I have a few routines for efficiently computing set unions, intersections, differences, and complements.

For example, suppose I have found the valid data in an image with:

IDL> valid_ind = where(band gt 0.)

And then I find indices of the clouds in the image using my special routine FIND_CLOUD_INDICES:

IDL> noncloud_ind = find_cloud_indices(band)

Now, if I want to find the indices of the pixels of the image that are valid and not cloud pixels, I can use my MG_SETINTERSECTION routine:

IDL> valid_ind = mg_setintersection(valid_ind, noncloud_ind)

Download the routines or look at the documentation.

A pre-release of GPULib was made available today. GPULib is a library of common scientific computational routines that use modern graphics processing units (GPUs) to improve performance. I saw speedups of 5-50 times faster than normal IDL code for example code I was writing. Vectorized code that is still not fast enough can be improved with GPULib.

Bindings for Java, Python, MATLAB, and IDL are available. The IDL bindings are fairly simple to use and have a compatibility mode that allows running the code even without a GPU (performance was approximately equivalent to normal IDL code for the examples I wrote).

The documentation lists all the routines available (IDLdoc 3.0b3 spotted in the wild!).

Full disclosure: I work for Tech-X Corporation and worked on the IDL bindings and examples for GPULib.

FastDL 2.4 was released today. FastDL provides a way to do cluster computing from within IDL in two different flavors: mpiDL and TaskDL. For problems which require communication between nodes, mpiDL provides an interface to the MPI standard. If you are familiar with MPI, then mpiDL should be very easy to use. It can now be used with MPICH, MPICH2, or OpenMPI implementations of MPI. For problems where nodes do not need to communicate with each other, TaskDL is a task farming solution for IDL.

FastDL

Full disclosure: I work for Tech-X Corporation and did some work on this project.

Recently I had to find the neighbors in a 3 by 3 by 3 grid around a given point in 3D space. J.D. Smith’s Dimensional Juggling Tutorial has a lot of similar use of the REFORM and REBIN routines, but mostly for the purpose of “inflating” a dimension of an array. This problem called for cyclically repeating the same values in different patterns.

Grid

I needed the nearest neighbors (i.e. inside a 3 by 3 by 3 window), but other window sizes might be needed.

n = 3
offsets = lindgen(n) - (n - 1) / 2

So offsets is just [-1, 0, 1]. The magic is:

xoffsets = reform(rebin(offsets, n, n^2), n^3)
yoffsets = reform(rebin(offsets, n^2, n), n^3)
zoffsets = rebin(offsets, n^3)

which does the REFORM and REBIN in the opposite order from the Dimensional Juggling Tutorial. You expand into an extra dimension, then get rid of that extra dimension which creates the repeating patterns. For example, try

IDL> print, reform(rebin([-1, 0, 1], 3, 9), 27)
-1 0 1 -1 0 1 -1 0 1
-1 0 1 -1 0 1 -1 0 1
-1 0 1 -1 0 1 -1 0 1
IDL> print, reform(rebin([-1, 0, 1], 9, 3), 27)
-1 -1 -1 0 0 0 1 1 1
-1 -1 -1 0 0 0 1 1 1
-1 -1 -1 0 0 0 1 1 1
IDL> print, rebin([-1, 0, 1], 27)
-1 -1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1

You can use these like:

neighbors = volume[x0 + xoffsets, y0 + yoffsets, z0 + zoffsets]

There is one particularity of IDL that is important in this problem. What happens when x0, y0, or z0 is 0 and you subtract 1 or more from it? Or if x0, y0, or z0 is the last valid value for a dimension and you add 1 to it? By default, IDL will not throw an error, but will use the nearest valid value. For example,

IDL> arr = findgen(5)
IDL> print, arr[[-1, 0, 3, 4, 5]]
0.00000 0.00000 3.00000 4.00000 4.00000

I never knew why this behavior would be desirable, but here it saves the day. By the way, you can turn it off with

compile_opt strictarrsubs

which then will cause an error on the same statement:

IDL> print, arr[[-1, 0, 3, 4, 5]]
% Array used to subscript array contains out of range subscript: ARR.
% Execution halted at: $MAIN$

For a recent project, I needed to compute a local mean and standard deviation for an array. Given a window size, the local mean for a given pixel location is the mean of all the elements around that pixel in the window. So the local mean is an array the same size as the original image. The local mean is simple to calculate:

kernel = fltarr(width, width) + 1.0
local_mean = convol(image, kernel) / width^2

for a window size of width. But I wasn’t sure if I could calculate the local standard deviation or other moments beyond the mean (without a loop) since they have a power of x[j] - mean[i] term. It turns out that you can with some tedious algebra; it’s fairly straightforward to calculate.

The IDL routine MOMENT can also calculate mean absolute deviation. I haven’t been able to come up with a way to compute a local mean absolute deviation without a loop.

Here is mg_local_moment.pro (doc).

Finding the n smallest elements of an array is a common problem with an informative solution. It is not necessary to sort the entire array to solve this problem. To solve it, I’m going to use HISTOGRAM in a slightly non-intuitive manner (or maybe it is intuitive if you’ve read J.D. Smith’s Histogram Tutorial)

Find the n smallest elements of a data set

The code quoted here is from a simplified MG_N_SMALLEST routine that finds only the smallest elements of an array (the actual code can also find the largest elements of the array).

function mg_n_smallest, data, n<br />compile_opt strictarr
  ; use histogram to find a set with more elements than n
  ; of smallest elements
  nData = n_elements(data)
  nBins = nData / n
  h = histogram(data, nbins=nBins, reverse_indices=ri)

  ; loop through the bins until we have n or more elements
  nCandidates = 0L
  for bin = 0L, nBins - 1L do begin
    nCandidates += h[bin]
    if (nCandidates ge n) then break
  endfor

  ; get the candidates and sort them
  candidates = ri[ri[0] : ri[bin + 1L] - 1L]
  sortedCandidates = sort(data[candidates])

  ; return the proper n of them
  return, (candidates[sortedCandidates])[0:n-1L]
end

This routine assumes that the data set is close to uniformly distributed. It uses a histogram to find a superset of candidates for the n smallest, then sorts them to find exactly the right ones. If the data is non-uniform, the sorting step will have to sort more than n elements, slowing the routine to the extent of the non-uniformity.

Let’s follow through the lines of code with a simple example data set.

IDL> data = randomu(seed, 10)
IDL> n = 3
IDL> print, data
0.115125 0.283748 0.942899 0.770475 0.818332
0.138929 0.175479 0.663068 0.806682 0.974345
IDL> nData = n_elements(data)
IDL> nBins = nData / n

The calculation of the histogram is (hopefully) where all the work will be done. (Remember, in bad cases, the SORT routine will take up the most time.)

IDL> h = histogram(data, nbins=nBins, reverse_indices=ri)
IDL> print, h
4 5 1
IDL> print, ri
4 8 13 14 0
1 5 6 2 3
4 7 8 9

The four elements in the first bin of the histogram are the candidates, but which elements are they? The output from the REVERSE_INDICES keyword of HISTOGRAM indicates which elements fall in each bin. It is an index array in two parts; what makes it confusing is the first part is an index into the second part. In our example here, the parts are divided as (ri below with the indices of the elements of ri above it):

0 1 2 3 | 4 5 6 7 8 9 10 11 12 13
4 8 13 14 | 0 1 5 6 2 3 4 7 8 9

The first part, 4 8 13 14, gives the starting indices of each of the three bins (plus a dummy last bin to indicate where the real last bin ends). These indices are indices into ri itself. So the first bin starts at index 4 (of ri), the second bin starts at index 8, and the third bin starts at index 13. The bins end one before the next bin starts (which is why the dummy last bin is there). So we have, the first bin in ri[4:7], the second bin in ri[8:12], and the third bin in ri[13:13]. Now, the “values” in the bins are actually indices into the original data set. So values in the first bin are:

IDL> print, ri[4:7]
0 1 5 6
IDL> print, d[ri[4:7]]
0.115125 0.283748 0.138929 0.175479

If you wanted to do this in general for bin i, it would be

ri[ri[i]:ri[i+1] - 1]

We’ll use this formula in a second. Back to our example, we now have the (four) candidates for the three smallest elements of the array. In the general case, there might not be n elements here, so we might have to include more bins than just the first one.

nCandidates = 0L
for bin = 0L, nBins - 1L do begin
  nCandidates += h[bin]
  if (nCandidates ge n) then break
endfor

This code will stop as soon as we have have enough candidates, i.e. n or more. We know how many bins (and hence how many candidates), but we don’t know which elements are the candidates yet. They are:

candidates = ri[ri[0] : ri[bin + 1L] - 1L]

We sort the candidates to find the smallest.

sortedCandidates = sort(data[candidates])

Now that we have sorted the candidates, just return the first n of them (remember that SORT returns indices into the original data array).

IDL> print, (candidates[sortedCandidates])[0:n-1L]
0 5 6

Here is the source code for MG_N_SMALLEST (docs).

The TEMPORARY routine is a simple way to use memory more efficiently. Statements of the form

a = a + 5

create a temporary variable to hold the intermediate result a + 5. This temporary variable effectively doubles the amount of memory needed to store a (but only for the time necessary to do the operation). This is not necessary if we know that the expression a + 5 will be reassigned to a. In this case, a itself can be used as the temporary variable. Use TEMPORARY to indicate to IDL which variable can be used as the temporary variable:

a = temporary(a) + 5

Now, adding 5 will happen “in place” in the memory that holds a. Incidently, for operations like the above, an even more efficient way to perform this calculation is:

a += 5

There are composite assignment forms like the above for all of IDL’s binary operators.

On the other hand, TEMPORARY does not help for expressions where the variable to be reassigned to is used multiple times like

a = a^3 + a^2

Neither a on the right-hand side of this statement can be marked temporary because marking a variable as temporary makes the variable undefined (until it is reassigned to again). In this case, if the a in a^3 was marked as temporary, then the a in a^2 would be undefined. (And vice versa.)

The argument of TEMPORARY must be a named variable to have an effect. So

a = temporary(a + 5)

will perform its arithmetic correctly, but will create a temporary variable in doing it.