Category "HPC"


mpiDL is a set of IDL bindings for the [Message Passing Interface (MPI)][MPI]. It is used for tasks where communication between processes is required, as opposed to the independent behavior of TaskDL workers. It can make use of the multiple cores of a single computer and/or multiple nodes of a cluster. mpiDL is supported on OS X and Linux for both OpenMPI and MPICH.

As an example of using mpiDL, I will present a simple probability-based computation of pi using many cores of a computer. If you are interested in evaluating mpiDL or require more information about it, please [contact me].
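
To give a flavor of the computation, here is a minimal sketch in plain Python. It is not mpiDL or IDL code, and the function names `count_hits` and `estimate_pi` are made up for illustration. It assumes the usual Monte Carlo approach: throw random points at the unit square, count how many land inside the quarter circle, and combine the per-worker counts. The workers run one after another here; in an MPI program each would be a separate process, and the final sum would be a reduce operation.

```python
import random


def count_hits(n_samples, seed):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=4, samples_per_worker=50_000):
    # Each hypothetical worker ("rank" in MPI terms) gets its own seed so the
    # sample streams differ. Here the workers simply run in sequence.
    partial_hits = [count_hits(samples_per_worker, seed=rank)
                    for rank in range(n_workers)]
    # Summing the partial counts plays the role of an MPI reduce.
    total_hits = sum(partial_hits)
    total_samples = n_workers * samples_per_worker
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / total_samples


pi_estimate = estimate_pi()
print(pi_estimate)
```

The only communication needed is the final sum of the counts, which is why this makes a good first message-passing example.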

[MPI]: http://en.wikipedia.org/wiki/Message_Passing_Interface "Message Passing Interface"
[contact me]: http://michaelgalloy.com/about "Contact Michael Galloy"


TaskDL is a task farming library for IDL that allows you to farm out tasks using multiple cores of a single computer or even multiple computers. It is available on Linux, OS X, and Windows. Task farming is suitable for tasks that do *not* need to communicate with each other, i.e., "naturally" or "embarrassingly" parallel tasks, such as processing many files independently. For more complicated programs that require interprocess communication, mpiDL provides an interface to MPI (Message Passing Interface).

As an example of using TaskDL, I will present a program to compute some areas of the Mandelbrot set and create output files representing them. If you are interested in evaluating TaskDL or require more information about it, please contact [me].
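
The task farming pattern itself can be sketched in a few lines of plain Python. This is not TaskDL or IDL code: the function `mandelbrot_counts`, the region coordinates, and the grid sizes are all assumptions chosen for illustration, and the sketch prints a summary instead of writing output files. A thread pool stands in for the farm to keep the example short; for CPU-bound work like this, `ProcessPoolExecutor` offers the same `map` interface and would actually use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def mandelbrot_counts(region, width=40, height=30, max_iter=50):
    """Return a height x width grid of escape iteration counts for one region."""
    x_min, x_max, y_min, y_max = region
    grid = []
    for row in range(height):
        y = y_min + (y_max - y_min) * row / (height - 1)
        line = []
        for col in range(width):
            x = x_min + (x_max - x_min) * col / (width - 1)
            c = complex(x, y)
            z = 0j
            n = 0
            # Iterate z -> z^2 + c until it escapes or we give up.
            while n < max_iter and abs(z) <= 2.0:
                z = z * z + c
                n += 1
            line.append(n)
        grid.append(line)
    return grid


# Hypothetical regions (x_min, x_max, y_min, y_max); each is an independent task.
regions = [
    (-2.0, 0.5, -1.25, 1.25),
    (-0.8, -0.7, 0.05, 0.15),
    (-1.8, -1.7, -0.05, 0.05),
]

# Farm the tasks out to a pool of workers. No task needs another task's result.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(mandelbrot_counts, regions))

for region, grid in zip(regions, results):
    print(region, "max count:", max(max(line) for line in grid))
```

Because each region is computed without reference to any other, the tasks can be handed to workers in any order, which is what makes the problem "naturally" parallel.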

[me]: http://michaelgalloy.com/about "About + contact"


I'm presenting ["Accelerated IDL using OpenCL"][poster] (IN53B-1563) Friday afternoon. The poster covers a variety of work to accelerate IDL using HPC technologies, such as adding OpenCL support to GPULib and combining it with multi-core technologies such as TaskDL and mpiDL.

If you want to meet up the second half of the week, let me know!

[poster]: http://michaelgalloy.com/wp-content/uploads/2013/12/mgalloy-agu-2013.pdf "Accelerated IDL using OpenCL"

The Tech-X website was redesigned this year with an emphasis on our VORPAL-based products, which moved FastDL off the website, but FastDL is still maintained and sold. I added Windows support and made a few smaller updates to TaskDL this year.

![TaskDL][TaskDL logo]

TaskDL is a task farming library for naturally ("embarrassingly") parallel problems, such as processing a bunch of independent files on separate cores of a machine or multiple machines where there is little communication required between cores/machines.

For more information, see the [TaskDL Users Guide].

![mpiDL][mpiDL logo]

mpiDL is a set of IDL bindings for the Message Passing Interface ([MPI]). It requires knowledge of MPI (I would consider this a "difficult" interface) to use, but can handle general problems requiring individual working units to communicate with each other to perform their tasks.

For more information, see the [mpiDL Users Guide].

Contact me directly (or support at txcorp dot com) if you are interested!

[MPI]: http://en.wikipedia.org/wiki/Message_Passing_Interface "Message Passing Interface - Wikipedia"
[mpiDL Users Guide]: http://michaelgalloy.com/wp-content/uploads/2013/11/mpiDL_UsersGuide.pdf "mpiDL Users Guide"
[TaskDL Users Guide]: http://michaelgalloy.com/wp-content/uploads/2013/11/TaskDL_UsersGuide.pdf "TaskDL Users Guide"

[mpiDL logo]: http://michaelgalloy.com/wp-content/uploads/2013/11/mpiDL.png "mpiDL"
[TaskDL logo]: http://michaelgalloy.com/wp-content/uploads/2013/11/TaskDL.png "TaskDL"

*Full disclosure: I am an employee of Tech-X and the product manager for FastDL.*

One of the features I'm most excited about in GPULib 1.6 is the ability to create your own kernel and call it from within the GPULib framework. Details and an example are on the [GPULib blog][custom kernels]. This feature requires CUDA programming knowledge, but makes it much easier to integrate custom CUDA code into IDL.

You can [download] a free version of GPULib, which has basic features and includes a short-term license for the full features of GPULib as well. Calling custom CUDA kernels requires the full version of GPULib.

*Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.*

[custom kernels]: http://gpulib.blogspot.com/2013/04/custom-kernels-in-gpulib-16.html "Custom kernels in GPULib 1.6"
[download]: http://www.txcorp.com/get-gpulib-software "Get GPULib"

[GPULib] 1.6 has greatly enhanced linear algebra capabilities: GPU accelerated LAPACK routines provided by [MAGMA]. See [details][linear algebra] on the GPULib blog. We provide a low-level interface to over 100 LAPACK routines.

MAGMA is a hybrid code that uses the CPU to do parts of the calculations that are best suited to it. We use Intel MKL to provide the CPU LAPACK.

MAGMA has been difficult to build, but I'm happy to say we have builds for OS X, Linux, and Windows!

*Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.*

[linear algebra]: http://gpulib.blogspot.com/2013/05/linear-algebra-in-gpulib-16.html "Linear algebra in GPULib 1.6"
[MAGMA]: http://icl.cs.utk.edu/magma/ "MAGMA"
[GPULib]: http://www.txcorp.com/home/gpulib "GPULib home"

After a long wait, [GPULib] 1.6 is finally ready to [download]! Here's the brief version of the release notes (for a more detailed list, see the [GPULib blog]):

- All platforms (Windows, Linux, and OS X) are now distributed as binaries. No building from source required!
- Added MAGMA (GPU accelerated LAPACK library) linear algebra routines.
- GPULib can now load and execute custom CUDA kernels without having to link to it; you just compile your kernel to a `.ptx` file. We provide routines to load and execute that kernel at runtime.
- Support for CUDA 5.0.
- Added support for up to 8-dimensional arrays.
- Added optimized scalar/array operations.
- Miscellaneous bug fixes.

A lot of work was done on infrastructure to make releasing an easier process, hopefully resulting in more frequent releases. We have plans for some very exciting features in the coming year!

*Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.*

[GPULib]: http://www.txcorp.com/home/gpulib "GPULib home"
[download]: http://www.txcorp.com/get-gpulib-software "Get GPULib"
[GPULib blog]: http://gpulib.blogspot.com/2013/05/gpulib-16-released.html "GPULib 1.6 released"

I've been doing some [experiments with OpenCL][Experiments with OpenCL] lately. Long story short: I can't convert GPULib over yet, but I don't think it will be too long before the libraries are ready.

[Experiments with Opencl]: http://gpulib.blogspot.com/2012/08/experiments-with-opencl.html "Experiments with OpenCL"

I had to do a lot of line profiling (with the `-l` option to [gprof]) of some Fortran code recently and got tired of tracking through source code to find the lines that were causing problems. The line profiler gives very useful [output][gprof output] that looks like (edited to remove some extra space):

    percent cumulative self
    time    seconds    seconds name
    18.20   0.02       0.02    main (cuda-blas.cu:94 @ 401502)
     9.10   0.03       0.01    main (cuda-blas.cu:93 @ 4014c9)
     9.10   0.04       0.01    main (cuda-blas.cu:166 @ 4018c0)
     9.10   0.05       0.01    main (cuda-blas.cu:239 @ 401c3d)
     9.10   0.06       0.01    main (cuda-blas.cu:243 @ 401c91)
     9.10   0.07       0.01    main (cuda-blas.cu:318 @ 402039)
     9.10   0.08       0.01    main (cuda-blas.cu:319 @ 402060)
     9.10   0.09       0.01    main (cuda-blas.cu:322 @ 4020a1)
     9.10   0.10       0.01    main (cuda-blas.cu:321 @ 4020f0)

I wrote an IDL routine that takes the raw profile output along with the source code and creates [HTML output][cuda-blas profile] that color codes the lines with high activity, like the following:

![Line profiling output][line-profiling]

Get the code from my IDL library, available via Subversion:

    svn co http://svn.idldev.com/idllib/trunk idllib

The `mg_clineprofile.pro` file is in the `src/profiling` directory. Call `MG_CLINEPROFILE` like:

    IDL> mg_clineprofile, 'profile-output.txt', /all_files

The `ALL_FILES` keyword indicates you want output for each file listed in the profile output; you can also specify the files that you want output for via the `FILES` keyword.

The code is currently pretty ugly; expect changes to it coming up.
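
For readers who don't use IDL, the parsing half of the idea can be sketched in plain Python. This is not how `MG_CLINEPROFILE` is implemented; the names `parse_line_profile` and `hot_lines` are made up for illustration, and the pattern assumes the trimmed output format shown above (real gprof output may carry extra columns). The sketch collects the percent of time per source line and picks out the lines above a threshold, which are the ones you would color in the HTML output.

```python
import re
from collections import defaultdict

# Matches lines such as: "18.20 0.02 0.02 main (cuda-blas.cu:94 @ 401502)"
LINE_RE = re.compile(
    r"^\s*(?P<percent>\d+\.\d+)\s+(?P<cumulative>\d+\.\d+)\s+(?P<self>\d+\.\d+)\s+"
    r"(?P<name>\S+)\s+\((?P<file>[^:()]+):(?P<line>\d+)\s+@\s+(?P<addr>[0-9a-fA-F]+)\)"
)


def parse_line_profile(text):
    """Return {filename: {line_number: percent_time}} from line profile output."""
    profile = defaultdict(dict)
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match is None:
            continue  # header or blank line
        filename = match.group("file")
        line_no = int(match.group("line"))
        percent = float(match.group("percent"))
        # The same source line can show up at several addresses; add them up.
        profile[filename][line_no] = profile[filename].get(line_no, 0.0) + percent
    return {name: dict(lines) for name, lines in profile.items()}


def hot_lines(profile, filename, threshold=10.0):
    """Line numbers in filename whose share of time is at least threshold percent."""
    return sorted(n for n, pct in profile.get(filename, {}).items() if pct >= threshold)


sample = """\
percent cumulative self
time seconds seconds name
18.20 0.02 0.02 main (cuda-blas.cu:94 @ 401502)
9.10 0.03 0.01 main (cuda-blas.cu:93 @ 4014c9)
9.10 0.04 0.01 main (cuda-blas.cu:166 @ 4018c0)
"""

profile = parse_line_profile(sample)
print(hot_lines(profile, "cuda-blas.cu"))  # prints [94]
```

Keying the result by filename first mirrors the per-file output: each source file gets its own table of line numbers and percentages.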

[gprof]: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC1 "gprof description"
[gprof output]: http://michaelgalloy.com/wp-content/uploads/2012/04/cuda-blas.lineprofile.txt "gprof output"
[cuda-blas profile]: http://michaelgalloy.com/wp-content/uploads/2012/04/cuda-blas.cu.html "cuda-blas profile"
[line-profiling]: http://michaelgalloy.com/wp-content/uploads/2012/04/line-profiling.png "Line profiling output"

[StackExchange][sa], a network of collaborative question and answer sites, has opened a new site for [computational science][sa-scicomp]. Some questions on the site now:

1. Is it possible to dynamically resize a sparse matrix in the Petsc library?
2. Future of OpenCL?
3. Parallel I/O options, in particular parallel HDF5
4. Is it worthwhile to write unit tests for scientific research codes?

[sa]: http://stackexchange.com/ "StackExchange"
[sa-scicomp]: http://scicomp.stackexchange.com/ "HPC on StackExchange"
