Category "HPC"

Archived posts from category "HPC"

mpiDL example

posted Tue 11 Mar 2014 by Michael Galloy under HPC, IDL

mpiDL is a set of IDL bindings for the Message Passing Interface (MPI). It is used for tasks where communication between processes is required, as opposed to the independent behavior of TaskDL workers. It can make use of the multiple cores of a single computer and/or multiple nodes of a cluster. mpiDL is supported on OS X and Linux for both OpenMPI and MPICH.

As an example of using mpiDL, I will present a simple probability-based computation of pi using many cores of a computer. If you are interested in evaluating mpiDL or require more information about it, please contact me.

Setting up for using mpiDL requires placing the mpiDL lib/ directory in your IDL path and DLM path, as well as setting the MPIDL_DIR environment variable to the root of your mpiDL distribution (make sure to do it in .bashrc to make it available to non-interactive processes). Then you are ready to run an mpiDL program, such as the parallel_pi example provided in the distribution:

examples$ runmpidl -np 8 ${PWD}/parallel_pi.sav
Running mpiexec -np 8 /home/research/mgalloy/software/mpiDL-2.4.0-Linux64/bin/mpidlstart
/home/research/mgalloy/software/mpidl-r431-par/examples/parallel_pi.sav...
******************************************************************
MPIDL Version 2.4.0 - a parallel implementation of IDL
(C) Copyright 2000 - 2014, Tech-X Corp.
All rights reserved.
******************************************************************
Process 1 gave: 3.143600
Process 2 gave: 3.131600
Process 3 gave: 3.137200
Process 4 gave: 3.116400
Process 5 gave: 3.140000
Process 6 gave: 3.172800
Process 7 gave: 3.157200
The final value of pi is: 3.142686

The -np 8 argument indicates that the example should run with 8 processes; the parallel_pi example code uses one master process to collect results and 7 workers to compute pi. The example can also run with a single process, where that process both computes pi and collects the result:

examples$ runmpidl -np 1 ${PWD}/parallel_pi.sav
Running mpiexec -np 1 /home/research/mgalloy/software/mpiDL-2.4.0-Linux64/bin/mpidlstart
/home/research/mgalloy/software/mpidl-r433-434M-par/examples/parallel_pi.sav...
Process 0 gave: 3.142200
The final value of pi is: 3.142200

Also, note that a .sav file is specified. This allows each process to use a runtime IDL license instead of a full development license.

There are two routines in the source code: parallel_pi_calcpi, which does not know about MPI and just computes an estimate of pi, and parallel_pi itself, which coordinates the work using the MPI interface. By the nature of MPI, the same routine is executed in each process, but each process is given an identifier, called the “rank”, which lets the routine decide what it should be doing. In parallel_pi, we determine that with the following code:

rank = mpidl_comm_rank()
nprocs = mpidl_comm_size()

Here, nprocs is the total number of processes and rank is an identifier from 0 to nprocs - 1. From this information, parallel_pi can determine if it will be computing pi and sending the results back or if it is the receiver of the information:

am_a_sender = (nprocs eq 1L) or (rank gt 0L)
am_a_receiver = (nprocs eq 1L) or (rank eq 0L)

This is more complicated than it might seem to need to be because it must handle the case of a single process, which has to be both a sender and a receiver.

If it is a sender, the process must compute pi and send it back to the receiver (the rank 0 process referenced by DEST in the mpidl_send routine below):

if (am_a_sender) then begin
  seedr = rank
  a = dblarr(1)
  a[0] = parallel_pi_calcpi(neval, seedr)
  mpidl_send, a, DEST=0
endif
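The body of parallel_pi_calcpi is not shown in this post. As a rough idea of what a probability-based estimate can look like, here is a minimal sketch that throws random points at the unit square and counts how many land inside the quarter circle. The name sketch_calcpi and the details of the method are my assumptions for illustration; the routine in the mpiDL distribution may well differ.

```idl
; Hypothetical sketch of a Monte Carlo pi estimator. This is NOT the
; parallel_pi_calcpi routine from the mpiDL distribution.
function sketch_calcpi, neval, seed
  compile_opt strictarr

  ; draw neval points uniformly in the unit square; RANDOMU updates seed,
  ; so the x and y draws are different random sequences
  x = randomu(seed, neval, /double)
  y = randomu(seed, neval, /double)

  ; count the points that fall inside the quarter circle of radius 1
  n_inside = total(x^2 + y^2 le 1.0D, /integer)

  ; the quarter circle covers pi/4 of the square, so scale by 4
  return, 4.0D * n_inside / neval
end
```

It takes the same two arguments as the call above, e.g., sketch_calcpi(10000L, rank), so each process gets a different seed and therefore a different estimate.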

If it is the receiver, it allocates an array to hold the result from each sender and then receives the result from each sender in turn with mpidl_recv. Finally, it computes the average of the values to give the result:

if (am_a_receiver) then begin
  mypi = dblarr(n_senders)
  for j = 0L, n_senders - 1L do begin
    mypi[j] = mpidl_recv(COUNT=1, SOURCE=senders[j], /DOUBLE)
    print, senders[j], mypi[j], format='(%"Process %d gave: %f")'
  endfor
  print, total(mypi, /preserve_type) / n_senders, $
         format='(%"The final value of pi is: %f")'
endif
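The receiver code uses n_senders and senders, which are not defined in the excerpts above. Judging from the output (a single process reports as process 0, otherwise processes 1 through nprocs - 1 report), they could be set up along the following lines. The helper name sketch_senders is hypothetical, and the real source may define these differently.

```idl
; Hypothetical helper, not from the mpiDL distribution: list the ranks that
; act as senders for a given number of processes.
function sketch_senders, nprocs
  compile_opt strictarr

  ; a lone process is both the sender and the receiver
  if (nprocs eq 1L) then return, [0L]

  ; otherwise rank 0 only receives and ranks 1..nprocs-1 send
  return, lindgen(nprocs - 1L) + 1L
end
```

With this, senders = sketch_senders(nprocs) and n_senders = n_elements(senders) would give the values the receiver loop needs.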

Check out the source code for all the details. Also, download the users guide for more information.

Full disclosure: I work for Tech-X and I am the product manager for the FastDL suite which includes mpiDL.

TaskDL example

posted Tue 25 Feb 2014 by Michael Galloy under HPC, IDL

TaskDL is a task farming library for IDL that allows you to farm out tasks using multiple cores of a single computer or even multiple computers. It is available on Linux, OS X, and Windows. Task farming is suitable for tasks which do not need to communicate with each other, i.e. “naturally” or “embarrassingly” parallel tasks, such as processing many files independently. For more complicated programs which require interprocess communication, mpiDL provides an interface to MPI (Message Passing Interface).

As an example of using TaskDL, I will present a program to compute some areas of the Mandelbrot set and create output files representing them. If you are interested in evaluating TaskDL or require more information about it, please contact me.

The first program needed when using TaskDL is the compute task program that the workers will call; in our case, this is mandelbrot_compute.pro. It is a normal IDL program that typically does not need to know about TaskDL and normally just places its output in files. Our mandelbrot_compute.pro example has the following interface:

pro mandelbrot_compute, x_range, y_range, nx, ny, $
                        max_iterations=max_iterations, $
                        bound=bound, $
                        color_table=color_table, $
                        image_file=image_file, $
                        data_file=data_file, $
                        uniform_color=uniform_color
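Only the interface of mandelbrot_compute is shown here. For readers who have not seen the computation itself, the following is a minimal sketch of the usual escape-time iteration over a grid, reusing the positional arguments and the MAX_ITERATIONS and BOUND keywords from the interface above. The routine name, the default values, and the choice to return the counts instead of writing files are all my assumptions, not the code in mandelbrot_compute.pro.

```idl
; Hypothetical sketch of an escape-time Mandelbrot computation. It returns
; the iteration counts instead of writing image/data files.
function sketch_mandelbrot_counts, x_range, y_range, nx, ny, $
                                   max_iterations=max_iterations, $
                                   bound=bound
  compile_opt strictarr

  ; assumed defaults; the real routine may choose differently
  _max_iterations = (n_elements(max_iterations) eq 0L) ? 256L : max_iterations
  _bound = (n_elements(bound) eq 0L) ? 2.0D : bound

  ; coordinates of the grid points along each axis
  x = x_range[0] + (x_range[1] - x_range[0]) * dindgen(nx) / (nx - 1L)
  y = y_range[0] + (y_range[1] - y_range[0]) * dindgen(ny) / (ny - 1L)

  ; c holds the complex coordinate of every pixel
  c = dcomplex(rebin(x, nx, ny), rebin(reform(y, 1, ny), nx, ny))
  z = dcomplexarr(nx, ny)
  counts = lonarr(nx, ny)

  for i = 0L, _max_iterations - 1L do begin
    ; only iterate the points that have not escaped yet
    active = where(abs(z) le _bound, n_active)
    if (n_active eq 0L) then break
    z[active] = z[active]^2 + c[active]
    counts[active] += 1L
  endfor

  return, counts
end
```

Points that never escape end up with a count equal to the maximum number of iterations; a color table would then map the counts to pixel colors.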

The driver of this program, mandelbrot.pro, is in charge of setting up the task farm, creating the tasks, and sending them off to the workers. To begin, create a TaskDL object and open a new session on a particular host and port:

oFarm = obj_new('TaskDL', _extra=e)
oFarm->open_session, host=server_host, port=server_port

To run locally, server_host would simply be localhost. The port can typically be any unused port.

TaskDL has optimizations for running locally. Use the ::spawn_local_worker or ::spawn_worker method as needed to create as many workers as required, typically matching the number of available processing units (cores or nodes):

for w = 0L, n_workers - 1L do begin
  if (keyword_set(local)) then begin
    ofarm->spawn_local_worker
  endif else begin
    ofarm->spawn_worker, host=_worker_host[w mod n_hosts]
  endelse
endfor

The commands to be sent to the workers must be constructed as strings. This construction and the ::add_task call would typically be done in a loop; in our example, the loop is over the number of zoom levels desired:

cmd_format = '(%"mandelbrot_compute, %s, %s, %s, %s, ' $
               + 'max_iterations=%s, ' $
               + 'image_file=''%s'', data_file=''%s''")'
cmd = string(x_range_str, $
             y_range_str, $
             nx_str, $
             ny_str, $
             max_iterations_str, $
             image_file, $
             data_file, $
             format=cmd_format)
ofarm->add_task, cmd, queueid=0, stage=1
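To make the loop over zoom levels more concrete, here is a sketch that builds one command string per level using the same cmd_format as above. It only returns the strings and does not touch TaskDL. The function name, the zoom center, the halving of the view at each level, and the fixed image size and iteration count are made-up values for illustration, not what mandelbrot.pro does.

```idl
; Hypothetical sketch: build one mandelbrot_compute command string per zoom
; level. All numeric values below are assumptions for illustration.
function sketch_mandelbrot_commands, n_levels
  compile_opt strictarr

  center = [-0.75D, 0.1D]   ; assumed point to zoom in on
  half_width = 1.5D          ; assumed half-width of the level 0 view

  cmd_format = '(%"mandelbrot_compute, %s, %s, %s, %s, ' $
                 + 'max_iterations=%s, ' $
                 + 'image_file=''%s'', data_file=''%s''")'

  cmds = strarr(n_levels)
  for level = 0L, n_levels - 1L do begin
    ; halve the size of the view at each zoom level
    w = half_width / 2.0D^level

    x_range_str = string(center[0] - w, center[0] + w, format='(%"[%f, %f]")')
    y_range_str = string(center[1] - w, center[1] + w, format='(%"[%f, %f]")')
    image_file = string(level, format='(%"mandelbrot-%d.png")')
    data_file = string(level, format='(%"mandelbrot-%d.nc")')

    cmds[level] = string(x_range_str, y_range_str, '400', '400', '256', $
                         image_file, data_file, format=cmd_format)
  endfor

  return, cmds
end
```

Each element of the result is a complete IDL statement, which is the form of string passed to ::add_task in the example above.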

Multiple queues, each associated with specific workers, can be created, but in our simple example we use the default queue. Stages provide the ability to require work to progress in stages, i.e., all stage 1 tasks must complete before stage 2 tasks start, etc. Again, that is not needed for our example.

When done, close the TaskDL session:

ofarm->close_session

Output is placed in mandelbrot-[zoom_level].png and mandelbrot-[zoom_level].nc files.

Full disclosure: I work for Tech-X and I am the product manager for the FastDL suite which includes TaskDL.

UPDATE 3/24/2014: Here is the TaskDL Users Guide.

AGU 2013

posted Tue 10 Dec 2013 by Michael Galloy under HPC, IDL

I’m presenting “Accelerated IDL using OpenCL” (IN53B-1563) Friday afternoon. It covers a variety of work to accelerate IDL using HPC technologies, such as adding OpenCL support to GPULib and combining it with multi-core technologies such as TaskDL and mpiDL.

If you want to meet up the second half of the week, let me know!

FastDL: parallel computing in IDL

posted Thu 14 Nov 2013 by Michael Galloy under HPC, IDL

The Tech-X website was redesigned this year with an emphasis on our VORPAL-based products, moving FastDL off the website, but FastDL is still maintained and sold. I added Windows support and made a few smaller updates for TaskDL this year.

TaskDL

TaskDL is a task farming library for naturally (“embarrassingly”) parallel problems, such as processing a bunch of independent files on separate cores of a machine or multiple machines where there is little communication required between cores/machines.

For more information, see the TaskDL Users Guide.

mpiDL

mpiDL is a set of IDL bindings for the Message Passing Interface (MPI). It requires knowledge of MPI (I would consider this a “difficult” interface) to use, but can handle general problems requiring individual working units to communicate with each other to perform their tasks.

For more information, see the mpiDL Users Guide.

Contact me directly (or support at txcorp dot com) if you are interested!

Full disclosure: I am an employee of Tech-X and the product manager for FastDL.

Custom kernels in GPULib 1.6

posted Wed 8 May 2013 by Michael Galloy under HPC, IDL

One of the features I’m most excited about in GPULib 1.6 is the ability to create your own kernel and call it from within the GPULib framework. Details and an example are on the GPULib blog. This feature requires CUDA programming knowledge, but makes it much easier to integrate custom CUDA code into IDL.

You can download a free version of GPULib which has basic features and includes a short term license for the full features of GPULib as well. Calling custom CUDA kernels requires the full version of GPULib.

Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.

Linear algebra in GPULib 1.6

posted Fri 3 May 2013 by Michael Galloy under HPC, IDL

GPULib 1.6 has greatly enhanced linear algebra capabilities: GPU accelerated LAPACK routines provided by MAGMA. See details on the GPULib blog. We provide a low-level interface to over 100 LAPACK routines.

MAGMA is a hybrid code that uses the CPU to do parts of the calculations that are best suited to it. We use Intel MKL to provide the CPU LAPACK.

MAGMA has been difficult to build, but I’m happy to say we have builds for OS X, Linux, and Windows!

Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.

GPULib 1.6 release

posted Wed 1 May 2013 by Michael Galloy under HPC, IDL

After a long wait, GPULib 1.6 is finally ready to download! Here’s the brief version of the release notes (for a more detailed list, see the GPULib blog):

All platforms, Windows, Linux, and OS X, are now distributed as binaries. No building from source required!
Added MAGMA (GPU accelerated LAPACK library) linear algebra routines.
GPULib can now load and execute custom CUDA kernels without having to link to it; you just compile your kernel to a .ptx file. We provide routines to load and execute that kernel at runtime.
Support for CUDA 5.0.
Added support for up to 8-dimensional arrays.
Added optimized scalar/array operations.
Miscellaneous bug fixes.

A lot of work was done on infrastructure to make releasing an easier process, hopefully resulting in more frequent releases. We have plans for some very exciting features in the coming year!

Full disclosure: I work for Tech-X Corporation and I am the product manager for GPULib.

OpenCL in GPULib

posted Mon 27 Aug 2012 by Michael Galloy under HPC, IDL

I’ve been doing some experiments with OpenCL lately. Long story short: I can’t convert GPULib over yet, but I don’t think it will be too long before the libraries are ready.

★ Better line profiling output

posted Fri 27 Apr 2012 by Michael Galloy under HPC

I had to do a lot of line profiling (with the -l option to gprof) of some Fortran code recently and got tired of tracking through source code to find the lines that were causing problems. The line profiler gives very useful output that looks like (edited to remove some extra space):

percent  cumulative     self
   time     seconds  seconds  name
  18.20        0.02     0.02  main (cuda-blas.cu:94 @ 401502)
   9.10        0.03     0.01  main (cuda-blas.cu:93 @ 4014c9)
   9.10        0.04     0.01  main (cuda-blas.cu:166 @ 4018c0)
   9.10        0.05     0.01  main (cuda-blas.cu:239 @ 401c3d)
   9.10        0.06     0.01  main (cuda-blas.cu:243 @ 401c91)
   9.10        0.07     0.01  main (cuda-blas.cu:318 @ 402039)
   9.10        0.08     0.01  main (cuda-blas.cu:319 @ 402060)
   9.10        0.09     0.01  main (cuda-blas.cu:322 @ 4020a1)
   9.10        0.10     0.01  main (cuda-blas.cu:321 @ 4020f0)

I wrote an IDL routine that takes the raw profile output along with the source code and creates HTML output that color codes the lines with high activity, like the following:

[Image: Line profiling]

Get the code from my IDL library, available via Subversion:

svn co http://svn.idldev.com/idllib/trunk idllib

The mg_clineprofile.pro file is in the src/profiling directory. Call MG_CLINEPROFILE like:

IDL> mg_clineprofile, 'profile-output.txt', /all_files

The ALL_FILES keyword indicates you want output for each file listed in the profile output; you can also specify the files that you want output for via the FILES keyword.
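The core step in a tool like this is pulling the percentage, file name, and line number out of each row of the profile output. The following is a small sketch of that one step using STREGEX; it is not the code in mg_clineprofile.pro, and the function name and the returned structure are my own choices for illustration.

```idl
; Hypothetical sketch: parse one row of gprof line-profile output such as
;   18.20 0.02 0.02 main (cuda-blas.cu:94 @ 401502)
; Returns a structure on success or -1 if the row does not match.
function sketch_parse_profile_line, line
  compile_opt strictarr

  ; percent, cumulative secs, self secs, routine, (file:line @ address)
  re = '^ *([0-9.]+) +([0-9.]+) +([0-9.]+) +([^ ]+) +' $
         + '\(([^:]+):([0-9]+) @ ([0-9a-fA-F]+)\)'
  tokens = stregex(line, re, /extract, /subexpr)

  ; STREGEX returns empty strings when there is no match
  if (tokens[0] eq '') then return, -1

  return, { percent: float(tokens[1]), $
            routine: tokens[4], $
            file: tokens[5], $
            line: long(tokens[6]) }
end
```

For the first row of the output above, the result has file 'cuda-blas.cu', line 94, and routine 'main'. Summing the percent values per file and line then gives the numbers needed to color code the source.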

The code is currently pretty ugly; expect changes to it coming up.

Computational science on StackExchange

posted Tue 13 Dec 2011 by Michael Galloy under HPC

StackExchange, a network of collaborative question and answer sites, has opened a new site for computational science. Some questions on the site now:

Is it possible to dynamically resize a sparse matrix in the Petsc library?
Future of OpenCL?
Parallel I/O options, in particular parallel HDF5
Is it worthwhile to write unit tests for scientific research codes?

michaelgalloy.com

Resources for IDL developers

Buy Modern IDL now!

Modern IDL offers IDL programmers one place to look for explanation, techniques, and reference material, for beginners and advanced users alike.

"... But I've always wanted a thorough, concise, up-to-date overview of the IDL language and its vast capabilities. This is exactly what Mike's book provides in 464 very informative pages... Highly recommended!"
—Mort Canty

About me

I'm a software developer focusing on high-performance computing and visualization in scientific programming. I work mostly in IDL, but occasionally use C, CUDA, and Python.

I currently work for the National Center for Atmospheric Research (NCAR) at the Mauna Loa Solar Observatory. Previously, I worked for Tech-X Corporation, where I was the main developer for GPULib, a library of IDL bindings for GPU accelerated computation routines.

I am the creator and main developer for the open source projects IDLdoc, mgunit, and rIDL.

Contact me at Mastodon or via email at mgalloy at gmail dot com. For more details about me, see my CV/resume.

Need consulting/instruction? Contact me.
