Greg Wilson gave a great talk about Software Carpentry at SciPy this year. I think more efforts like the Software Carpentry seminars are greatly needed in science — I’ve mentioned Software Carpentry several times before.

If you are interested in teaching, he highly recommends the book How Learning Works. It gives a summary of the current research in learning with links to the primary sources. I wish I had that when I was teaching.

via Astronomy Computing Today

Maps of floating plastic

The National Geographic has created new maps showing the extent of floating plastic in the ocean:

Tens of thousands of tons of plastic garbage float on the surface waters in the world’s oceans, according to researchers who mapped giant accumulation zones of trash in all five subtropical ocean gyres. Ocean currents act as “conveyor belts,” researchers say, carrying debris into massive convergence zones that are estimated to contain millions of plastic items per square kilometer in their inner cores.

Two ships covered the world in nine months to collect this data.

via FlowingData

When writing even small applications, it is often necessary to distribute resource files along with your code. For example, images and icons are frequently needed by GUI applications. Custom color table files or fonts might be needed by applications that create visualizations. Defaults might be stored in other data files. But how do you find these files, when the user could have installed your application anywhere on their system?
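One common technique is to locate resources relative to the source file of the routine that needs them, since resource files are typically installed alongside the code. A minimal sketch in IDL (the "resources" subdirectory name is an assumption for illustration):

function myapp_resource_dir
  compile_opt strictarr
  ; find the directory containing this routine's source file
  info = routine_info('myapp_resource_dir', /functions, /source)
  ; assume resources are installed in a "resources" directory beside the code
  return, filepath('', root_dir=file_dirname(info.path), subdirectory='resources')
end

This works no matter where the user installed the application, as long as the resource files move with the code.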

Continue reading “Finding your resource files.”

Exelis VIS announced VISualize 2014. Presentations and discussions will focus on topics such as:

  • Using new data platforms such as UAS, microsatellites, and SAR sensors
  • Remote sensing solutions for precision agriculture
  • Drought, flood, and extreme precipitation event monitoring and assessment
  • Wildfire and conservation area monitoring, management, mitigation, and planning
  • Monitoring leaks from natural gas pipelines

See the video for more information and then register or submit an abstract.

UPDATE 9/18/14: postponed until 2015.

I’ve been dealing with HDF5 files for quite a while, but the IDL interface was as painful as the C interface. IDL does have H5_BROWSER and H5_PARSE to make things a bit easier, but these utilities are suited to interactive browsing of a dataset, not to efficient, programmatic access. I created a set of routines for dealing with HDF5 files that I have been extending, as needed, to other scientific data formats such as netCDF, HDF4, and IDL savefiles.

Continue reading “Scientific data file format routines.”

Recently, I have been writing a fairly large, generic system for ingesting various satellite images onto a common grid and producing user-specified plots and reports from the results. The system is controlled via a configuration file, like this one, which has been a great, flexible way to let users extend and control the system. But reading the recent IDL Data Point article about Jim Pendleton’s DebuggerHelper class reminded me how useful a logging framework is for medium- to large-sized projects.

I use mg_log as my logging utility. It is simple to use, but has some powerful features for filtering and customizing output. Messages are logged at one of five levels (debug, informational, warning, error, and critical), which are checked against the logger’s current level. This allows you to filter messages by severity: during development you can set the logging level to “debug” so that all messages appear; later, when you have deployed the system, users may find setting the level to “warning” (which suppresses debug and informational messages) more appropriate.
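For example, messages are sent with a single call, using a C-style format string and a keyword for the level. A sketch of the basic pattern (see the linked article for the full capabilities):

mg_log, 'starting ingest run', /informational
mg_log, 'read %d scans from %s', n_scans, filename, /debug
mg_log, 'bad scan %d, skipping', s, /warning

With the level set to “warning”, only the last of these would appear in the log.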

See this article to learn more about mg_log basics. This sample log shows what the typical output looks like, though the format for each line is completely configurable. mg_log is available on GitHub in my mglib repo.

I have had multiple occasions where I needed to quickly generate bindings to an existing C library. The repetitive nature of creating these bindings calls out for a tool to automate the task. For this purpose, I have written a class, MG_DLM, that allows:

  1. creating wrapper bindings for routines from a header prototype declaration (with some limitations from standard C)
  2. creating routines which access variables and pound defines
  3. adding custom routines written by the developer

I have used MG_DLM to create bindings for the GNU Scientific Library (GSL), CULA, MAGMA, and even IDL itself.
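A hypothetical sketch of how such a tool might be driven (the method names below are illustrative assumptions, not necessarily the actual MG_DLM API):

dlm = mg_dlm(basename='mygsl')                     ; hypothetical constructor argument
dlm->addRoutinesFromHeaderFile, 'gsl_sf_gamma.h'   ; wrap prototypes found in the header
dlm->addVariableAccessor, 'gsl_version'            ; generate an accessor for a global
dlm->write                                         ; emit the DLM source and .dlm files

The point is that the developer works at the level of header files and names, while the repetitive C glue code is generated automatically.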

Continue reading “Automatically generating IDL bindings.”

Atle Borsholm recently posted a clever solution for finding the n-th smallest element in an array on the IDL Data Point. He compares this to a naive solution which simply sorts all the elements and grabs the n-th element:

IDL> tic & x = ordinal_1(a, 123456) & toc
% Time elapsed: 3.0336552 seconds.

His solution performs much better:

IDL> tic & x = ordinal_2(a, 123456) & toc
% Time elapsed: 0.46286297 seconds.

I have a HISTOGRAM-based solution called MG_N_SMALLEST in mglib that can do even better:

IDL> tic & x = mg_n_smallest(a, 123456) & toc
% Time elapsed: 0.18394303 seconds.

Note: MG_N_SMALLEST does not return the n-th smallest element directly, but returns indices to the smallest n elements.

I have a more detailed description of what MG_N_SMALLEST is doing in an older article. I like this routine as a good example of using HISTOGRAM and its REVERSE_INDICES keyword. It is also a nice example of when using a FOR loop in IDL isn’t so bad.
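The core idea can be sketched as follows: use HISTOGRAM to coarsely partition the data, accumulate bin counts until at least n elements are covered, and then sort only those candidates. This is a simplified sketch of the approach; MG_N_SMALLEST itself is more careful about choosing the number of bins and handling edge cases:

function n_smallest_sketch, data, n
  compile_opt strictarr

  ; partition the data into bins; the bin count is a tuning choice
  nbins = (n_elements(data) / n) > 1L
  h = histogram(data, nbins=nbins, reverse_indices=ri)

  ; accumulate bins until at least n elements are covered
  count = 0L
  for b = 0L, nbins - 1L do begin
    count += h[b]
    if (count ge n) then break
  endfor

  ; candidates are all the elements falling in bins 0..b
  candidates = ri[ri[0]:ri[b + 1L] - 1L]

  ; sort only the candidates, not the entire array
  return, candidates[(sort(data[candidates]))[0:n - 1L]]
end

Like MG_N_SMALLEST, this returns indices into the original array, and the FOR loop only runs over bins, not elements, so it stays cheap.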

I include even more detail on this routine in the “Performance” chapter of my book.

Mike Bostock has created some really great visualizations of sampling, shuffling, sorting, and maze generation algorithms. He ends with a quick discussion of using vision to think, using his NYTimes interactive graphic “Is It Better to Rent or Buy?” as an example:

To fix this, we need to do more than output a single number. We need to show how the underlying system works. The new calculator therefore charts every variable and lets you quickly explore any variable’s effect by adjusting the associated slider.

via Flowing Data

My MacBook Pro has three OpenCL devices: a CPU, an integrated GPU, and a discrete GPU. I was interested in the performance I could get with my OpenCL GPULib prototype on the various devices, so I ran the benchmark routine on each of them. CL_BENCHMARK simply computes the gamma function for an array of values; see the results for various array sizes below.

Gamma computation performance on host and various OpenCL devices

There are several interesting points to these results:

  1. the discrete GPU did not have the best performance
  2. the CPU OpenCL device outperformed the host CPU code for arrays of more than a few million elements

Contact me if you are interested in the GPULib OpenCL prototype (still very rough).

Here are the details on the various OpenCL devices on my laptop:

IDL> cl_report
Platform 0
Name: Apple
Version: OpenCL 1.2 (Apr 25 2014 22:04:25)

  Device 0
  Name: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  Global memory size: 17179869184 bytes (16384 MB)
  Double capable: yes
  Available: yes
  Compiler available: yes
  Device version: OpenCL 1.2
  Driver version: 1.1

  Device 1
  Name: Iris Pro
  Global memory size: 1610612736 bytes (1536 MB)
  Double capable: no
  Available: yes
  Compiler available: yes
  Device version: OpenCL 1.2
  Driver version: 1.2(May  5 2014 20:39:23)

  Device 2
  Name: GeForce GT 750M
  Global memory size: 2147483648 bytes (2048 MB)
  Double capable: yes
  Available: yes
  Compiler available: yes
  Device version: OpenCL 1.2
  Driver version: 8.26.21 310.40.35f08
