I think what this paper says about “machine learning” applies fairly well to any type of scientific pipeline code:

Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.

The authors argue that machine learning systems have the usual problems of any code, but also have other complexities that are not necessarily addressed by the normal remedies of refactoring libraries, adding unit tests, etc.

IDL 8.6.1 was released today [1]. Some interesting new features:

  • Conditional breakpoints from the Workbench
  • Hexadecimal constants, e.g., a = 0xFF3A
  • Fix for strings that begin with numerals being confused with the octal notation: "123 is an octal value; "123" used to be a syntax error, but is now a valid string.

See the release notes for details.


  1. Really sometime in the last week or so. The announcement on the newsgroup was today, but the release notes were posted 7/27.

Travis Oliphant, creator of NumPy, the array package for Python, wrote an analog to the Zen of Python for NumPy:

Strided is better than scattered
Contiguous is better than strided
Descriptive is better than imperative (use data-types)
Array-oriented is often better than object-oriented
Broadcasting is a great idea — use where possible
Vectorized is better than an explicit loop
Unless it’s complicated — then use numexpr, weave, or Cython
Think in higher dimensions

I tried something similar for IDL last year.
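A few of these lines are easy to see in a short NumPy sketch. The array below is made up for illustration, and the explicit loop is there only for comparison with the vectorized, broadcast version.

```python
import numpy as np

# Hypothetical data: three samples (rows) with four features (columns).
data = np.arange(12, dtype=np.float64).reshape(3, 4)

# "Vectorized is better than an explicit loop":
# one call computes the mean of every column.
col_means = data.mean(axis=0)

# "Broadcasting is a great idea":
# the (4,) vector of means is stretched across the (3, 4) array.
centered = data - col_means

# The same result with explicit loops, for comparison only.
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] - col_means[j]

# "Contiguous is better than strided":
# taking every other column gives a strided view of the same memory;
# np.ascontiguousarray copies it into a contiguous block.
view = data[:, ::2]
packed = np.ascontiguousarray(view)

print(np.array_equal(centered, looped))
print(view.flags["C_CONTIGUOUS"], packed.flags["C_CONTIGUOUS"])
```

Whether the contiguous copy is worth making depends on how often the data is reused; the copy itself has a cost.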

I will be posting more about the Great American Eclipse of 2017 this summer. To start, NASA has made some great maps showing where the total eclipse will be visible:

On August 21, 2017, the moon will pass between Earth and the sun in a total solar eclipse that will be visible on a path from Oregon to South Carolina across the continental United States. This path of totality will occur in a little over 90 minutes, while observers on the ground will see the eclipse for about two and a half minutes. Standing at the edge of the moon’s shadow, or umbra, the difference between seeing a total eclipse and a partial eclipse comes down to elevation – mountains and valleys both on Earth and on the moon – which affect where the shadow lands. In this visualization, data from NASA’s Lunar Reconnaissance Orbiter account for the moon’s terrain that creates a jagged edge on its shadow. This data is then combined with elevation data on Earth as well as information on the sun angle to create the most accurate map of the eclipse path to date. Watch the video to learn more.

NASA also has downloadable, detailed state maps of the eclipse path.

via kottke

Here are some odd, but totally legal, IDL statements. I would suggest staying away from all of them and using more conventional syntax.

Here’s one that looks like a string, but is really specifying an octal value:

IDL> x = "12
IDL> help, x
X               INT       =       10

Back when octal values were much more important than now, I suppose it made some sense to have special syntax for entering them. In modern times, I would suggest x = '12'o.

This also means that if you use double quotes to specify a string that begins with a digit 0-7, you can generate a bewildering syntax error:

IDL> y = "12 monkeys"
              ^
% Syntax error.

I recommend using single quotes for all strings, i.e., y = '12 monkeys'.
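The value 10 in the octal example above is just the digits 1 and 2 read in base 8. A quick check of the arithmetic, written in Python only because it is easy to run anywhere (this is not IDL syntax):

```python
# The digits "12" interpreted in base 8, three equivalent ways.
value = int("12", 8)   # parse the digit string as base 8
by_hand = 1 * 8 + 2    # positional arithmetic: one eight plus two ones
literal = 0o12         # Python's own octal literal

print(value, by_hand, literal)
```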

Next up is a convenience for the truly lazy:

IDL> s = 'some string

You don’t have to put the trailing single or double quote on a string if the string runs to the end of the line. This will probably confuse your text editor’s syntax highlighting. One character is not too much to type for some clarity.

Finally, I just saw this one last week:

IDL> for i = 0, 4 do begin y = i & print, i
       0
       1
       2
       3
       4

There are quite a few problems with this:

  • There is a begin with no matching end!
  • There is no & after the begin even though you would normally have to start a new line there.
  • I would recommend against using &. It can be useful on the command line (it makes it easier to up arrow to a previous set of commands), but don’t do it in a file!

This is the standard syntax for that line (if you really need to put it all on a single line):

IDL> for i = 0, 4 do begin & y = i & print, i & endfor

I might count using parentheses for indexing arrays as a syntax oddity as well, but there are so many IDL programmers still doing it that it counts as commonplace. I still recommend against it.

Excellent rundown of all the horrible rules that organizations impose on your passwords:

  • They don’t work.
  • They heavily penalize your ideal audience, people that use real random password generators. Hey guess what, that password randomly didn’t have a number or symbol in it. I just double checked my math textbook, and yep, it’s possible. I’m pretty sure.
  • They frustrate average users, who then become uncooperative and use “creative” workarounds that make their passwords less secure.
  • They are often wrong, in the sense that the rules chosen are grossly incomplete and/or insane, per the many shaming links I’ve shared above.
  • Seriously, for the love of God, stop with this arbitrary password rule nonsense already. If you won’t take my word for it, read this 2016 NIST password rules recommendation. It’s right there, “no composition rules”. However, I do see one error, it should have said “no bullshit composition rules”.

My personal pet peeve is forced expiration for no reason. NIST is developing guidelines.
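The point about random password generators is easy to check with a little arithmetic. This is a rough sketch, assuming each character is drawn uniformly and independently from the 62 letters and digits; real generators may use a different alphabet.

```python
import string

# Assumption: characters are drawn uniformly and independently
# from upper- and lowercase letters plus digits (62 symbols).
alphabet = string.ascii_letters + string.digits
letters_only = len(string.ascii_letters) / len(alphabet)  # 52/62


def prob_no_digit(length):
    """Probability that a random password of this length contains no digit."""
    return letters_only ** length


for n in (8, 12, 16):
    print(f"length {n}: {prob_no_digit(n):.1%} of random passwords have no digit")
```

Under these assumptions, roughly one in four random 8-character passwords contains no digit, so a "must contain a number" rule rejects a sizable share of perfectly good passwords.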

The Mathematics Genealogy Project is an amazing effort to record basic information about every mathematician in the world. We can create a family tree for any mathematician. Here is my tree:

For a description of how to create the graph of another mathematician’s genealogy, see Dana C. Ernst’s article.

Licensing has been the most controversial change in IDL 8.6. The release notes say:

IDL licensing is now managed through a 3rd-party solution from Flexera software. You obtain the license through a portal hosted by Flexera, then you can choose to activate the license on a license server or on an individual node-locked machine.

This seems like a more convenient solution, but there are a lot of other changes in the licensing for IDL 8.6.

Limits have been placed on the number of instances of IDL running on a machine. For a local (node-locked) license, the number of IDL instances is limited by:

  • IDL command line or Workbench – 4
  • Execute compiled .sav code – 4
  • IDL Bridge Processes – 16
  • IDL Task Engine – 1

For a served (floating) license:

  • IDL command line or Workbench – 1
  • Execute compiled .sav code – 1
  • IDL Bridge Processes – 8
  • IDL Task Engine – 1

The flexible single user license which allowed people to use IDL at work and at home (or lab) with a single license has also been eliminated in IDL 8.6.

Furthermore, the IDL 8.6 Virtual Machine cannot currently be downloaded from the Harris site. In the past, this has allowed IDL developers to release applications to users who did not need the full IDL distribution, or an IDL license, to run the application.

Complaints resulted in a proposed change for the next release of IDL. IDL Project Lead Chris Torrence wrote on Feb 1:

Starting with IDL 8.6.1 (hopefully mid-April), we will make the following changes:

  • An IDL user will be able to run an unlimited number of sessions on their machine. In IDL 8.6 the IDL license was tied to the MAC address + install location + process ID, so each process ID would consume a separate license. In IDL 8.6.1, the IDL license will be tied to the MAC address + install location + user id, so multiple process ID’s will consume just a single IDL license.
  • This change will apply to the IDL command line, the IDL Workbench, and the Python bridge, on all platforms.
  • This change will not apply to ENVI or other Harris Geospatial products.
  • There is no policy change for “flexible single user” (other than allowing multiple IDL sessions on one machine). If you need to use IDL on two machines, you should contact Tech Support or your sales rep for options.
  • IDL Virtual Machine will remain unchanged from pre-IDL 8.6 – we just need to tie up some loose ends and release it.
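The difference between the two license keys described in the first bullet can be sketched by counting distinct keys. This is only my reading of that description, not the actual licensing mechanism, and every value below is hypothetical.

```python
# Hypothetical IDL processes on one machine:
# (MAC address, install location, user id, process id).
processes = [
    ("00:11:22:33:44:55", "/opt/idl86", "alice", 1001),
    ("00:11:22:33:44:55", "/opt/idl86", "alice", 1002),
    ("00:11:22:33:44:55", "/opt/idl86", "alice", 1003),
    ("00:11:22:33:44:55", "/opt/idl86", "bob", 2001),
]


def licenses_used(procs, key):
    """Count distinct license keys; each distinct key consumes one license."""
    return len({key(p) for p in procs})


# IDL 8.6 as described: the key includes the process ID,
# so every session counts separately.
per_process = licenses_used(processes, lambda p: (p[0], p[1], p[3]))

# IDL 8.6.1 as described: the key uses the user id instead,
# so all of one user's sessions share a license.
per_user = licenses_used(processes, lambda p: (p[0], p[1], p[2]))

print(per_process, per_user)
```

With these made-up processes, the first scheme consumes one license per session and the second consumes one per user.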

IDL 8.6 also has an automatic check for updates (you can turn it off with the “IDL_UPDATE_CHECK” preference) that will tell you when an update is available.

Harris Geospatial released IDL 8.6 at some point in the last couple of months; it’s hard to pick an actual day. I’ve heard the release was rolled out to customers in batches since then, and it was finally my turn last Friday!

The release notes list the new features. I am very interested in checking out the IDL Task Engine; I think it will be extremely useful. There are quite a few small features and changes that I think I will regularly use.

I will have more details in the coming weeks as I look at individual new features one by one.

I have been doing some reading about machine learning recently, using Python as an implementation language. A lot of the routines used are fairly easy to implement in IDL, so I have started filling out my library with IDL versions.

I have written a scatter plot matrix routine that takes a collection of vectors and makes all the scatter plots between pairs of them. For example, here’s a scatter plot matrix produced by the routine for the classic iris dataset:

If you want to use the routine, it’s probably easiest to clone my entire library.
