Technical debt in machine learning

posted Tue 8 Aug 2017 by Michael Galloy under Machine learning

I think what this paper says about “machine learning” applies fairly well to any type of scientific pipeline code:

Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.

The authors argue that machine learning systems have the regular issues of any code, but also have other complexities that are not necessarily addressed by the usual remedies of refactoring libraries, adding unit tests, etc.


A Visual Introduction to Machine Learning

posted Thu 5 May 2016 by Michael Galloy under Machine learning

A Visual Introduction to Machine Learning provides a simple, visual explanation of using decision trees:

Recap:

Machine learning identifies patterns using statistical learning and computers by unearthing boundaries in data sets. You can use it to make predictions.

One method for making predictions is called a decision tree, which uses a series of if-then statements to identify boundaries and define patterns in the data.

Overfitting happens when some boundaries are based on distinctions that don’t make a difference. You can see if a model overfits by having test data flow through the model.

Decision tree

The post, which is already ten months old, promises a second in the series on overfitting, but this short introduction is a fine standalone.
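As a rough illustration of the recap above, here is a minimal Python sketch of a decision tree written as plain if-then statements, with test data used to check for overfitting. Everything in it is an assumption made for illustration: the feature names (`elevation`, `price`), the thresholds, the city labels, and the toy data are all made up and are not taken from the original post.

```python
# Toy data, invented for illustration: (elevation, price, label).
# Two training points are deliberately "noisy" exceptions.
TRAIN = [
    (5, 1200, "NY"),
    (10, 900, "NY"),
    (8, 1500, "NY"),
    (35, 700, "NY"),   # noisy: high elevation but labeled NY
    (60, 800, "SF"),
    (45, 950, "SF"),
    (80, 1100, "SF"),
    (12, 1000, "SF"),  # noisy: low elevation but labeled SF
]

# Held-out test data the trees never "saw" when their rules were chosen.
TEST = [
    (7, 1000, "NY"),
    (20, 1300, "NY"),
    (33, 720, "SF"),
    (50, 900, "SF"),
    (70, 1000, "SF"),
    (11, 1010, "NY"),
]


def simple_tree(elevation, price):
    """One boundary: a single if-then split on elevation."""
    if elevation > 30:
        return "SF"
    return "NY"


def overfit_tree(elevation, price):
    """Extra boundaries carved out to fit the two noisy training points."""
    if elevation > 30:
        if price < 750:          # exists only to capture (35, 700, "NY")
            return "NY"
        return "SF"
    if 950 < price < 1050:       # exists only to capture (12, 1000, "SF")
        return "SF"
    return "NY"


def accuracy(tree, rows):
    """Fraction of rows for which the tree predicts the recorded label."""
    correct = sum(tree(elev, price) == label for elev, price, label in rows)
    return correct / len(rows)


for name, tree in [("simple", simple_tree), ("overfit", overfit_tree)]:
    print(name, "train:", accuracy(tree, TRAIN), "test:", accuracy(tree, TEST))
```

On this made-up data, the overfit tree scores higher than the simple tree on the training set but lower on the test set. That gap between training and test accuracy is the symptom the recap describes: boundaries based on distinctions that don't make a difference.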


michaelgalloy.com

Resources for IDL developers

Buy Modern IDL now!

Modern IDL offers IDL programmers one place to look for explanation, techniques, and reference material, for beginners and advanced users alike.

"... But I've always wanted a thorough, concise, up-to-date overview of the IDL language and its vast capabilities. This is exactly what Mike's book provides in 464 very informative pages... Highly recommended!"
—Mort Canty

About me

I'm a software developer focusing on high-performance computing and visualization in scientific programming. I work mostly in IDL, but occasionally use C, CUDA, and Python.

I currently work for the National Center for Atmospheric Research (NCAR) at the Mauna Loa Solar Observatory. Previously, I worked for Tech-X Corporation, where I was the main developer for GPULib, a library of IDL bindings for GPU accelerated computation routines.

I am the creator and main developer for the open source projects IDLdoc, mgunit, and rIDL.

Contact me at Mastodon or via email at mgalloy at gmail dot com. For more details about me, see my CV/resume.

Need consulting/instruction? Contact me.


GPULib

GPULib enables IDL developers to access the high-performance capabilities of modern NVIDIA graphics cards without knowledge of CUDA programming.

TaskDL

TaskDL is a task-farming solution for IDL designed for problems with loosely-coupled, parallel applications where no communication between nodes of a cluster is required.

mpiDL

mpiDL is a library of IDL bindings for Message Passing Interface (MPI) used for tightly-coupled parallel applications.

Remote Data Toolkit

The Remote Data Toolkit is a library of IDL routines allowing for easy access to various scientific data in formats such as OPeNDAP, HDF 5, and netCDF.

Modern IDL

Modern IDL offers IDL programmers one place to look, for beginners and advanced users alike. This book also contains: a thorough tutorial on the core topics of IDL; a comprehensive introduction to the object graphics system; common problems and gotchas with many examples; and advanced topics not normally covered, such as regular expressions, object graphics, advanced widget programming, performance, and object-oriented programming.

IDLdoc

IDLdoc is an open source utility for generating documentation from IDL source code and specially formatted comments.

mgunit

mgunit is an open source unit testing framework for IDL.

rIDL

rIDL is an open source IDL command line replacement.

mglib

mglib is an open source library of IDL routines in areas of visualization, application development, command line utilities, analysis, data access, etc.
