Category "Machine learning"


I think “machine learning” in this paper applies fairly well to any type of scientific pipeline code:

Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.

The authors argue that machine learning systems have the usual problems of any code, but also have other complexities that are not necessarily addressed by the usual remedies of refactoring libraries, adding unit tests, etc.

A Visual Introduction to Machine Learning provides a simple, visual explanation of using decision trees:

Recap:

  1. Machine learning identifies patterns using statistical learning and computers by unearthing boundaries in data sets. You can use it to make predictions.
  2. One method for making predictions is called a decision tree, which uses a series of if-then statements to identify boundaries and define patterns in the data.
  3. Overfitting happens when some boundaries are based on distinctions that don’t make a difference. You can see if a model overfits by having test data flow through the model.
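
To make the recap concrete, here is a minimal sketch of a decision tree in plain Python. It is not the code behind the post: the homes, the two features (elevation and price per square foot), the Gini-based split rule, and every function name are assumptions chosen for illustration. It grows one shallow and one deep tree on the same training homes, then runs held-out test homes through both.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth):
    """Grow a tree: a leaf is a label, a node is (feature, threshold, low, high)."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    low = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    high = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in low], [y for _, y in low], max_depth - 1),
            build([r for r, _ in high], [y for _, y in high], max_depth - 1))

def predict(tree, row):
    """Follow the if-then boundaries down to a leaf."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

# Hypothetical homes: (elevation in metres, price per square foot).
train_rows = [(5, 1200), (8, 900), (10, 1500), (12, 700), (15, 1000),
              (40, 1100), (60, 800), (75, 1000), (90, 600), (9, 650)]
train_labels = ["NY"] * 5 + ["SF"] * 5   # the last home is a low-lying SF outlier
test_rows = [(7, 600), (11, 500), (6, 1300), (50, 900), (80, 700), (14, 950)]
test_labels = ["NY", "NY", "NY", "SF", "SF", "NY"]

shallow = build(train_rows, train_labels, max_depth=1)  # a single boundary
deep = build(train_rows, train_labels, max_depth=10)    # keeps splitting until pure

for name, tree in [("shallow", shallow), ("deep", deep)]:
    print(name,
          "train:", accuracy(tree, train_rows, train_labels),
          "test:", accuracy(tree, test_rows, test_labels))
```

On this toy data the deep tree classifies every training home correctly, but only by adding a boundary around the single outlier, and it does worse than the shallow tree on the test homes. That gap between training and test accuracy is the overfitting signal described in point 3.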

Decision tree

The post, which is already ten months old, promises a second installment in the series on overfitting, but this short introduction stands well on its own.