D3 in Depth:

D3 in Depth aims to bridge the gap between introductory tutorials/books and the official documentation.

I have found D3 extremely useful for creating dynamic plots on dashboard-style websites for monitoring data pipelines. This looks like an excellent resource for learning it.

via FlowingData

I plot a lot of data on daily cycles, where there is no data collected at night. Let’s mock up some sample data with the following simple code:

IDL> x = [findgen(10), findgen(10) + 25, findgen(10) + 50]
IDL> seed = 0L
IDL> y = randomu(seed, 30)
IDL> plot, x, y

Then I get a plot like this:

This plot doesn’t show the nightly breaks in data well. Connecting the last data point collected from a day to the first data point collected the next day emphasizes the trend between these points, which may not be appropriate.

I have been using a fairly simple routine to insert NaNs into the data to break the plot into disconnected sections. For example, modify the above data for plotting with:

IDL> new_y = mg_insert_nan(x, y, [10.0, 35.0], new_x=new_x)
IDL> plot, new_x, new_y

The new plot shows the gaps between the “days” in the data:
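For readers without IDL, the idea can be sketched in plain Python. This is only an illustration of the technique, not the implementation of mg_insert_nan: the function name insert_nan and its argument order are assumptions, it assumes x is sorted in ascending order, and the y values are placeholders rather than random data.

```python
import bisect
import math


def insert_nan(x, y, gap_x):
    """Return copies of x and y with a NaN sample inserted at each gap_x value.

    Assumes x is sorted in ascending order (hypothetical helper, not mg_insert_nan).
    """
    new_x = list(x)
    new_y = list(y)
    for gx in sorted(gap_x):
        # Find where the gap location belongs so new_x stays sorted.
        i = bisect.bisect_left(new_x, gx)
        new_x.insert(i, gx)
        new_y.insert(i, math.nan)
    return new_x, new_y


# Three "days" of ten samples each, mirroring the x values in the IDL example.
x = ([float(i) for i in range(10)]
     + [i + 25.0 for i in range(10)]
     + [i + 50.0 for i in range(10)])
y = [0.5] * 30  # placeholder values; the IDL example uses random data

new_x, new_y = insert_nan(x, y, [10.0, 35.0])
```

Plotting libraries that skip NaN values when drawing lines would then break the line at x = 10 and x = 35 instead of connecting one day to the next.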

James Hague writes:

Though my fascination with Forth is long behind me, I still tend toward minimalist programming, but not in the same, extreme, way. I’ve adopted a more modern approach to minimalism:

Use the highest-level language that’s a viable option.

Lean on the built-in features that do the most work.

Write as little code as possible.

I think this is good advice, but I would add one more point: have as few third-party dependencies as possible. This balances the last point about writing as little code as possible.

Colorgorical is an alternative to ColorBrewer with a few different options for creating color tables. For example, you can specify a couple of colors that should be in the color table and let Colorgorical choose the others to maximize the perceptual difference between the colors. Colorgorical seems particularly well suited to generating qualitative color tables, e.g., to find sufficiently different colors for each line in a plot.

via FlowingData.

Motivated by the below chart of the age distribution of Olympic athletes, Junk Charts presents several techniques to visualize multiple distributions:

Age distribution of Olympic athletes

Candidates include the traditional boxplots used by statisticians, as well as variations and a stack of histograms. I think violin plots, suggested by a commenter, are a nice compromise showing the full distribution.

It is often useful to display a progress bar showing the state of a task. MG_Progress can easily be used to display a progress bar, percent completion, and estimated time to completion. As a simple example, let’s pretend to load 100 items (while actually just waiting a bit):

foreach i, mg_progress(indgen(100), title='Loading') do wait, 0.1

The above line produces the following output:

Code for mg_progress__define is on GitHub (you will need mg_statusline also). See the code docs for the many other options that can be used with MG_Progress, such as handling a list of items that don’t all take equal time and customizing the display.
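The pattern of wrapping an iterable so that the loop itself drives the display can be sketched in plain Python. This is a rough analogue for illustration only, not a port of MG_Progress: the names format_status and progress, the bar layout, and the simple linear time estimate are all assumptions.

```python
import sys
import time


def format_status(done, total, elapsed, title="", width=20):
    """Build one status line: title, bar, percent, and a linear ETA estimate."""
    fraction = done / total if total else 1.0
    filled = int(round(width * fraction))
    bar = "#" * filled + "-" * (width - filled)
    # Assume remaining items take as long, on average, as the completed ones.
    eta = elapsed / done * (total - done) if done else float("nan")
    return f"{title} [{bar}] {100 * fraction:3.0f}% ETA {eta:.1f}s"


def progress(items, title="", stream=sys.stderr):
    """Yield each item, rewriting a single status line after each one finishes."""
    items = list(items)
    start = time.monotonic()
    for n, item in enumerate(items, start=1):
        yield item
        elapsed = time.monotonic() - start
        stream.write("\r" + format_status(n, len(items), elapsed, title))
        stream.flush()
    stream.write("\n")


# Pretend to load 100 items while actually just waiting a bit.
for i in progress(range(100), title="Loading"):
    time.sleep(0.001)
```

The linear estimate is the weak spot when items do not take equal time, which is the case the MG_Progress docs mention handling.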

John Nelson produced this beautiful map of how the boundaries of US droughts have changed over the last five years with data from the US Drought Monitor:

Link via FlowingData.

Part 2 (of what promises to be a four-part series) of the great comparison of Google Maps and Apple Maps by Justin O’Beirne is out. See part 1 before starting with part 2.

This is an example of using a clever color key that doubles as a histogram showing the distribution of the corresponding areas.

By the way, this post is from a great series about small ways to make better visualizations.

Great post examining some of the reasons why the FFT algorithm is so fast compared to a naive implementation:

The goal of this post is to dive into the Cooley-Tukey FFT algorithm, explaining the symmetries that lead to it, and to show some straightforward Python implementations putting the theory into practice. My hope is that this exploration will give data scientists like myself a more complete picture of what’s going on in the background of the algorithms we use.
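To give a flavor of the approach, here is a minimal pure-Python sketch of a recursive radix-2 Cooley-Tukey FFT next to a naive DFT. It is not the code from the linked post: the names dft_naive and fft are illustrative, and the recursive version assumes the input length is a power of two.

```python
import cmath


def dft_naive(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed samples and transform each half.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    # Each half-size result is reused for two output bins, which is
    # the symmetry that saves the work.
    twiddled = [cmath.exp(-2j * cmath.pi * m / n) * odd[m] for m in range(half)]
    return ([even[m] + twiddled[m] for m in range(half)]
            + [even[m] - twiddled[m] for m in range(half)])


signal = [1, 2, 3, 4, 0, 0, 0, 0]
fast = fft(signal)
slow = dft_naive(signal)
```

Both functions compute the same transform up to floating-point rounding; the recursive version does O(n log n) work instead of O(n^2).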
