When working with data files at my day job, I often come across directories containing a large number of files of several distinct types. It would be useful to produce a listing of the files clustered into these types. I wrote `cls` (Clustered ls) to find patterns in filenames for display.

For example, in the directory with 7552 files, the `ls` output is difficult to parse easily. But `cls` produces a much more compact listing that can be scanned:

$ cls
*.ls [4 files]
20180101_*_kcor_l1.5.fts.gz [1335 files]
20180101_*_kcor_l1.5.gif [1335 files]
20180101_*_kcor_l1.5_avg.fts.gz [167 files]
20180101_*_kcor_l1.5_avg.gif [167 files]
20180101_*_kcor_l1.5_avg_cropped.gif [167 files]
20180101_*_kcor_l1.5_cropped.gif [1335 files]
20180101_*_kcor_l1.5_nrgf.fts.gz [167 files]
20180101_*_kcor_l1.5_nrgf.gif [167 files]
20180101_*_kcor_l1.5_nrgf_cropped.gif [167 files]
20180102_*_kcor_l1.5.fts.gz [675 files]
20180102_*_kcor_l1.5.gif [675 files]
20180102_*_kcor_l1.5_avg.fts.gz [84 files]
20180102_*_kcor_l1.5_avg.gif [84 files]
20180102_*_kcor_l1.5_avg_cropped.gif [84 files]
20180102_*_kcor_l1.5_cropped.gif [675 files]
20180102_*_kcor_l1.5_nrgf.fts.gz [84 files]
20180102_*_kcor_l1.5_nrgf.gif [84 files]
20180102_*_kcor_l1.5_nrgf_cropped.gif [84 files]

The arguments for the script are quite simple right now:

usage: cls [-h] [-m MAX_LISTED] [files [files …]]

Clustered ls 0.0.1

positional arguments:
files path specification to check

optional arguments:
-h, –help show this help message and exit
-m MAX_LISTED, –max-listed MAX_LISTED
max number of files to explicitly list

The `-m` argument indicates the cutoff number of files to use “*” for, by default 3. For example, the line in the above output:

*.ls [4 files]

is changed to:


if you use:

$ cls -m 4

The current code for `cls` uses a naive method for determining the clusters which does not try to produce optimal clusters. This can change the output because there can be multiple ways to cluster a list of files. For example, in the directory with the following files:

$ ls
a-1.log a-2.log b-1.log b-2.log c-1.log c-2.log

`cls` can make two different clustering depending on how the original files are ordered:

$ cls *
$ cls *-{1,2}.log

See my [scripts] repo for code for `cls`. This code is a proof of concept prototype written in Python 3.

[scripts]: https://github.com/mgalloy/scripts “mgalloy/scripts”