When working with data files at my day job, I often come across directories containing a large number of files of several distinct types. It would be useful to produce a listing of the files clustered into these types. I wrote cls (Clustered ls) to find patterns in filenames for display.

For example, in a directory with 7552 files, the ls output is hard to scan. cls produces a much more compact listing:

$ cls
*.ls [4 files]
20180101_*_kcor_l1.5.fts.gz [1335 files]
20180101_*_kcor_l1.5.gif [1335 files]
20180101_*_kcor_l1.5_avg.fts.gz [167 files]
20180101_*_kcor_l1.5_avg.gif [167 files]
20180101_*_kcor_l1.5_avg_cropped.gif [167 files]
20180101_*_kcor_l1.5_cropped.gif [1335 files]
20180101_*_kcor_l1.5_nrgf.fts.gz [167 files]
20180101_*_kcor_l1.5_nrgf.gif [167 files]
20180101_*_kcor_l1.5_nrgf_cropped.gif [167 files]
20180101_180511_kcor_l1.5_extavg.fts.gz
20180101_180511_kcor_l1.5_extavg.gif
20180101_180511_kcor_l1.5_extavg_cropped.gif
20180101_180511_kcor_l1.5_nrgf_extavg.fts.gz
20180101_180511_kcor_l1.5_nrgf_extavg.gif
20180101_180511_kcor_l1.5_nrgf_extavg_cropped.gif
20180101_kcor_l1.5.{mp4,tarlist,tgz}
20180101_kcor_l1.5_nrgf_cropped.mp4
20180101_kcor_l1.5_{cropped,nrgf}.mp4
20180101_kcor_minus.mp4
20180102_*_kcor_l1.5.fts.gz [675 files]
20180102_*_kcor_l1.5.gif [675 files]
20180102_*_kcor_l1.5_avg.fts.gz [84 files]
20180102_*_kcor_l1.5_avg.gif [84 files]
20180102_*_kcor_l1.5_avg_cropped.gif [84 files]
20180102_*_kcor_l1.5_cropped.gif [675 files]
20180102_*_kcor_l1.5_nrgf.fts.gz [84 files]
20180102_*_kcor_l1.5_nrgf.gif [84 files]
20180102_*_kcor_l1.5_nrgf_cropped.gif [84 files]

The arguments for the script are quite simple right now:

usage: cls [-h] [-m MAX_LISTED] [files [files ...]]

Clustered ls 0.0.1

positional arguments:
  files                 path specification to check

optional arguments:
  -h, --help            show this help message and exit
  -m MAX_LISTED, --max-listed MAX_LISTED
                        max number of files to explicitly list

The -m argument sets the largest cluster that is still listed explicitly in braces; larger clusters are collapsed to “*” with a file count. The default is 3. For example, the line in the above output:

*.ls [4 files]

is changed to:

{okcgif,okfgif,okl1gz,oknrgf}.ls

if you use:

$ cls -m 4
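The usage text above is consistent with a small argparse setup. The following is only a sketch of what such a parser might look like, not the actual cls source: the `build_parser` name and the `int` type for `-m` are my assumptions, while the option names, help strings, and default of 3 are taken from the output shown above.

```python
import argparse


def build_parser():
    """Build a parser matching the usage text shown above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="cls", description="Clustered ls 0.0.1")
    # Zero or more paths; the shell normally expands globs before cls sees them.
    parser.add_argument("files", nargs="*", help="path specification to check")
    # Clusters with more than this many members are collapsed to "*".
    parser.add_argument(
        "-m", "--max-listed",
        type=int,      # assumption: the cutoff is parsed as an integer
        default=3,     # default stated in the text above
        help="max number of files to explicitly list",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["-m", "4", "a-1.log", "a-2.log"])
print(args.max_listed, args.files)
```

Newer Python versions print "options:" instead of "optional arguments:" in the help output, so the exact wording of the help text depends on the interpreter.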


The current code for cls uses a naive method for determining the clusters and does not try to find optimal ones. This can change the output, because a list of files can often be clustered in more than one way. For example, in a directory with the following files:

$ ls
a-1.log a-2.log b-1.log b-2.log c-1.log c-2.log

cls can produce two different clusterings depending on how the original files are ordered:

$ cls *
a-{1,2}.log
b-{1,2}.log
c-{1,2}.log
$ cls *-{1,2}.log
{a,b,c}-1.log
{a,b,c}-2.log
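To illustrate why input order matters, here is a minimal sketch of one possible greedy clustering. It is not the algorithm in cls itself; the `tokenize` and `cluster` helpers and the choice to split names on ".", "_" and "-" are assumptions made for this example. Each name joins the first existing cluster that differs from it in a single token position, and otherwise starts a new cluster, so whichever names arrive first decide which position varies.

```python
import re


def tokenize(name):
    # Split on '.', '_' and '-' but keep the delimiters as tokens.
    return re.split(r"([._-])", name)


def cluster(names, max_listed=3):
    """Greedily group names that differ in exactly one token position."""
    clusters = []  # each: {"tokens": [...], "vary": index or None, "values": [...]}
    for name in names:
        tokens = tokenize(name)
        for c in clusters:
            if len(c["tokens"]) != len(tokens):
                continue
            # Positions that differ, ignoring the cluster's varying slot.
            diffs = [i for i, (a, b) in enumerate(zip(c["tokens"], tokens))
                     if a != b and i != c["vary"]]
            if c["vary"] is None:
                if len(diffs) == 1:
                    # The first mismatch fixes which position varies.
                    c["vary"] = diffs[0]
                    c["values"] = [c["tokens"][diffs[0]], tokens[diffs[0]]]
                    break
                if not diffs:
                    break  # exact duplicate of an existing name
            elif not diffs:
                value = tokens[c["vary"]]
                if value not in c["values"]:
                    c["values"].append(value)
                break
        else:
            # No cluster accepted the name, so it starts its own.
            clusters.append({"tokens": tokens, "vary": None, "values": []})
    return [render(c, max_listed) for c in clusters]


def render(c, max_listed):
    tokens = list(c["tokens"])
    if c["vary"] is None:
        return "".join(tokens)
    if len(c["values"]) <= max_listed:
        tokens[c["vary"]] = "{" + ",".join(c["values"]) + "}"
        return "".join(tokens)
    tokens[c["vary"]] = "*"
    return "".join(tokens) + f" [{len(c['values'])} files]"


by_letter = ["a-1.log", "a-2.log", "b-1.log", "b-2.log", "c-1.log", "c-2.log"]
by_number = ["a-1.log", "b-1.log", "c-1.log", "a-2.log", "b-2.log", "c-2.log"]
print(cluster(by_letter))  # ['a-{1,2}.log', 'b-{1,2}.log', 'c-{1,2}.log']
print(cluster(by_number))  # ['{a,b,c}-1.log', '{a,b,c}-2.log']
```

The second ordering is what the shell passes for `*-{1,2}.log`, since brace expansion turns it into `*-1.log *-2.log` before the globs are matched. An order-independent result would need some notion of the best clustering, for example the one with the fewest output lines, which this sketch does not attempt.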


See my scripts repo for the cls code. It is a proof-of-concept prototype written in Python 3.