GNU parallel for parameter sweeps
Following up on previous failures, I've chosen to test a portion of my compressible channel code, Suzerain, across a small parameter sweep. This problem is ill-suited to a traditional HPC resource because many of the jobs will fail and none of them needs to run for very long.
The problem is perfect for GNU parallel:
    #!/bin/bash

    # Build the fixed parameters for channel initialization
    fixed="--Ma=1.5 --Re=3000 --Pr=0.7 --gamma=1.4 --beta=0.7 --k=8"

    # Build the parameter sweep affix for GNU parallel, placeholders, and a case description
    sweep=""                        ; case="case{#}"                     ; args="$case"
    sweep="$sweep ::: 96 120"       ; Nx="--Nx={1}"                      ; args="$args $Nx"
    sweep="$sweep ::: 72 96"        ; Ny="--Ny={2}"                      ; args="$args $Ny"
    sweep="$sweep ::: 2 3"          ; htdelta="--htdelta={3}"            ; args="$args $htdelta"
    sweep="$sweep ::: 60 96"        ; Nz="--Nz={4}"                      ; args="$args $Nz"
    sweep="$sweep ::: 125 150"      ; fluct_percent="--fluct_percent={5}" ; args="$args $fluct_percent"
    sweep="$sweep ::: 12345 67890"  ; fluct_seed="--fluct_seed={6}"      ; args="$args $fluct_seed"
    sweep="$sweep ::: 2 5"          ; alpha="--alpha={7}"                ; args="$args $alpha"
    sweep="$sweep ::: .4 .5"        ; npower="--npower={8}"              ; args="$args $npower"
    sweep="$sweep ::: 0:1 .5:.75"   ; fluct_kxfrac="--fluct_kxfrac={9}"  ; args="$args $fluct_kxfrac"
    sweep="$sweep ::: 0:1 .5:.75"   ; fluct_kzfrac="--fluct_kzfrac={10}" ; args="$args $fluct_kzfrac"

    # Remove any stale data
    rm -rfv case* bad.case*

    # Generate a master table of test cases for fun and profit
    parallel -k -j 1 echo $args $sweep | column -t > master

    # Generate inputs for each test case
    parallel "mkdir $case && cd $case && echo $args > args" \
             " && ../bin/channel_init initial.h5 $fixed $Nx $Ny $htdelta $Nz $alpha $npower" \
             " && ../bin/channel_explicit initial.h5 --advance_nt=0 $fluct_percent $fluct_seed $fluct_kxfrac $fluct_kzfrac" \
             $sweep

    # Run each case for 1000 time steps to shake out non-robust scenarios
    parallel --eta -j 3 -u \
             "cd {} && ../bin/channel_explicit restart0.h5 --advance_nt=1000 --status_nt=5" \
             ::: case*

    # Flag test cases that glaringly failed with a "bad." prefix
    parallel "mv {//} bad.{//}" ::: $(grep -l "TimeController halted" case*/log.dat)
The first two "blocks" define the parameter sweep using parallel's feature where "::: a b ::: 1 2" expands into the outer product "a 1", "a 2", "b 1", "b 2". After a cleanup step, the next bit builds a table of these outer products so I can refer to it later. The remaining three bits perform some IO-heavy initialization, some compute-intensive tasks, and then a post-processing pass on each of the entries in said outer product. This logic, and the associated runtime process juggling to get nice batch throughput, would be hell without GNU parallel. Thanks, Ole.
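To make the outer-product bookkeeping concrete, here is a reduced sketch that accumulates only two of the ten parameters (Nx and Ny, with the values used above) in the same sweep/args style. The nested loop is a plain-bash stand-in for GNU parallel, not what parallel does internally. It assumes the ordering implied by the "a 1", "a 2", "b 1", "b 2" example, with the last input source varying fastest and {#} counting jobs from 1. With ten two-valued parameters the full sweep comes to 2^10 = 1024 cases.

```shell
#!/bin/bash
# Accumulate a two-parameter sweep in the same style as the full script.
# Only Nx and Ny are kept here; the reduction is purely illustrative.
sweep="" ; args="case{#}"
sweep="$sweep ::: 96 120" ; args="$args --Nx={1}"
sweep="$sweep ::: 72 96"  ; args="$args --Ny={2}"

# Show the command line that would be handed to GNU parallel
echo "parallel -k echo $args $sweep"

# Stand-in for the expansion: enumerate the product in plain bash,
# numbering cases from 1 as the {#} placeholder is assumed to do
n=0
for nx in 96 120; do
    for ny in 72 96; do
        n=$((n + 1))
        echo "case$n --Nx=$nx --Ny=$ny"
    done
done
```

The loop prints four lines, from "case1 --Nx=96 --Ny=72" through "case4 --Nx=120 --Ny=96". Each additional "::: x y" pair appended to sweep doubles the number of cases without adding another level of loop nesting to the script, which is most of the appeal.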