08 September 2011

GNU parallel for parameter sweeps

Following up on previous failures, I've chosen to test a portion of my compressible channel code, Suzerain, across a small parameter sweep. This problem is ill-suited to a traditional HPC resource because many of the jobs will fail and none of them need to run for very long.

The problem is perfect for GNU parallel:

#!/bin/bash

# Build the fixed parameters for channel initialization
fixed="--Ma=1.5 --Re=3000 --Pr=0.7 --gamma=1.4 --beta=0.7 --k=8"

# Build the GNU parallel ::: input sources, the matching {1}..{10} placeholders, and a case description
sweep=""                       ; case="case{#}"                      ; args="$case"
sweep="$sweep ::: 96 120"      ; Nx="--Nx={1}"                       ; args="$args $Nx"
sweep="$sweep ::: 72  96"      ; Ny="--Ny={2}"                       ; args="$args $Ny"
sweep="$sweep :::  2   3"      ; htdelta="--htdelta={3}"             ; args="$args $htdelta"
sweep="$sweep ::: 60  96"      ; Nz="--Nz={4}"                       ; args="$args $Nz"
sweep="$sweep ::: 125 150"     ; fluct_percent="--fluct_percent={5}" ; args="$args $fluct_percent"
sweep="$sweep ::: 12345 67890" ; fluct_seed="--fluct_seed={6}"       ; args="$args $fluct_seed"
sweep="$sweep :::  2   5"      ; alpha="--alpha={7}"                 ; args="$args $alpha"
sweep="$sweep ::: .4  .5"      ; npower="--npower={8}"               ; args="$args $npower"
sweep="$sweep ::: 0:1 .5:.75"  ; fluct_kxfrac="--fluct_kxfrac={9}"   ; args="$args $fluct_kxfrac"
sweep="$sweep ::: 0:1 .5:.75"  ; fluct_kzfrac="--fluct_kzfrac={10}"  ; args="$args $fluct_kzfrac"

# Remove any stale data
rm -rfv case* bad.case*

# Generate a master table of test cases for fun and profit
parallel -k -j 1 echo $args $sweep | column -t > master

# Generate inputs for each test case (one parallel job per case)
parallel "mkdir $case && cd $case && echo $args > args"                                                             \
     " && ../bin/channel_init initial.h5 $fixed $Nx $Ny $htdelta $Nz $alpha $npower"                                \
     " && ../bin/channel_explicit initial.h5 --advance_nt=0 $fluct_percent $fluct_seed $fluct_kxfrac $fluct_kzfrac" \
     $sweep

# Run each case for 1000 time steps to shake out non-robust scenarios
parallel --eta -j 3 -u                                                              \
     "cd {} && ../bin/channel_explicit restart0.h5 --advance_nt=1000 --status_nt=5" \
     ::: case*
 
# Flag test cases that glaringly failed by renaming them with a "bad." prefix
parallel "mv {//} bad.{//}" ::: $(grep -l "TimeController halted" case*/log.dat)
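The last command above relies on two things:

- `{//}` is GNU parallel's replacement string for the directory part of each input, so `case7/log.dat` becomes `case7`.
- `grep -l` prints only the names of files that contain the pattern.

For anyone without GNU parallel at hand, here is a minimal sequential sketch of the same step in plain bash. It assumes the `caseN/log.dat` layout used in the script. `flag_bad_cases` is a hypothetical helper name, not part of Suzerain or GNU parallel.

```bash
#!/bin/bash
# Plain-bash sketch of the flagging step, for illustration only.
# Assumes each case directory holds its solver log at caseN/log.dat.
# flag_bad_cases is a hypothetical helper name.
flag_bad_cases() {
  local -a logs=()
  local log dir

  # grep -l prints only the names of files containing the pattern;
  # -F matches it as a fixed string. Errors (e.g. no case*/log.dat) are hidden.
  # Collect the list first so no directory is renamed while grep still runs.
  mapfile -t logs < <(grep -lF "TimeController halted" case*/log.dat 2>/dev/null)

  for log in "${logs[@]}"; do
    dir="${log%/*}"            # strip "/log.dat", like {//}: case7/log.dat -> case7
    mv -- "$dir" "bad.$dir"    # quoted, so unusual directory names survive
  end_marker=1 2>/dev/null || :
  done
}
```

A serial loop is adequate here because renaming a handful of directories is cheap. Parallelism only pays off for the initialization and time-stepping stages.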

The first two "blocks" define the parameter sweep using parallel's feature where "::: a b ::: 1 2" expands into the outer (Cartesian) product "a 1", "a 2", "b 1", "b 2". After clearing out stale data, the next bit writes a table of every combination to master so I can refer to it later. The remaining three bits perform some IO-heavy initialization, a compute-intensive run of 1000 time steps, and then a post-processing pass that flags the failed cases among the entries in said product. This logic, and the associated runtime process juggling to get good batch throughput, would be hell without GNU parallel. Thanks, Ole.
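To make the expansion order concrete, here is a plain-bash sketch that emulates how GNU parallel combines `:::` groups. It produces every combination, with the first group varying slowest, and numbers each row from 1 the way `{#}` does. It is an illustration only: `cartesian` is a hypothetical helper, and it reproduces none of parallel's job scheduling.

```bash
#!/bin/bash
# cartesian: print every combination of ":::"-separated value groups,
# first group varying slowest (the order GNU parallel uses), each row
# prefixed with a 1-based case number analogous to case{#}.
# Hypothetical helper for illustration; it does not invoke GNU parallel.
cartesian() {
  local -a combos=("")   # rows built so far; start with one empty row
  local -a group=()      # values of the ::: group currently being read
  local -a expanded
  local arg row val

  # A trailing sentinel ":::" flushes the final group inside the loop.
  for arg in "$@" ":::"; do
    if [[ $arg == ":::" ]]; then
      if (( ${#group[@]} > 0 )); then
        expanded=()
        for row in "${combos[@]}"; do
          for val in "${group[@]}"; do
            expanded+=("${row:+$row }$val")   # append value, space-separated
          done
        done
        combos=("${expanded[@]}")
        group=()
      fi
    else
      group+=("$arg")
    fi
  done

  # No groups at all: nothing to print.
  if [[ ${#combos[@]} -eq 1 && -z ${combos[0]} ]]; then
    return 0
  fi

  local n=0
  for row in "${combos[@]}"; do
    n=$((n + 1))
    printf 'case%d %s\n' "$n" "$row"
  done
}

cartesian ::: a b ::: 1 2
# prints:
# case1 a 1
# case2 a 2
# case3 b 1
# case4 b 2
```

Feeding it the ten two-value groups from the script (`::: 96 120 ::: 72 96` and so on) yields 2^10 = 1024 rows, one per case directory. That count is why a batch scheduler built for a few long jobs fits this sweep poorly.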

2 comments:

Ole Tange said...

Any reason why you do not leave out -j 1:

parallel -k echo $args $sweep | column -t > master

Any reason why you do not leave out the "'s:

parallel mv {//} bad.{//} :::

Unknown said...

I added "-j 1" because I use "-j +0" in my parallelrc. The extra quotes in the "parallel mv" command are just a force of habit when quoting filenames.

Subscribe to The Return of Agent Zlerich