23 December 2010

getopt.h:195: error: redefinition of 'struct option'

While working with Gnulib recently I kept running into problems with its getopt-gnu module:

In file included from ../../esio/gnulib/argp.h:24:0,
                 from ../../esio/apps/esio_bench.c:7:
../gnulib/getopt.h:195:8: error: redefinition of ‘struct option’
/usr/include/getopt.h:106:8: note: originally defined here
../gnulib/getopt.h:242:12: error: conflicting types for ‘getopt_long’
/usr/include/getopt.h:175:12: note: previous declaration of ‘getopt_long’ was here
../gnulib/getopt.h:246:12: error: conflicting types for ‘getopt_long_only’
/usr/include/getopt.h:179:12: note: previous declaration of ‘getopt_long_only’ was here
make[2]: *** [esio_bench.o] Error 1
which looked suspiciously like someone else's problem but couldn't be resolved by a simple make distclean.

Turns out that I failed to include the required

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

at the top of my source file. Oops. This detail is obvious in hindsight but slightly difficult to dig out of the relevant section of the gnulib manual.

16 December 2010

snprintf + realloc = snprintf_realloc

Recently, in C99, I needed to format some arbitrary-length string content using snprintf. I could re-use the same buffer because I only needed one result at a time. I wanted to (hopefully) minimize the number of malloc calls necessary. Finally, I wanted to (mostly) maintain snprintf's interface including its error semantics.

My use case looked like this...

    char *buf = NULL;
    size_t len = 0;
    while (/* more work */) {
        if (0 > snprintf_realloc(&buf, &len, /*format specifier*/, ...)) {
            /* error occurred, report it, break */
        }
        /* use buf to perform work */
    }
    if (buf) free(buf);

Here is the snprintf_realloc method I created including the relevant Doxygen and FCTX-based unit tests. The source below should build and run provided that fct.h is also in the same directory:

#include <stdarg.h>
#include <stdlib.h>

#include "fct.h"

/**
 * Call snprintf(*str, *size, format, ...) and reallocate the buffer
 * pointed to by *str as appropriate to contain the entire result.  On
 * exit, *str and *size will contain a pointer to the
 * realloced buffer and its maximum usable size, respectively.
 *
 * The reallocation scheme attempts to reduce the reallocation calls when the
 * same str and size arguments are used repeatedly.  It is
 * valid to pass *str == NULL and *size == 0 and then have
 * the buffer allocated to perfectly fit the result.
 *
 * @param[in,out] str    Pointer to the buffer in which to write the result.
 * @param[in,out] size   Pointer to the initial buffer size.
 * @param[in]     format Format specifier to use in sprintf call.
 * @param[in]     ...    Variable number of arguments corresponding
 *                       to \c format.
 *
 * @return On success, the number of characters (not including the trailing
 *         '\0') written to *str.  On error, a negative value
 *         is returned, *str is freed and *size
 *         is set to zero.
 */
int snprintf_realloc(char **str, size_t *size, const char *format, ...);

int
snprintf_realloc(char **str, size_t *size, const char *format, ...)
{
    int retval, needed;
    va_list ap;
    va_start(ap, format);
    while (   0     <= (retval = vsnprintf(*str, *size, format, ap)) // Error?
           && *size <  (needed = retval + 1)) {                      // Space?
        va_end(ap);
        *size *= 2;                                                  // Space?
        if (*size < needed) *size = needed;                          // Space!
        char *p = realloc(*str, *size);                              // Alloc
        if (p) {
            *str = p;
        } else {
            free(*str);
            *str  = NULL;
            *size = 0;
            return -1;
        }
        va_start(ap, format);
    }
    va_end(ap);
    return retval;
}

FCT_BGN()
{
    FCT_FIXTURE_SUITE_BGN(snprintf_realloc)
    {
        const char s[] = "1234";
        char *ptr;
        size_t size;

        FCT_SETUP_BGN()
        {
            ptr  = NULL;
            size = 0;
        }
        FCT_SETUP_END();

        FCT_TEARDOWN_BGN()
        {
            if (ptr) free(ptr);
        }
        FCT_TEARDOWN_END();

        FCT_TEST_BGN(snprintf_realloc)
        {
            // Initial should cause malloc-like behavior
            fct_chk_eq_int(4, snprintf_realloc(&ptr, &size, "%s", s));
            fct_chk(ptr);
            fct_chk_eq_int(size, 5);
            fct_chk_eq_str(ptr, s);

            // Repeated size should not cause any new buffer allocation
            {
                char *last_ptr = ptr;
                fct_chk_eq_int(4, snprintf_realloc(&ptr, &size, "%s", s));
                fct_chk(last_ptr == ptr);
                fct_chk_eq_int(size, 5);
                fct_chk_eq_str(ptr, s);
            }

            // Request requiring more than twice the space should
            // realloc memory to fit exactly.
            fct_chk_eq_int(12, snprintf_realloc(&ptr, &size, "%s%s%s",
                                                s, s, s));
            fct_chk_eq_int(size, 13);
            fct_chk_eq_str(ptr, "123412341234");

            // Request requiring less than twice the space should
            // cause a doubling of the buffer size.
            fct_chk_eq_int(16, snprintf_realloc(&ptr, &size, "%s%s%s%s",
                                                s, s, s, s));
            fct_chk_eq_int(size, 26);
            fct_chk_eq_str(ptr, "1234123412341234");
        }
        FCT_TEST_END();
    }
    FCT_FIXTURE_SUITE_END();
}
FCT_END()

Valgrind, after suppressing some unrelated FCTX warnings, gives this test a clean bill of health. If anyone's interested, aside from passing NULL pointers in or relying upon undefined snprintf behavior, I'd love to hear of ways to break this particular snippet.

04 December 2010

APS DFD 2010 Talk Slides

As promised, sort of, here are the slides from my APS DFD 2010 talk:

A WENO-Based Code for Investigating RANS Model Closures for Multicomponent Hydrodynamic Instabilities

Go listen to The Mikie Show

An old acquaintance has been running an amazing radio show called The Mikie Show for some time now. If you get the chance, take a listen. Think Mister Rogers on the radio if Mister Rogers happened to be a sound designer/musician with a groanful sense of humor.

Recently my friend Billy, an arborist, was in episode 25. Billy's fun. When asked "Does hugging a tree really help it?" Billy didn't miss a beat and replied "It depends what you're going to do next."

18 November 2010

Imminent (but probably dizzy) bundle o' joy

My wife amazes me most days. Today especially.

The bump's our son. He should be arriving in the next three to seven weeks.

12 November 2010

GNU Parallel

I discovered GNU Parallel today.  Phenomenal.  Especially for simple post processing on multicore systems.  Watch the video-- it is quite good.  Here are three sample commands taken from a bash script where $1 is set to be a particular prefix...

parallel --eta -j +0 ./gnuplot.sh -g -F {} $1.r???? ::: `seq 1 8`
Runs a custom gnuplot-based shell script eight times against a file glob.  Each time provide a different argument in the  set {1,2,...,7,8}.  The results are stored as $1.f1.gif, $1.f2.gif, etc.
ls -1 $1.r???? | parallel --eta -j +0 'montage -geometry 1280x960 -tile 4x2 {}.f?.gif {}.miff'
Create a tiled montage of the eight output files mentioned above and store the result in $1.miff.
ls -1 $1.r???? | parallel --eta -j +0 'convert {}.miff {}.gif'
Convert the resulting MIFFs into GIFs.

I win.  On my system, 8 - 0.

15 October 2010

Fortran generic interface mixed with iso_c_binding

I'm going crazy trying to figure out some Fortran generic interface funkiness. The compiler's complaining that a function call doesn't match a generic interface despite the fact that only one module procedure statement is in the interface block and the same damned code compiles when I write the concrete procedure name instead of the generic name.

Just so I've got it later, this happens to work:

module testing

  use, intrinsic :: iso_c_binding, only: c_float, c_double
  private :: c_float, c_double

  interface test
    module procedure test1
    module procedure test2
  end interface

contains

  real(c_float) function test1 (x)
    real(c_float), intent(in) :: x(*)
    test1 = 2 * x(1)
  end function test1

  real(c_double) function test2 (x)
    real(c_double), intent(in) :: x(*)
    test2 = 2 * x(1)
  end function test2

end module testing

program main
  use testing
  implicit none
  real(4) :: s(1)
  real(8) :: d(1)
  s = 2.0
  d = 2d0
  write (*,*) test(s)
  write (*,*) test(d)
end program main
This snippet, however, is not the thing that's breaking...

17 September 2010

APS DFD 2010 abstract

From the APS DFD 2010 schedule:

Session AW: Instability: Richtmyer-Meshkov/Rayleigh-Taylor I


8:00 AM–10:10 AM, Sunday, November 21, 2010
Hyatt Regency Long Beach Room: Regency C

Chair: Arindam Banerjee, Missouri University of Science & Technology

Abstract: AW.00010 : Comparisons of a Reynolds-Averaged Navier--Stokes Model with Self-Similar Solutions for Large Atwood Number Rayleigh--Taylor Mixing

9:57 AM–10:10 AM
Preview Abstract

Authors:

  Rhys Ulerich
    (University of Texas at Austin)
 
  Oleg Schilling
    (Lawrence Livermore National Laboratory 

A new high-order, multicomponent, weighted essentially nonoscillatory (WENO) implementation of a three- and four- equation Reynolds-averaged Navier-Stokes (RANS) model incorporating both mechanical and scalar turbulence is used to simulate intermediate-to-large Atwood number Rayleigh-Taylor turbulent mixing. The predicted RANS mixing layer evolution is compared with the analytical self-similar solutions of the transport equations. The terms in the transport equation budgets are compared in detail to their self-similar profiles across the mixing layer. Additionally, the sensitivity of the RANS solutions to variations in the initial conditions and in the model coefficients is explored. The implications of these results for advanced modeling of Rayleigh-Taylor turbulent mixing are discussed.

14 September 2010

internal error: assertion failed: lower_expr: bad kind

I'm back in Texas.

In case it helps anyone else, sometimes I see errors like the following

 ../../sz/suzerain/diffwave.hpp(65): internal error: assertion failed: lower_expr: bad kind (shared/edgcpfe/lower_il.c, line 14342)
coming from Intel's icpc 10.1 20081024. This usually happens when I use reinterpret_cast while calling a function but screw up the const-correctness of the type to which I'm casting.

I've never been able to get a clean recreate to provide to Intel Support.

12 August 2010

Poster from this summer's internship

As part of my summer internship at Lawrence Livermore National Laboratory I'll be presenting the following poster tomorrow:

Towards a WENO-based code for investigating RANS model closures for hydrodynamic instabilities

09 August 2010

di·ver·tisse·ment


View Mines Rd to Big Basin in a larger map

30 July 2010

Friday Musings

First, Mumford & Son's "Little Lion Man" is amazing.  Listen.  Purchase.  Bonus points for accidentally starting two or more copies and noticing it sounds excellent in a round.  Found it on thesixtyone.com, which is a great source for day-to-day working music and occasionally emits gems like this one.

Second, the Richtmyer-Meshkov setup I've been mucking with for two days is working. In it, a shock wave interacts with a perturbed, constant pressure interface between two different densities (details). Courtesy of gnuplot and gifsicle, here's an animated gif showing density evolution for a problem that's periodic in y and reflective at both x boundaries (click to view animation, 3.2M):



Just a roll up?
The reflective condition at the left boundary causes a re-reshock, which isn't usually what folks are interested in for these problems.  I'm not entirely sure how to enforce a strict inflow condition for compressible simulations like these-- presumably such an inflow condition would eat the outgoing characteristics and stop the re-reshock seen here.

23 July 2010

Glam Shot

Today at LLNL I was involved in some photos for a story on the summer students at ISCR:

Daniel from Ghana, Hilary from Dallas, and me
The screen images are from the 2D WENO-based code I've been revising to investigate RANS modeling of Rayleigh-Taylor instabilities with Dr. Oleg Schilling. As a bit of nerd sniping, I made sure Trac, VIM's NERD tree, and some Mathematica reproducing classic results from Roe's JCP 1981 paper were clearly visible.

Unrelated, today I also had a couple of beers with Ondřej Čertík of sympy and theoretical physics fame. Unbelievably swell guy— easily one of the smartest one or two people with whom I've ever shared a brew.

Update: Entertainingly, in the related story the lab released, the caption shows me as a "Livermore scientist".

20 July 2010

Burst traffic smoothing for SIP processing elements

I received another patent grant from my IBM days. This time its United States Patent 7,742,417:

Mechanisms for burst traffic smoothing for Session Initiation Protocol (SIP) processing elements are provided. A dispatch queue management engine determines whether a received packet is a TCP or UDP packet. If the packet is a TCP packet, the packet is automatically added to the dispatch queue. If the packet is a UDP packet, a value for a drop function f is generated and a random or pseudo-random number r is generated. If r has a predetermined relationship to f, then the UDP packet is added to the dispatch queue, otherwise the UDP packet is discarded. The value for f is based on the current dispatch queue load, the network quality, the retransmission rate, and the allowable drop rate. Thus, the determination as to whether to drop UDP packets or not is configurable by an administrator and also adaptable to the current network and dispatch queue conditions.

The idea for this patent came from dealing with overflowing UDP buffers during a SIP-enabled Java application server stress test. We observed that garbage collection events would pause the application server and cause a backlog of SIP retransmissions over UDP to grow. The invention is a way to selectively toss out UDP packets during such a retransmission burst in a way that would allow the application server to catch up again while maintaining a probabilistically high QoS.

15 July 2010

Wireless tunes on the bike


I'm quite stoked about combining three new toys:
  1. BlueAnt's F4 Interphone Bluetooth helmet communication system
  2. Sony's TMR-BT8IP Bluetooth iPod transmitter
  3. HJC's IS-16 helmet
I've tried out the first two (to Kids of the K Hole no less, an excellent escapist riding anthem) and am supremely impressed.  It'll be nice to not be tethered to the tankbag by a huge, flaky PS/2 cable a la my klunkotronic Chatterbox GMRS X1.

I can't speak to mounting the Interphone's helmet speakers since the new HJC hasn't arrived yet, but the mounting hardware BlueAnt provided looks solid.  Also no verdict yet on the intercom functionality, but the full duplex sound should be a welcome improvement compared to the GMRS X1 (unless Pauly abuses said duplex connection by singing along to decidedly non-escapist anthems).

25 June 2010

How to parse Fortran command line arguments using NAMELIST

While searching around for a pure Fortran 90+ way to parse command line arguments and/or input files, I stumbled across Exploring NAMELIST by John S. Urban. John's article shows a way to use Fortran's NAMELIST feature to quickly parse arbitrary command line arguments while providing sane default behavior.

Like usual, I decided to overdevelop a simple idea. The complete sample is up on github. Assuming you've got an interface declaration in scope for get_command_arguments, here's what you'd see as an end consumer:

PROGRAM namelist_cli

  IMPLICIT NONE

  CHARACTER(LEN=255) :: string, message
  INTEGER :: status

  ! Declare and initialize a NAMELIST for values of interest
  INTEGER  :: i=1,   j=2
  REAL     :: s=3.3, t=4.4
  NAMELIST /cmd/ i, j, s, t

  ! Get command line arguments as a string ready for NAMELIST
  CALL get_command_arguments(string, "&cmd", "/")
  READ (string, NML=cmd, IOSTAT=status, IOMSG=message)
  IF (status /= 0) THEN
    WRITE (*,*) 'Error:   ', status
    WRITE (*,*) 'Message: ', message
    CALL EXIT (status)
  END IF

  ! Echo the results
  WRITE (*, NML=cmd)

END PROGRAM namelist_cli

With a little mucking around, the same NAMELIST could be used to read an input file from disk, allow the user to override individual options on the command line (say, for parametric studies), and also to dump the full setup into an output file for tracking purposes.

31 May 2010

Summer internship

Following a 2,500 mile motorcycle trip from Austin to the Bay Area with my friend Paul, I'll be starting a summer internship at Lawrence Livermore National Labs tomorrow morning.

Instead of my usual my thesis work, I'll be looking at RANS turbulence models/numerics for Rayleigh-Taylor problems. As preparation, over the past couple of months, I've been spending some time staring at conservation law numerics classics like LeVeque's book as well as lecture notes from Caltech's AE 232 course. On the odd chance it is useful to anyone, I've put my toy code up at http://github.com/RhysU/codefour.

The Fortran 90 code solves inviscid or viscid 1D scalar conservation laws of the form u_t + f(u)_x = 0 or u_t + f(u)_x = \nu u_{xx} using WENO reconstruction and a Lax-Friedrichs flux.

It is based heavily on one provided through Caltech's AE 232 course. Compared to the course's code, the following changes have been made:

  1. Adding 3rd-order WENO reconstruction
  2. Adding 2nd and 4th order viscous capabilities
  3. Adding HDF5-based result output
  4. Adding Doxygen-based documentation
  5. Adding Mathematica-based error analysis against Hopf-Cole-based analytic solutions of the viscid Burger's equation

06 May 2010

Made GSL Committer

24 April 2010

New Toy

Courtesy of the employer, my new Lenovo W510 came in on Thursday.

make -j8 on a laptop is fun:

04 March 2010

Gram's Gob Recipe

My grandmother recently mailed me her gob recipe. Hands down, my Gram's are better than Yost's, your neighbor's, your mother's, or even your grandmother's. As a public service, I'm sharing this jewel.

Gram's Gobs

Cookies

Cream together
  • 2 cups sugar,
  • ½ cup crisco,
  • 2 eggs, and
  • 1 cup sour cream.
Then add 1 cup boiling water. Sift and add to the above
  • 4 cups flour,
  • 2 teaspoons baking soda,
  • ½ teaspoon baking powder,
  • ½ teaspoon salt, and
  • ½ cup cocoa.
Drop by tablespoons onto an ungreased baking sheet. Bake at 450° Fahrenheit for exactly 5 minutes. Remove from the sheet at once.

Filling

First, mix together
  • 5 tablespoons flour and
  • 1 cup milk.
Cook the mixture until very thick and then let it cool. Separately, use a mixer to mix until light and fluffy
  • 1 cup crisco or oleo,
  • 1 cup powdered sugar,
  • ¼ teaspoon salt, and
  • 1 teaspoon vanilla.
Add the first mixture to the second mixture and mix well.

Assembly

Spread the filling between two cookies wrapping each one separately. Gram adds...

17 February 2010

MPI Bandwidth Benchmarks

I need to see if the wall times reported by mpiP for the MPI calls underneath FFTW 3.3alpha1's parallel transposes make sense. The TACC folks were excellent, as usual, and pointed me to the MVAPICH benchmarks.

One quick realization of the bandwidth test on lonestar and ranger shows the following:

Unfortunately, this information does not explain my bottleneck...

28 January 2010

Ahhh... Corporate Governance

Really?

We have agreed that if our stockholders fail to adopt the Amendment at the Special Meeting or any postponement or adjournment thereof, we will continue to seek to obtain the requisite approval at least as frequently as every six months until such stockholder adoption has been obtained.

Really. Thankfully, they caught and removed the draft paragraph's final "you stupid asshats".

Admittedly, I have not yet fully reviewed all the particulars. The choice paragraph appears at the bottom of page 6.

20 January 2010

Using FFTW for in-place matrix transposition

The Wikipedia article on in-place matrix transposition tantalizingly hints that FFTW can compute rectangular in-place matrix transposes efficiently (or as efficiently as one can). FFTW's manual, however, is silent. At least as far as section headings, function names, and the index are concerned.

Frigo and Johnson's paper, on page 6, spells out how to tap into the in-place transpose routines: create a rank-0 transform of vector rank 2 using transposed vector strides. I had to bug Matteo Frigo twice (nice guy!) to figure out how to accomplish this using the guru interface.

A C99 solution capable of handling the double-valued either row major or column major case looks like:

#include <assert.h>
#include <ctype.h>
#include <fftw3.h>

fftw_plan plan_transpose(char storage_type,
                         int rows,
                         int cols,
                         double *in,
                         double *out)
{
    const unsigned flags = FFTW_ESTIMATE; /* Do not destroy input */

    fftw_iodim howmany_dims[2];
    switch (toupper(storage_type)) {
        case 'R':
            howmany_dims[0].n  = rows;
            howmany_dims[0].is = cols;
            howmany_dims[0].os = 1;
            howmany_dims[1].n  = cols;
            howmany_dims[1].is = 1;
            howmany_dims[1].os = rows;
            break;
        case 'C':
            howmany_dims[0].n  = rows;
            howmany_dims[0].is = 1;
            howmany_dims[0].os = cols;
            howmany_dims[1].n  = cols;
            howmany_dims[1].is = rows;
            howmany_dims[1].os = 1;
            break;
        default:
            return NULL;
    }
    const int howmany_rank = sizeof(howmany_dims)/sizeof(howmany_dims[0]);

    return fftw_plan_guru_r2r(/*rank*/0, /*dims*/NULL,
                              howmany_rank, howmany_dims,
                              in, out, /*kind*/NULL, flags);
}
You can get a better performing plan if you're willing to use flags = FFTW_MEASURE or its big brothers. I chose to use FFTW_ESTIMATE in the sample above because I didn't want the planning process to destroy the input array.

You use the solution like:

void somelogic()
{
    char storage_type = /* One of 'R', 'r', 'C', 'c' */
    int rows          = /* Number of rows */
    int cols          = /* Number of columns */
    double *in        = /* ... */
    double *out       = /* ... */

    /* Plan the transpose once; transpose is in-place iff in == out */
    fftw_plan transpose = plan_transpose(storage_type, rows, cols, in, out);
    assert(transpose);

    /* Execute the plan potentially many times */
    fftw_execute(transpose);

    /* FFTW New-array Execute functions should be callable, too */
    /* Beware of mixing in-place and out-of-place planning and usage */
    double *another_in  = /* ... */
    double *another_out = /* ... */
    fftw_execute_r2r(transpose, another_in, another_out);

    /* Destroy the plan when completely done */
    fftw_destroy_plan(transpose);
}
There are many restrictions around the the New-array execute interface, so be careful if you go that route.

One very cool thing is that you don't need to know a priori that you're computing an in-place transpose. The planner works for out-of-place transposes as well. Another cool thing is that the plans are useful for rectangular transposes and square transposes alike with more efficient algorithms being used for the latter case.

I wrote myself a toy to test it out. Transposing a Row major, 3-by-2 matrix In-place gives:

[499 rhys@mentes misc]$ ./fftw_transpose I R 3 2

viewing extents [ 3 2 ] with strides [ 2 1 ] {
   0    1 
 100  101 
 200  201 
}
stored contiguously as {
   0    1  100  101  200  201 
}

Transposing in-place with 
(rdft-transpose-cut-3x2
  (rdft-rank0-ip-sq-tiledbuf/1-x2-x2)
  (rdft-rank0-iter-ci/1-x2))

viewing extents [ 2 3 ] with strides [ 3 1 ] {
   0  100  200 
   1  101  201 
}
stored contiguously as {
   0  100  200    1  101  201 
}
Transposing a Column major, 2-by-3 matrix Out-of-place gives:
[500 rhys@mentes misc]$ ./fftw_transpose O C 2 3

viewing extents [ 2 3 ] with strides [ 1 2 ] {
   0    1    2 
 100  101  102 
}
stored contiguously as {
   0  100    1  101    2  102 
}

Transposing out-of-place with 
(rdft-rank0-tiledbuf/1-x3-x2)

viewing extents [ 3 2 ] with strides [ 1 3 ] {
   0  100 
   1  101 
   2  102 
}
stored contiguously as {
   0    1    2  100  101  102 
}

From burning lots of time playing with FFTW's transpose capabilities, it seems that the planner returns NULL (i.e. bombs) whenever you attempt to transpose a non-contiguous matrix. That rules out specifying leading dimensions and/or arbitrary strides as far as I can tell. Someone please correct me if I'm wrong. I'd love to be wrong so I can use FFTW to emulate Intel MKL's mkl_trans.h functionality for systems where a newer version of MKL is not available.

13 January 2010

Mathematica's FourierParameters necessary to emulate FFTW_FORWARD and FFTW_BACKWARD

These are the FourierParameters necessary to have Mathematica's Fourier and InverseFourier discrete transforms behave like FFTW's forward and backward FFTs:

FFTWForward[expr_]  := Fourier[expr, FourierParameters -> {1, -1}];
FFTWBackward[expr_] := InverseFourier[expr, FourierParameters -> {-1, -1}];


In particular, these functions/parameter choices do not scale the result when transforming in either direction.

Similar parameters can be used when performing analytic transforms using FourierTransform and InverseFourierTransform:

FFTWForwardAnalytic[expr_, t_, w_] := FourierTransform[expr, t, w, FourierParameters -> {1, -1}];
FFTWBackwardAnalytic[expr_, w_, t_] := InverseFourierTransform[expr, w, t, FourierParameters -> {-1, -1}];

Providing fill and for_each algorithms for Boost.MultiArray

I sometimes want to fill a Boost.MultiArray with a particular value. For concrete implementations like boost::multi_array or boost::multi_array_ref, the task isn't too hard and std::fill works just fine. But getting an arbitrary-dimensional version working that only uses the MultiArray concept takes a little thought. This is important when working with views. I additionally want the solution to take advantage of the contiguous storage guarantees made by multi_array and multi_array_ref.

Because fill looks like for_each with an assignment-like UnaryFunction, I decided to write a MultiArray-specific for_each implementation. First comes a helper functor recursively templated on the dimensionality:

namespace { // anonymous

template<std::size_t D>
struct for_each_functor {

    BOOST_STATIC_ASSERT(D != 0); // Nonsensical behavior for zero dimensions
    BOOST_STATIC_ASSERT(D > 1);  // Ensure not instantiated for specialized values

    // See http://groups.google.com/group/boost-list/browse_thread/thread/e16f32c4411dea08
    // for details about why MultiArray::iterator::reference is used below.
    template<class MultiArray,
             class UnaryFunction >
    void operator()(MultiArray &x, UnaryFunction &f) const {
        for_each_functor<D-1> functor;
        for (typename MultiArray::iterator i = x.begin(); i != x.end(); ++i) {
            typename MultiArray::iterator::reference ri = *i;
            functor(ri,f);
        }
    }
};

template<> struct for_each_functor<1> {
    template<class MultiArray,
             class UnaryFunction >
    void operator()(MultiArray &x, UnaryFunction &f) const {
        std::for_each(x.begin(), x.end(), f);
    }
};

} // namespace anonymous

The functor uses std::for_each for the one dimensional base case. The BOOST_STATIC_ASSERTs are there just to keep me from shooting myself in the foot. A smarter version might specialize for the two- and three-dimensional cases, check the underlying strides, and then be certain to iterate in the faster directions first.

Now that we've got for_each_functor that works on the general MultiArray concept, it's straightforward to implement for_each:

/**
 * Invoke \c f for each element in MultiArray \c x.  The
 * order in which the invocations take place is undefined.
 *
 * @param x MultiArray over which to iterate.
 * @param f UnaryFunction to invoke on each element of \c x.
 *
 * @return \c f.
 *
 * @see SGI's <a href="http://www.sgi.com/tech/stl/UnaryFunction.html">
 *      UnaryFunction</a> concept for more information.
 */
template<class MultiArray,
         class UnaryFunction>
inline
UnaryFunction for_each(MultiArray &x,
                       UnaryFunction f) {
    for_each_functor<MultiArray::dimensionality>()(x,f);
    return f;
}

Please note that my for_each type signature is not identical with the STL algorithm's signature.

Since we know that multi_array and multi_array_ref have contiguous storage, we can specialize our version of for_each to linearly walk their underlying memory using the usual algorithms.

/**
 * Invoke \c f for each element in <tt>boost::multi_array</tt> \c x.  The order
 * in which the invocations take place is undefined.  This specialization
 * takes advantage of <tt>boost::multi_array</tt>'s contiguous storage.
 *
 * @param x <tt>boost::multi_array</tt> over which to iterate.
 * @param f UnaryFunction to invoke on each element of \c x.
 *
 * @return \c f.
 *
 * @see SGI's <a href="http://www.sgi.com/tech/stl/UnaryFunction.html">
 *      UnaryFunction</a> concept for more information.
 */
template<class ValueType, std::size_t NumDims, class Allocator,
         class UnaryFunction>
inline
UnaryFunction for_each(boost::multi_array<ValueType,NumDims,Allocator> &x,
                       UnaryFunction f) {
    return std::for_each(x.data(), x.data() + x.num_elements(), f);
}

/**
 * Invoke \c f for each element in <tt>boost::multi_array_ref</tt> \c x.  The
 * order in which the invocations take place is undefined.  This specialization
 * takes advantage of <tt>boost::multi_array_ref</tt>'s contiguous storage.
 *
 * @param x <tt>boost::multi_array_ref</tt> over which to iterate.
 * @param f UnaryFunction to invoke on each element of \c x.
 *
 * @return \c f.
 *
 * @see SGI's <a href="http://www.sgi.com/tech/stl/UnaryFunction.html">
 *      UnaryFunction</a> concept for more information.
 */
template<class ValueType, std::size_t NumDims,
         class UnaryFunction>
inline
UnaryFunction for_each(boost::multi_array_ref<ValueType,NumDims> &x,
                       UnaryFunction f)
{
    return std::for_each(x.data(), x.data() + x.num_elements(), f);
}

With for_each complete, we can move on to implementing fill by mapping an assignment-like functor over over a MultiArray.

In my particular application I have MultiArrays with complex-valued elements from a couple of different numeric libraries (e.g. std::complex<T>, T[2]) and I'd like to be able to fill a MultiArray with a value for which a member operator= may not be available. To accommodate this need, I created a templated assignment functor which can be specialized using schmancy boost::enable_if techniques based on both the target and source of the assignment:

/**
 * A functor that performs assignment to type \c Target from type \c Source.
 * \c Target must be assignable from \c Source.
 *
 * The \c Enable template parameter allows using <tt>boost::enable_if</tt> to
 * specialize or partially specialize the functor per <tt>enable_if</tt>'s
 * documentation section <a
 * href="http://www.boost.org/doc/libs/1_41_0/libs/utility/enable_if.html">
 * 3.1: Enabling template class specializations</a>.
 */
template<class Target, class Source, class Enable = void>
struct assign {

    /**
     * Create an instance which assigns \c s when applied.
     *
     * @param s source of assignment operation occurring via
     *          <tt>operator()</tt>.
     */
    assign(const Source &s) : s_(s) {}

    /**
     * Assign the value provided at construction to \c t.
     *
     * @param t to be assigned.
     */
    void operator()(Target& t) const { t = s_; }

private:
    const Source &s_; /**< Source for assignment operations */
};

The default implementation above makes the extra parameters seem overkill. I'll give an example where it is useful later.

Using the assignment functor and for_each, here's our one-line fill:

/**
 * Fill MultiArray \c x with the value \c v.  MultiArray <tt>x</tt>'s elements
 * must be assignable from \c v.  The underlying assignment uses
 * ::assign to allow specializations of that functor to be found.
 *
 * @param x MultiArray to fill.
 * @param v Value with which to fill \c x.
 */
template<class MultiArray, class V>
void fill(MultiArray &x, const V &v) {
    for_each(x, assign<typename MultiArray::element,V>(v));
}

Finally, to show the spurious-looking assign functor template parameters are useful, here's an assign functor for the complex-valued use case I mentioned above:

/**
 * A specialization of the assign functor to properly handle the case where \c
 * Target is a recognized complex type according to
 * ::suzerain::complex::traits::is_complex.  It uses
 * ::suzerain::complex::assign_complex to perform the assignment, and therefore
 * supports all types that \c assign_complex does.
 */
template<class Target, class Source>
struct assign<
    Target,
    Source,
    typename boost::enable_if<
        ::suzerain::complex::traits::is_complex<Target>
    >::type >
{
    /**
     * Create an instance which assigns \c s when applied.
     *
     * @param s source of assignment operation occurring via
     *          <tt>operator()</tt>.
     */
    assign(const Source &s) : s_(s) {};

    /**
     * Assign the value provided at construction to \c t.
     *
     * @param t to be assigned.
     */
    void operator()(Target& t) const {
        ::suzerain::complex::assign_complex(t, s_);
    }

private:
    const Source &s_; /**< Source for assignment operations */
};

That very last code snippet is missing details and won't compile. Note that the suzerain::complex::assign_complex invocation may itself be a template and it receives appropriate type information. This turned out to be important when I wanted to fill a complex-valued MultiArray with a real-valued scalar. I needed two versions of assign_complex: one that took as a Source a recognized complex type and one that took a scalar type. boost::enable_if and boost::disable_if made it easy to provide an assign_complex template that did the right thing.

Given the right #includes in the right places, the other snippets should all compile. Please let me know if they do not.

This post topic came from a question I asked on boost-users a couple of days back. Hopefully someone else finds this post and can use this fill routine. Thank you to Ronald Garcia for his response and for maintaining Boost.MultiArray.

Subscribe Subscribe to The Return of Agent Zlerich