## 28 January 2010

### Ahhh... Corporate Governance

Really?

We have agreed that if our stockholders fail to adopt the Amendment at the Special Meeting or any postponement or adjournment thereof, we will continue to seek to obtain the requisite approval at least as frequently as every six months until such stockholder adoption has been obtained.

Really. Thankfully, they caught and removed the draft paragraph's final "you stupid asshats".

Admittedly, I have not yet fully reviewed all the particulars. The choice paragraph appears at the bottom of page 6.

## 20 January 2010

### Using FFTW for in-place matrix transposition

The Wikipedia article on in-place matrix transposition tantalizingly hints that FFTW can compute rectangular in-place matrix transposes efficiently (or as efficiently as one can). FFTW's manual, however, is silent. At least as far as section headings, function names, and the index are concerned.

Frigo and Johnson's paper, on page 6, spells out how to tap into the in-place transpose routines: create a rank-0 transform of vector rank 2 using transposed vector strides. I had to bug Matteo Frigo twice (nice guy!) to figure out how to accomplish this using the guru interface.

A C99 solution capable of handling the double-valued either row major or column major case looks like:

#include <assert.h>
#include <ctype.h>
#include <fftw3.h>

fftw_plan plan_transpose(char storage_type,
int rows,
int cols,
double *in,
double *out)
{
const unsigned flags = FFTW_ESTIMATE; /* Do not destroy input */

fftw_iodim howmany_dims[2];
switch (toupper(storage_type)) {
case 'R':
howmany_dims[0].n  = rows;
howmany_dims[0].is = cols;
howmany_dims[0].os = 1;
howmany_dims[1].n  = cols;
howmany_dims[1].is = 1;
howmany_dims[1].os = rows;
break;
case 'C':
howmany_dims[0].n  = rows;
howmany_dims[0].is = 1;
howmany_dims[0].os = cols;
howmany_dims[1].n  = cols;
howmany_dims[1].is = rows;
howmany_dims[1].os = 1;
break;
default:
return NULL;
}
const int howmany_rank = sizeof(howmany_dims)/sizeof(howmany_dims[0]);

return fftw_plan_guru_r2r(/*rank*/0, /*dims*/NULL,
howmany_rank, howmany_dims,
in, out, /*kind*/NULL, flags);
}
You can get a better performing plan if you're willing to use flags = FFTW_MEASURE or its big brothers. I chose to use FFTW_ESTIMATE in the sample above because I didn't want the planning process to destroy the input array.

You use the solution like:

void somelogic()
{
char storage_type = /* One of 'R', 'r', 'C', 'c' */
int rows          = /* Number of rows */
int cols          = /* Number of columns */
double *in        = /* ... */
double *out       = /* ... */

/* Plan the transpose once; transpose is in-place iff in == out */
fftw_plan transpose = plan_transpose(storage_type, rows, cols, in, out);
assert(transpose);

/* Execute the plan potentially many times */
fftw_execute(transpose);

/* FFTW New-array Execute functions should be callable, too */
/* Beware of mixing in-place and out-of-place planning and usage */
double *another_in  = /* ... */
double *another_out = /* ... */
fftw_execute_r2r(transpose, another_in, another_out);

/* Destroy the plan when completely done */
fftw_destroy_plan(transpose);
}
There are many restrictions around the the New-array execute interface, so be careful if you go that route.

One very cool thing is that you don't need to know a priori that you're computing an in-place transpose. The planner works for out-of-place transposes as well. Another cool thing is that the plans are useful for rectangular transposes and square transposes alike with more efficient algorithms being used for the latter case.

I wrote myself a toy to test it out. Transposing a Row major, 3-by-2 matrix In-place gives:

[499 rhys@mentes misc]$./fftw_transpose I R 3 2 viewing extents [ 3 2 ] with strides [ 2 1 ] { 0 1 100 101 200 201 } stored contiguously as { 0 1 100 101 200 201 } Transposing in-place with (rdft-transpose-cut-3x2 (rdft-rank0-ip-sq-tiledbuf/1-x2-x2) (rdft-rank0-iter-ci/1-x2)) viewing extents [ 2 3 ] with strides [ 3 1 ] { 0 100 200 1 101 201 } stored contiguously as { 0 100 200 1 101 201 }  Transposing a Column major, 2-by-3 matrix Out-of-place gives: [500 rhys@mentes misc]$ ./fftw_transpose O C 2 3

viewing extents [ 2 3 ] with strides [ 1 2 ] {
0    1    2
100  101  102
}
stored contiguously as {
0  100    1  101    2  102
}

Transposing out-of-place with
(rdft-rank0-tiledbuf/1-x3-x2)

viewing extents [ 3 2 ] with strides [ 1 3 ] {
0  100
1  101
2  102
}
stored contiguously as {
0    1    2  100  101  102
}


From burning lots of time playing with FFTW's transpose capabilities, it seems that the planner returns NULL (i.e. bombs) whenever you attempt to transpose a non-contiguous matrix. That rules out specifying leading dimensions and/or arbitrary strides as far as I can tell. Someone please correct me if I'm wrong. I'd love to be wrong so I can use FFTW to emulate Intel MKL's mkl_trans.h functionality for systems where a newer version of MKL is not available.

## 13 January 2010

### Mathematica's FourierParameters necessary to emulate FFTW_FORWARD and FFTW_BACKWARD

These are the FourierParameters necessary to have Mathematica's Fourier and InverseFourier discrete transforms behave like FFTW's forward and backward FFTs:

FFTWForward[expr_]  := Fourier[expr, FourierParameters -> {1, -1}];
FFTWBackward[expr_] := InverseFourier[expr, FourierParameters -> {-1, -1}];

In particular, these functions/parameter choices do not scale the result when transforming in either direction.

Similar parameters can be used when performing analytic transforms using FourierTransform and InverseFourierTransform:

FFTWForwardAnalytic[expr_, t_, w_] := FourierTransform[expr, t, w, FourierParameters -> {1, -1}];
FFTWBackwardAnalytic[expr_, w_, t_] := InverseFourierTransform[expr, w, t, FourierParameters -> {-1, -1}];

### Providing fill and for_each algorithms for Boost.MultiArray

I sometimes want to fill a Boost.MultiArray with a particular value. For concrete implementations like boost::multi_array or boost::multi_array_ref, the task isn't too hard and std::fill works just fine. But getting an arbitrary-dimensional version working that only uses the MultiArray concept takes a little thought. This is important when working with views. I additionally want the solution to take advantage of the contiguous storage guarantees made by multi_array and multi_array_ref.

Because fill looks like for_each with an assignment-like UnaryFunction, I decided to write a MultiArray-specific for_each implementation. First comes a helper functor recursively templated on the dimensionality:

namespace { // anonymous

template<std::size_t D>
struct for_each_functor {

BOOST_STATIC_ASSERT(D != 0); // Nonsensical behavior for zero dimensions
BOOST_STATIC_ASSERT(D > 1);  // Ensure not instantiated for specialized values

// for details about why MultiArray::iterator::reference is used below.
template<class MultiArray,
class UnaryFunction >
void operator()(MultiArray &x, UnaryFunction &f) const {
for_each_functor<D-1> functor;
for (typename MultiArray::iterator i = x.begin(); i != x.end(); ++i) {
typename MultiArray::iterator::reference ri = *i;
functor(ri,f);
}
}
};

template<> struct for_each_functor<1> {
template<class MultiArray,
class UnaryFunction >
void operator()(MultiArray &x, UnaryFunction &f) const {
std::for_each(x.begin(), x.end(), f);
}
};

} // namespace anonymous

The functor uses std::for_each for the one dimensional base case. The BOOST_STATIC_ASSERTs are there just to keep me from shooting myself in the foot. A smarter version might specialize for the two- and three-dimensional cases, check the underlying strides, and then be certain to iterate in the faster directions first.

Now that we've got for_each_functor that works on the general MultiArray concept, it's straightforward to implement for_each:

/**
* Invoke \c f for each element in MultiArray \c x.  The
* order in which the invocations take place is undefined.
*
* @param x MultiArray over which to iterate.
* @param f UnaryFunction to invoke on each element of \c x.
*
* @return \c f.
*
* @see SGI's <a href="http://www.sgi.com/tech/stl/UnaryFunction.html">
*/
template<class MultiArray,
class UnaryFunction>
inline
UnaryFunction for_each(MultiArray &x,
UnaryFunction f) {
for_each_functor<MultiArray::dimensionality>()(x,f);
return f;
}

Please note that my for_each type signature is not identical with the STL algorithm's signature.

Since we know that multi_array and multi_array_ref have contiguous storage, we can specialize our version of for_each to linearly walk their underlying memory using the usual algorithms.

/**
* Invoke \c f for each element in <tt>boost::multi_array</tt> \c x.  The order
* in which the invocations take place is undefined.  This specialization
* takes advantage of <tt>boost::multi_array</tt>'s contiguous storage.
*
* @param x <tt>boost::multi_array</tt> over which to iterate.
* @param f UnaryFunction to invoke on each element of \c x.
*
* @return \c f.
*
* @see SGI's <a href="http://www.sgi.com/tech/stl/UnaryFunction.html">
*/
template<class ValueType, std::size_t NumDims, class Allocator,
class UnaryFunction>
inline
UnaryFunction for_each(boost::multi_array<ValueType,NumDims,Allocator> &x,
UnaryFunction f) {
return std::for_each(x.data(), x.data() + x.num_elements(), f);
}

/**
* Invoke \c f for each element in <tt>boost::multi_array_ref</tt> \c x.  The
* order in which the invocations take place is undefined.  This specialization
* takes advantage of <tt>boost::multi_array_ref</tt>'s contiguous storage.
*
* @param x <tt>boost::multi_array_ref</tt> over which to iterate.
* @param f UnaryFunction to invoke on each element of \c x.
*
* @return \c f.
*
* @see SGI's <a href="http://www.sgi.com/tech/stl/UnaryFunction.html">
*/
template<class ValueType, std::size_t NumDims,
class UnaryFunction>
inline
UnaryFunction for_each(boost::multi_array_ref<ValueType,NumDims> &x,
UnaryFunction f)
{
return std::for_each(x.data(), x.data() + x.num_elements(), f);
}

With for_each complete, we can move on to implementing fill by mapping an assignment-like functor over over a MultiArray.

In my particular application I have MultiArrays with complex-valued elements from a couple of different numeric libraries (e.g. std::complex<T>, T[2]) and I'd like to be able to fill a MultiArray with a value for which a member operator= may not be available. To accommodate this need, I created a templated assignment functor which can be specialized using schmancy boost::enable_if techniques based on both the target and source of the assignment:

/**
* A functor that performs assignment to type \c Target from type \c Source.
* \c Target must be assignable from \c Source.
*
* The \c Enable template parameter allows using <tt>boost::enable_if</tt> to
* specialize or partially specialize the functor per <tt>enable_if</tt>'s
* documentation section <a
* href="http://www.boost.org/doc/libs/1_41_0/libs/utility/enable_if.html">
* 3.1: Enabling template class specializations</a>.
*/
template<class Target, class Source, class Enable = void>
struct assign {

/**
* Create an instance which assigns \c s when applied.
*
* @param s source of assignment operation occurring via
*          <tt>operator()</tt>.
*/
assign(const Source &s) : s_(s) {}

/**
* Assign the value provided at construction to \c t.
*
* @param t to be assigned.
*/
void operator()(Target& t) const { t = s_; }

private:
const Source &s_; /**< Source for assignment operations */
};

The default implementation above makes the extra parameters seem overkill. I'll give an example where it is useful later.

Using the assignment functor and for_each, here's our one-line fill:

/**
* Fill MultiArray \c x with the value \c v.  MultiArray <tt>x</tt>'s elements
* must be assignable from \c v.  The underlying assignment uses
* ::assign to allow specializations of that functor to be found.
*
* @param x MultiArray to fill.
* @param v Value with which to fill \c x.
*/
template<class MultiArray, class V>
void fill(MultiArray &x, const V &v) {
for_each(x, assign<typename MultiArray::element,V>(v));
}

Finally, to show the spurious-looking assign functor template parameters are useful, here's an assign functor for the complex-valued use case I mentioned above:

/**
* A specialization of the assign functor to properly handle the case where \c
* Target is a recognized complex type according to
* ::suzerain::complex::traits::is_complex.  It uses
* ::suzerain::complex::assign_complex to perform the assignment, and therefore
* supports all types that \c assign_complex does.
*/
template<class Target, class Source>
struct assign<
Target,
Source,
typename boost::enable_if<
::suzerain::complex::traits::is_complex<Target>
>::type >
{
/**
* Create an instance which assigns \c s when applied.
*
* @param s source of assignment operation occurring via
*          <tt>operator()</tt>.
*/
assign(const Source &s) : s_(s) {};

/**
* Assign the value provided at construction to \c t.
*
* @param t to be assigned.
*/
void operator()(Target& t) const {
::suzerain::complex::assign_complex(t, s_);
}

private:
const Source &s_; /**< Source for assignment operations */
};

That very last code snippet is missing details and won't compile. Note that the suzerain::complex::assign_complex invocation may itself be a template and it receives appropriate type information. This turned out to be important when I wanted to fill a complex-valued MultiArray with a real-valued scalar. I needed two versions of assign_complex: one that took as a Source a recognized complex type and one that took a scalar type. boost::enable_if and boost::disable_if made it easy to provide an assign_complex template that did the right thing.

Given the right #includes in the right places, the other snippets should all compile. Please let me know if they do not.

This post topic came from a question I asked on boost-users a couple of days back. Hopefully someone else finds this post and can use this fill routine. Thank you to Ronald Garcia for his response and for maintaining Boost.MultiArray.