The Wikipedia article on in-place matrix transposition tantalizingly hints that FFTW can compute rectangular in-place matrix transposes efficiently (or as efficiently as one can). FFTW's manual, however, is silent. At least as far as section headings, function names, and the index are concerned.
Frigo and Johnson's paper, on page 6, spells out how to tap into the in-place transpose routines: create a rank-0 transform of vector rank 2 using transposed vector strides. I had to bug Matteo Frigo twice (nice guy!) to figure out how to accomplish this using the guru interface.
A C99 solution capable of handling the double
-valued either row major or column major case looks like:
#include <assert.h>
#include <ctype.h>
#include <fftw3.h>
fftw_plan plan_transpose(char storage_type,
int rows,
int cols,
double *in,
double *out)
{
const unsigned flags = FFTW_ESTIMATE; /* Do not destroy input */
fftw_iodim howmany_dims[2];
switch (toupper(storage_type)) {
case 'R':
howmany_dims[0].n = rows;
howmany_dims[0].is = cols;
howmany_dims[0].os = 1;
howmany_dims[1].n = cols;
howmany_dims[1].is = 1;
howmany_dims[1].os = rows;
break;
case 'C':
howmany_dims[0].n = rows;
howmany_dims[0].is = 1;
howmany_dims[0].os = cols;
howmany_dims[1].n = cols;
howmany_dims[1].is = rows;
howmany_dims[1].os = 1;
break;
default:
return NULL;
}
const int howmany_rank = sizeof(howmany_dims)/sizeof(howmany_dims[0]);
return fftw_plan_guru_r2r(/*rank*/0, /*dims*/NULL,
howmany_rank, howmany_dims,
in, out, /*kind*/NULL, flags);
}
You can get a better performing plan if you're willing to use
flags = FFTW_MEASURE
or its big brothers. I chose to use
FFTW_ESTIMATE
in the sample above because I didn't want the planning process to destroy the input array.
You use the solution like:
void somelogic()
{
char storage_type = /* One of 'R', 'r', 'C', 'c' */
int rows = /* Number of rows */
int cols = /* Number of columns */
double *in = /* ... */
double *out = /* ... */
/* Plan the transpose once; transpose is in-place iff in == out */
fftw_plan transpose = plan_transpose(storage_type, rows, cols, in, out);
assert(transpose);
/* Execute the plan potentially many times */
fftw_execute(transpose);
/* FFTW New-array Execute functions should be callable, too */
/* Beware of mixing in-place and out-of-place planning and usage */
double *another_in = /* ... */
double *another_out = /* ... */
fftw_execute_r2r(transpose, another_in, another_out);
/* Destroy the plan when completely done */
fftw_destroy_plan(transpose);
}
There are many restrictions around the
the New-array execute interface, so be careful if you go that route.
One very cool thing is that you don't need to know a priori that you're computing an in-place transpose. The planner works for out-of-place transposes as well. Another cool thing is that the plans are useful for rectangular transposes and square transposes alike with more efficient algorithms being used for the latter case.
I wrote myself a toy to test it out. Transposing a Row major, 3-by-2 matrix In-place gives:
[499 rhys@mentes misc]$ ./fftw_transpose I R 3 2
viewing extents [ 3 2 ] with strides [ 2 1 ] {
0 1
100 101
200 201
}
stored contiguously as {
0 1 100 101 200 201
}
Transposing in-place with
(rdft-transpose-cut-3x2
(rdft-rank0-ip-sq-tiledbuf/1-x2-x2)
(rdft-rank0-iter-ci/1-x2))
viewing extents [ 2 3 ] with strides [ 3 1 ] {
0 100 200
1 101 201
}
stored contiguously as {
0 100 200 1 101 201
}
Transposing a
Column major,
2-by-
3 matrix
Out-of-place gives:
[500 rhys@mentes misc]$ ./fftw_transpose O C 2 3
viewing extents [ 2 3 ] with strides [ 1 2 ] {
0 1 2
100 101 102
}
stored contiguously as {
0 100 1 101 2 102
}
Transposing out-of-place with
(rdft-rank0-tiledbuf/1-x3-x2)
viewing extents [ 3 2 ] with strides [ 1 3 ] {
0 100
1 101
2 102
}
stored contiguously as {
0 1 2 100 101 102
}
From burning lots of time playing with FFTW's transpose capabilities, it seems that the planner returns NULL
(i.e. bombs) whenever you attempt to transpose a non-contiguous matrix. That rules out specifying leading dimensions and/or arbitrary strides
as far as I can tell. Someone please correct me if I'm wrong. I'd love to be wrong so I can use FFTW to emulate Intel MKL's mkl_trans.h functionality for systems where a newer version of MKL is not available.