The Wikipedia article on in-place matrix transposition tantalizingly hints that FFTW can compute rectangular in-place matrix transposes efficiently (or as efficiently as one can). FFTW's manual, however, is silent. At least as far as section headings, function names, and the index are concerned.

Frigo and Johnson's paper, on page 6, spells out how to tap into the in-place transpose routines: create a rank-0 transform of vector rank 2 using transposed vector strides. I had to bug Matteo Frigo twice (nice guy!) to figure out how to accomplish this using the guru interface.

A C99 solution capable of handling the `double`

-valued either row major or column major case looks like:

**#include ****<assert.h>**

**#include ****<ctype.h>**

**#include ****<fftw3.h>**

fftw_plan plan_transpose(**char** storage_type,

**int** rows,

**int** cols,

**double** *in,

**double** *out)

{

**const** **unsigned** flags = FFTW_ESTIMATE; **/***** Do not destroy input *****/**

fftw_iodim howmany_dims[**2**];

**switch** (toupper(storage_type)) {

case **'R'**:

howmany_dims[**0**].n = rows;

howmany_dims[**0**].is = cols;

howmany_dims[**0**].os = **1**;

howmany_dims[**1**].n = cols;

howmany_dims[**1**].is = **1**;

howmany_dims[**1**].os = rows;

**break**;

case **'C'**:

howmany_dims[**0**].n = rows;

howmany_dims[**0**].is = **1**;

howmany_dims[**0**].os = cols;

howmany_dims[**1**].n = cols;

howmany_dims[**1**].is = rows;

howmany_dims[**1**].os = **1**;

**break**;

default:

**return** **NULL**;

}

**const** **int** howmany_rank = sizeof(howmany_dims)/sizeof(howmany_dims[**0**]);

**return** fftw_plan_guru_r2r(**/*****rank*****/****0**, **/*****dims*****/****NULL**,

howmany_rank, howmany_dims,

in, out, **/*****kind*****/****NULL**, flags);

}

You can get a better performing plan if you're willing to use

`flags = FFTW_MEASURE`

or its big brothers. I chose to use

`FFTW_ESTIMATE`

in the sample above because I didn't want the planning process to destroy the input array.

You use the solution like:

**void** somelogic()

{

**char** storage_type = **/***** One of 'R', 'r', 'C', 'c' *****/**

**int** rows = **/***** Number of rows *****/**

**int** cols = **/***** Number of columns *****/**

**double** *in = **/***** ... *****/**

**double** *out = **/***** ... *****/**

**/***** Plan the transpose once; transpose is in-place iff in == out *****/**

fftw_plan transpose = plan_transpose(storage_type, rows, cols, in, out);

assert(transpose);

**/***** Execute the plan potentially many times *****/**

fftw_execute(transpose);

**/***** FFTW New-array Execute functions should be callable, too *****/**

**/***** Beware of mixing in-place and out-of-place planning and usage *****/**

**double** *another_in = **/***** ... *****/**

**double** *another_out = **/***** ... *****/**

fftw_execute_r2r(transpose, another_in, another_out);

**/***** Destroy the plan when completely done *****/**

fftw_destroy_plan(transpose);

}

There are many restrictions around the

the New-array execute interface, so be careful if you go that route.

One very cool thing is that you don't need to know a priori that you're computing an in-place transpose. The planner works for out-of-place transposes as well. Another cool thing is that the plans are useful for rectangular transposes and square transposes alike with more efficient algorithms being used for the latter case.

I wrote myself a toy to test it out. Transposing a **R**ow major, **3**-by-**2** matrix **I**n-place gives:

[499 rhys@mentes misc]$ ./fftw_transpose I R 3 2
viewing extents [ 3 2 ] with strides [ 2 1 ] {
0 1
100 101
200 201
}
stored contiguously as {
0 1 100 101 200 201
}
Transposing in-place with
(rdft-transpose-cut-3x2
(rdft-rank0-ip-sq-tiledbuf/1-x2-x2)
(rdft-rank0-iter-ci/1-x2))
viewing extents [ 2 3 ] with strides [ 3 1 ] {
0 100 200
1 101 201
}
stored contiguously as {
0 100 200 1 101 201
}

Transposing a

**C**olumn major,

**2**-by-

**3** matrix

**O**ut-of-place gives:

[500 rhys@mentes misc]$ ./fftw_transpose O C 2 3
viewing extents [ 2 3 ] with strides [ 1 2 ] {
0 1 2
100 101 102
}
stored contiguously as {
0 100 1 101 2 102
}
Transposing out-of-place with
(rdft-rank0-tiledbuf/1-x3-x2)
viewing extents [ 3 2 ] with strides [ 1 3 ] {
0 100
1 101
2 102
}
stored contiguously as {
0 1 2 100 101 102
}

From burning lots of time playing with FFTW's transpose capabilities, it seems that the planner returns `NULL`

(i.e. bombs) whenever you attempt to transpose a non-contiguous matrix. That rules out specifying leading dimensions and/or arbitrary strides
as far as I can tell. Someone please correct me if I'm wrong. I'd love to be wrong so I can use FFTW to emulate Intel MKL's mkl_trans.h functionality for systems where a newer version of MKL is not available.