04 February 2009

Avoiding tedious numerics using Boost Accumulators

For a turbulence warm up assignment, we needed to compute some correlation coefficients for trajectory behavior on the Lorenz attractor. Specifically, compute the mean, variance, and cross-correlations of the three solution components. I'd skimmed through the Boost Accumulator Framework a couple of months back and thought it looked interesting:

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/covariance.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/variance.hpp>
#include <boost/accumulators/statistics/variates/covariate.hpp>
#include <boost/format.hpp>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Read an input streams and compute the relevant statistics
template<typename charT, typename traits>
int process(std::basic_ostream<charT, traits>& out,
            std::basic_istream<charT, traits>& in) {

    using namespace boost::accumulators;
    accumulator_set<double, stats<tag::mean,
                                  tag::covariance<double, tag::covariate1> > > x_acc;
    accumulator_set<double, stats<tag::mean,
                                  tag::covariance<double, tag::covariate1> > > y_acc;
    accumulator_set<double, stats<tag::mean,
                                  tag::covariance<double, tag::covariate1> > > z_acc;

    double t, x, y, z;
    std::basic_string<charT, traits> line;
    boost::format fmt(" %016e   %016e   %016e   %016e   %016e   %016e   %016e   %016e   %016e   %016e");
    while (in) {
        if (!std::getline(in, line)) break; // Read columns per line to
        std::istringstream in(line);       //    be a little defensive
        in >> t >> x >> y >> z;

        // Compute running statistics
        x_acc(x, covariate1 = y);
        y_acc(y, covariate1 = z);
        z_acc(z, covariate1 = x);

        // Output running statistics
        out << fmt % t
                   % mean(x_acc)
                   % mean(y_acc)
                   % mean(z_acc)
                   % variance(x_acc)
                   % variance(y_acc)
                   % variance(z_acc)
                   % covariance(x_acc)
                   % covariance(z_acc)
                   % covariance(y_acc)
            << std::endl;

    return 0;

Compared with the alternative loops/bookkeeping/etc., this is a pretty slick alternative. The only downside was that the accumulator declarations took me awhile to decipher from the documentation.

As you probably noticed, I am reading/writing plain text data files and computing the statistics at every time step. This is overkill and slow. I'd probably have fixed that by now if it weren't for the fact that I ran across Pipe Viewer in the last two days. Quite a neat shell tool that.

1 comment:

Colin said...

Nice post. To make iteven shorter why use typedef:
typedef accumulator_set > > acc_t;

acc_t x_acc;
acc_t y_acc;
acc_t z_acc;

Using this method I implemented a vector of accumulators for multidimensional data. Very flexible tool!


Subscribe Subscribe to The Return of Agent Zlerich