Boost Accumulators are pretty slick, but they do have the occasional shortcoming. To my knowledge, one cannot merge data from independent accumulator sets. Merging is a nice operation to have when you want to compute statistics in parallel and then roll up the results into a single summary.
In his article "Accurately computing running variance", John D. Cook presents a small class for computing a running mean and variance using an algorithm with favorable numerical properties reported by Knuth in TAOCP. Departing from Cook's sample, my rewrite below includes:
- templating on the floating point type,
- permitting multiple statistics to be accumulated simultaneously without requiring multiple counters,
- additionally tracking the minimum and maximum while holding storage overhead constant relative to Cook's version,
- providing sane (i.e. NaN) behavior when no data has been processed,
- permitting information from multiple instances to be merged,
- permitting an instance to be cleared for re-use,
- adding assertions to catch when one screws up,
- and adding Doxygen-based documentation.
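
To make the list above concrete, here is a minimal sketch of how such a class might look. It is not the rewrite itself: the class name `running_stats_sketch`, its member names, and the operator choices are hypothetical, it tracks a single quantity rather than several, and the merge step assumes the pairwise combination formula of Chan, Golub, and LeVeque, which may differ from what the actual code does. Returning zero variance for a single sample is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

/** Illustrative running statistics for one quantity (hypothetical names). */
template <typename FP>
class running_stats_sketch {
public:
    running_stats_sketch() { clear(); }

    /** Reset to the empty state so the instance can be re-used. */
    void clear() {
        n_ = 0;
        mean_ = 0;
        m2_ = 0;
        min_ = std::numeric_limits<FP>::infinity();
        max_ = -std::numeric_limits<FP>::infinity();
    }

    /** Accumulate one sample using the Welford-style update. */
    running_stats_sketch& operator<<(FP x) {
        assert(!std::isnan(x));        // catch accidental NaN input
        ++n_;
        const FP delta = x - mean_;
        mean_ += delta / n_;
        m2_ += delta * (x - mean_);    // uses the already-updated mean
        min_ = std::min(min_, x);
        max_ = std::max(max_, x);
        return *this;
    }

    /** Merge another instance into this one (assumed pairwise formula). */
    running_stats_sketch& operator+=(const running_stats_sketch& o) {
        if (o.n_ == 0) return *this;               // nothing to merge
        if (n_ == 0) { *this = o; return *this; }  // adopt the other side
        const std::size_t total = n_ + o.n_;
        const FP delta = o.mean_ - mean_;
        m2_ += o.m2_ + delta * delta * (FP(n_) * FP(o.n_) / FP(total));
        mean_ += delta * (FP(o.n_) / FP(total));
        min_ = std::min(min_, o.min_);
        max_ = std::max(max_, o.max_);
        n_ = total;
        return *this;
    }

    /** Number of samples processed so far. */
    std::size_t size() const { return n_; }

    /** Running mean, or NaN when no data has been processed. */
    FP mean() const { return n_ ? mean_ : nan(); }

    /** Sample variance: NaN when empty, zero for a single sample. */
    FP var() const {
        if (n_ == 0) return nan();
        return n_ > 1 ? m2_ / FP(n_ - 1) : FP(0);
    }

    /** Smallest sample seen, or NaN when empty. */
    FP min() const { return n_ ? min_ : nan(); }

    /** Largest sample seen, or NaN when empty. */
    FP max() const { return n_ ? max_ : nan(); }

private:
    static FP nan() { return std::numeric_limits<FP>::quiet_NaN(); }

    std::size_t n_;        // sample count
    FP mean_, m2_;         // running mean and sum of squared deviations
    FP min_, max_;         // running extrema
};
```

With this sketch, two workers could each fill their own `running_stats_sketch<double>` using `stats << x`, after which `a += b` rolls the second summary into the first and `a.mean()`, `a.var()`, `a.min()`, and `a.max()` describe the combined data.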