Mergeable accumulation of the running min, mean, max, and variance
Boost Accumulators are pretty slick, but they do have the occasional shortcoming. To my knowledge, one cannot merge data from independent accumulator sets. Merging is a nice operation to have when you want to compute statistics in parallel and then roll up the results into a single summary.
In his article "Accurately computing running variance", John D. Cook presents a small class for computing a running mean and variance using an algorithm with favorable numerical properties reported by Knuth in TAOCP. Departing from Cook's sample, my rewrite below includes
- templating on the floating point type,
- permitting accumulating multiple statistics simultaneously without requiring multiple counters,
- additionally tracking the minimum and maximum while holding storage overhead constant relative to Cook's version,
- providing sane (i.e. NaN) behavior when no data has been processed,
- permitting merging information from multiple instances,
- permitting clearing an instance for re-use,
- adding assertions to catch when one screws up,
- and adding Doxygen-based documentation.
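To make the list above concrete, here is a minimal sketch of how such a class might look. It is an illustration under stated assumptions, not the full rewrite: the name `running_stats_sketch`, its member names, and the choice to report a variance of zero for a single sample are all hypothetical. The per-sample update is the Welford-style recurrence that Cook's article describes, and the merge step uses the pairwise combination formula commonly attributed to Chan, Golub, and LeVeque, which may differ from the merge the rewrite uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical sketch: one counter, four floating point values.
template <typename Real>
class running_stats_sketch {
public:
    running_stats_sketch() { clear(); }

    // Reset to the "no data seen" state so the instance can be re-used.
    void clear() {
        n_ = 0;
        mean_ = 0;
        m2_ = 0;  // running sum of squared deviations from the mean
        min_ = std::numeric_limits<Real>::quiet_NaN();
        max_ = std::numeric_limits<Real>::quiet_NaN();
    }

    // Accumulate one sample using the Welford-style update.
    running_stats_sketch& operator()(Real x) {
        assert(!std::isnan(x));  // catch accidental NaN input
        if (n_ == 0) {
            min_ = max_ = x;
        } else {
            min_ = std::min(min_, x);
            max_ = std::max(max_, x);
        }
        ++n_;
        const Real delta = x - mean_;
        mean_ += delta / static_cast<Real>(n_);
        m2_ += delta * (x - mean_);
        return *this;
    }

    // Merge another instance into this one (pairwise combination).
    running_stats_sketch& operator+=(const running_stats_sketch& o) {
        if (o.n_ == 0) return *this;           // nothing to merge
        if (n_ == 0) { *this = o; return *this; }
        const std::size_t n = n_ + o.n_;
        const Real delta = o.mean_ - mean_;
        const Real frac = static_cast<Real>(o.n_) / static_cast<Real>(n);
        m2_ += o.m2_ + delta * delta * static_cast<Real>(n_) * frac;
        mean_ += delta * frac;
        min_ = std::min(min_, o.min_);
        max_ = std::max(max_, o.max_);
        n_ = n;
        return *this;
    }

    std::size_t count() const { return n_; }
    Real min() const { return min_; }  // NaN when empty
    Real max() const { return max_; }  // NaN when empty
    Real mean() const {
        return n_ ? mean_ : std::numeric_limits<Real>::quiet_NaN();
    }
    // Sample variance; NaN when empty, zero for one sample (an assumption).
    Real var() const {
        if (n_ == 0) return std::numeric_limits<Real>::quiet_NaN();
        if (n_ == 1) return Real(0);
        return m2_ / static_cast<Real>(n_ - 1);
    }

private:
    std::size_t n_;
    Real mean_, m2_, min_, max_;
};
```

In a parallel setting, each worker would feed its own instance with `stats(x)` and the results would then be rolled up with `total += partial`. Merging an empty instance is a no-op, so workers that saw no data need no special handling.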