26 June 2011

Using Boost Spirit 2.1+ to evaluate constant arithmetic expressions

I used Boost Program_options for my research code's input parsing requirements. As expected, it works well. For floating point quantities, I've found having to specify 4.18879020478639 for 4π/3 is a bit cumbersome. When specifying arguments on the command line, I could use any one of a number of hacks to perform floating point arithmetic in bash. But that didn't help me when taking input from response files. I wanted an arithmetic expression utility that could process a C-like grammar. Long double accuracy was a requirement. I looked around a bit. GNU libmatheval and muParser appeared to be good fits but looked to be double precision-only. Though straightforward, I decided against combining the Shunting-yard algorithm with some in-place RPN logic because I wasn't enthusiastic about either finding or writing an appropriate (and admittedly toy simple) lexer.

Already having a Boost pre-requisite in my code, I decided to try out Boost Spirit since it could provide a header-only, precision-agnostic solution. Spirit has gone through a lot of significant revisions and many articles seemed to be written for the older Spirit Classic. The latest official reference manual gave several excellent examples that almost fit. I adapted Spirit's calc3.cpp and roman.cpp sample grammars to accomplish what I wanted. I added ** as an operator for exponentiation and pi as a built-in constant. I also added a smattering of special functions (e.g. exp). The task required about 150 lines of Spirit-based code. Further extensions should be fairly straightforward.
My sample code does the following:

  1. Declares X-macros for the unary and binary functions built into the grammar.
  2. Declares Boost Phoenix-ready functors for the unary and binary functions.
  3. Declares expression::grammar consisting of the rules expression, term, factor, and primary.
  4. Declares a helper called parse to hide needing boost::spirit::ascii::space when performing whitespace-agnostic parsing.
  5. Declares a small test macro called EXPRTEST to reduce test case boilerplate.
  6. Runs a collection of tests to ensure the parser is behaving as expected.
Compared to the samples on which I built, the biggest changes to the grammar were adding primary and reworking factor. I could have shortened up some details with a few using boost::spirit::qi; declarations, but I wanted to keep track of exactly which parts of Spirit I used. Feedback and suggestions are most welcome.

Updated 9 Jan 2012: During a second pass motivated by reducing compilation times, I removed the X-macro nastiness and replaced it with some qi::symbols-based logic. The code in the anonymous namespace could likely be removed if I could figure out the right boost::phoenix::bind hoodoo to use to replace lazy_ufunc_ and lazy_bfunc_ (any suggestions?). I have not yet measured the difference in either memory footprint or runtime performance. Compilation time and memory is greatly reduced for both GNU and Intel compilers.

Updated 31 Mar 2013: I have moved the code into a gist to facilitate both maintenance and gathering feedback. I have added the constant e and slightly reordered the primary grammar per the comments.

Updated 17-19 Sept 2013: I have added an MPL2 header to the gist. In the comments is a suggestion for how to use standalone MPL2 logic building atop this grammar from my project Suzerain to incorporate such parsing into your own, possibly non-MPL2, source tree.

18 comments:

Anonymous said...

Hello,

I'm new at this.
What would be a main function for this ?

Rhys Ulerich said...

A main function would look like the contents of the EXPRTEST macro block in lines 185 to 190. You'd set "s" equal to what you want to parse and then use "result" after line 190.

Anonymous said...

Yep;

Thanks for your fast reply.
Well, I'm looking for a way to use boost features to get the same behaviour than muParser. That is to say Parse once, execute many times with an evaluation close to muParser's perf and I can't figure out how to do this. I benchmarked the evaluation time of this example : http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/ast_calc.cpp and it is very slow. Do you know more about existing libraries or frameworks to get math expressions to be parsed then evaluated in a faster way ?

Thanks again

Gaut

Rhys Ulerich said...

Hi Gaut,

With some work, this grammar could be turned into something like muParser but the design as-is isn't geared for it. I wanted to parse many different, constant expressions like one might read from an input file rather than to evaluate the same expression for multiple inputs. That is, parse-once-and-evaluate-once.

Depending on what you need, GNU libmatheval may be useful.

Best of luck finding what you're looking for,
Rhys

Anonymous said...

Ok thanks Rhys,

I've heard about this lib while browsing the web but haven't been further into details. I'm gonna have a more precise look at it.

Regards
Gauthier

Rhys Ulerich said...

Me again Gauthier,

http://code.google.com/p/mathpresso/ may possibly do what you want-- apparently the expression is compiled down to an intermediate stage for fast evaluation on given variable inputs.

- Rhys

Anonymous said...

Hello !

Why is there a problem, after having added the "e" constant, when using "exp" in the expression ?

Rhys Ulerich said...

Can you post a complete example with the way you've added the 'e' constant? I suspect, but need to check, that it is a conflict with the floating point parsing.

Anonymous said...

Hello, thanks for your reply, here is my code. Notethat everything works fine with 'e' if i remove 'exp' as well:

$ echo "ln(e)" |./parser
ln(e) = 1

However with both of them, this is an error:
$ echo "exp(0)" |./parser

compiled using clang++ -std=c++11 main.cc -o parser

http://pastebin.com/Rm7EMbcH

Rhys Ulerich said...

Interesting. I can reproduce what you see with the pastebin. Thank you.

I am able to get your example working for both e and exp by moving the "no_case[constant][_val = _1]" line within grammar() to the bottom of the primary statement. I believe this is telling the parser to prefer function invocation over constants. I will re-run my test suit with that change and update appropriately. Thank you.

Alan Chen said...

You've done the hard work, so let me add this tiny bit on how to use bind to remove the struct declaration:

using boost::spirit::qi::_val;
using boost::spirit::qi::_1;

m_Rules[eStart] = m_Rules[eExp][_val = _1];
boost::phoenix::function lazy_pow;
m_Rules[eExp] = m_Rules[eMulDiv][_val = _1] >> '^' >> m_Rules[eMulDiv][_val = boost::phoenix::bind(static_cast(&pow),_val,_1)];

Rhys Ulerich said...

Hi Alan,

Thanks for the help with removing the struct declaration, but I'm dense and not quite seeing how that plugs into the code. Would you be willing to provide a diff off the gist version of the code?

Thanks,
Rhys

Ruediger said...

The [_val = lazy_pow(_val, _1)] statement may be replaced with the rather long [_val = boost::phoenix::bind(static_cast<FPT (*)(FPT, FPT)>(&std::pow),_val,_1)]

BTW, not sure I understand why the static_cast is needed, shouldn't C++ be able to determine the function signature by itself ?

I am still experimenting with the other structs, but haven't found a working solution yet. Will let you know when I do.

Would it be possible to clarify, under what conditions this example may be used in my own code and mark the Gist-code accordingly ? Would this be the Boost license ?

In any case thanks for this great example!

Ruediger

Rhys Ulerich said...

The static_casts are necessary so the compiler can get the pointer to the appropriate overload of things like std::abs.

I have added a Boost License 1.0 notice to the header.

Rhys Ulerich said...

I just remembered that I use this under the MPL2 in a research project of mine. I've tweaked the header to reflect that.

Ruediger said...

Rhys,

thanks for the clarifications -- this is most appreciated.

From a first reading of the license, the MPL2 would fit most of my needs (in an OSS-project).

Somewhat problematic for my environment is the fact that the usage of the MPL2 would force me to introduce a third license into my code base, which I have so far successfully tried to avoid.

Would there be any chance to simply dual-license the GIST-snippet under the Boost v1.0 license and the MPL2?

Forgive me for asking -- it would simply save me from having to re-order quite a bit of code in order to "isolate" MPL-code in a single file.

In any case thanks,
Ruediger

Rhys Ulerich said...
This comment has been removed by the author.
Rhys Ulerich said...

Hi Ruediger,

I'd prefer not to dual license this as that would introduce more complexity into my project's licensing.

I suspect I've already written the supporting details you need to make this grammar useful in your project. I've spent a little time over the past two days making that code entirely standalone so that you can import my MPL2 source into your project without compromising the license of any of your logic. Perhaps this accomplishes what you desire.

Take a look at the files named suzerain/expr* under https://bitbucket.org/RhysU/suzerain/src/. suzerain/exprgrammar.hpp is essentially this blog post. suzerain/exprparse_impl.hpp is the supporting parsing/error handling/reporting code exercising that grammar. These may be the only two headers you need and should be directly usable for you.

In practice, I found repeatedly compiling those headers lengthened my build times considerably. I wrote suzerain/exprparse.hpp and suzerain/exprparse_{d,f}{str,char}.cpp so that I could pay once to compile this logic and then could repeatedly link it quite cheaply.

Lastly, if you happen to want this functionality within an Automake-based project, suzerain/Makefile.am in that source tree shows how to build a convenience library called libexprparse.la. Your code could simply LIBADD libexprparse.la and use the exprparse.hpp functions directly.

Hope that helps. Please let me know if you have any questions.

- Rhys

Subscribe Subscribe to The Return of Agent Zlerich