Introduction to Parsing - 11

This post is part of the Introduction to parsing series.

In the last part we completed our preliminary work. This time, we’re finally modify our parser to use the new tree structure we’ve built.

The commit we will discuss in this post is here. The commit includes a lot of changes all at once, so we are going to go through it rather slowly.

First step: adapting variables

We have to modify the various methods in the parser that used to handle expressions, terms, factors and similar so that they now return a NodePtr, rather than evaluating immediately the source and return a double as they did before.

To see what we mean, let’s see how we are going to change evalNextVariable: we’re going to rewrite it from this:

double Parser::evalNextVariable() {
    // Match the variable name
    std::string variableName = getNextToken().getContent();
    advance();

    // Lookup its value
    auto it = variables_.find(variableName);
    if (it == variables_.end()) {
        throw UnknownVariableName(variableName);
    }
    return it->second;
}

to this:

NodePtr Parser::evalNextVariable() {
    // Match the variable name
    std::string variableName = getNextToken().getContent();
    advance();

    // Make a variable access node
    return NodePtr(new VariableNode(variableName));
}

We aren’t changing the parsing code; only what we do with what we have parsed. Instead of immediately getting the variable’s value and returning, we’re returning a new VariableNode that will access the variable’s value when eval will be called.

Next: factors

Similarly to before, we’re going to change evalNextFunctionCall from:

double Parser::evalNextFunctionCall() {
    // Match the function name and the open parenthesis
    std::string functionName = getNextToken().getContent();
    advance();
    match(TokenType::OPERATOR, "(", "an open parenthesis");

    // Is it a valid function?
    doubleToDoubleFunction f = lookupFunctionByName(functionName);

    // Eval its argument and match the closed parenthesis
    double argumentValue = evalNextExpression();
    match(TokenType::OPERATOR, ")", "a closed parenthesis");

    // Call the function!
    return f(argumentValue);
}

to:

NodePtr Parser::evalNextFunctionCall() {
    // Match the function name and the open parenthesis
    std::string functionName = getNextToken().getContent();
    advance();
    match(TokenType::OPERATOR, "(", "an open parenthesis");

    // Parse its argument as a node and match the closed parenthesis
    NodePtr argument = getNextExpressionNode();
    match(TokenType::OPERATOR, ")", "a closed parenthesis");

    // Make a function call node
    return NodePtr(new FunctionCallNode(functionName, argument));
}

Notice how in the previous version we used to evaluate immediately the arguments of the function by calling evalNextExpression, while now we are calling the new method getNextExpressionNode to parse an expression as a NodePtr. The evaluation of the function’s argument is now delayed until the function call node is actually evaluated.

Let’s continue and see how evalNextParenthesisFactor changes. This is the old version:

double Parser::evalNextParenthesisFactor()
{
    // Match the '('; parse an expression; then match the ')'
    match(TokenType::OPERATOR, "(", "an open parenthesis");
    double value = evalNextExpression();
    match(TokenType::OPERATOR, ")", "a closed parenthesis");

    return value;
}

and this is the new one:

NodePtr Parser::evalNextParenthesisFactor()
{
    // Match the '('; parse an expression; then match the ')'
    match(TokenType::OPERATOR, "(", "an open parenthesis");
    NodePtr node = getNextExpressionNode();
    match(TokenType::OPERATOR, ")", "a closed parenthesis");

    return node;
}

Given that parenthesis are implied by the tree structure, there’s no special need of a Node subclass for them.

Finally, we have to change the first lines of evalNextFactor to handle numbers, from:

double Parser::evalNextFactor()
{
    if (getNextToken().getTokenType() == TokenType::NUMBER) {
        double value = atof(getNextToken().getContent().c_str());
        advance();
        return value;

to:

NodePtr Parser::evalNextFactor()
{
    if (getNextToken().getTokenType() == TokenType::NUMBER) {
        double value = atof(getNextToken().getContent().c_str());
        advance();
        return NodePtr(new NumberNode(value));

Terms and expressions

The changes to terms and expressions parsing are very similar to each other, so we will show only the terms here . The code was changed from:

double Parser::evalNextTerm()
{
    // First handle numbers and parenthesis
    double value = evalNextFactor();

    // Next handle multiplications and divisions
    while (hasNextToken() && getNextToken().getTokenType() == TokenType::OPERATOR) {
        if (getNextToken().getContent() == "*") {
            advance();
            value *= evalNextFactor();
        } else if (getNextToken().getContent() == "/") {
            advance();
            value /= evalNextFactor();
        } else {
            // An operator, but not '*' or '/': let the caller handle it.
            break;
        }
    }

    return value;
}

to:

NodePtr Parser::evalNextTerm()
{
    // First handle numbers and parenthesis
    NodePtr node = evalNextFactor();

    // Next handle multiplications and divisions
    while (hasNextToken() && getNextToken().getTokenType() == TokenType::OPERATOR) {
        if (getNextToken().getContent() == "*") {
            advance();
            node = NodePtr(new MultiplicationNode(node, evalNextFactor()));
        } else if (getNextToken().getContent() == "/") {
            advance();
            node = NodePtr(new DivisionNode(node, evalNextFactor()));
        } else {
            // An operator, but not '*' or '/': let the caller handle it.
            break;
        }
    }

    return node;
}

Basically, instead of immediately evaluating the multiplications and divisions, we build a new node whose left child is what we have parsed so far, and the right child is the next factor.

Let’s say we are parsing 1 * 2 / 3. At the first iteration first we’re going to build a NumberNode for 1. Next, we are going to replace node with a MultiplicationNode, whose left child is the NumberNode(1) and the right child is the next factor, so a NumberNode(2). Finally, at the last iteration, node will refer to a new DivisionNode, whose left child will be the old MultiplicationNode and the right child will be a new NumberNode(3):

Last changes

We have seen how we have changed most methods to return a NodePtr rather than evaluate the expression immediately. However, we also need to actually evaluate the nodes we build. This is done with a new method:

double Parser::evalNode(NodePtr node)
{
    EvaluationContext evaluationContext(functions_, variables_);
    return node->eval(evaluationContext);
}

With this helper method, parseExpression changes from:

void Parser::parseExpression()
{
    double value = evalNextExpression();
    ostream_ << value << std::endl;
}

to:

void Parser::parseExpression()
{
    NodePtr node = getNextExpressionNode();
    ostream_ << evalNode(node) << std::endl;
}

A similar change has to be done in parseAssignment, since we only need to store variables’s values and not the expression used to evaluate them:

// The new parseAssignment
void Parser::parseAssignment()
{
    // Match variable name
    if (!hasNextToken() || getNextToken().getTokenType() != TokenType::IDENTIFIER){
        throw InvalidInputException("Found an unexpected token: " + getNextToken().getContent());
    }
    std::string variableName = getNextToken().getContent();
    advance();

    // Match =
    match(TokenType::OPERATOR, "=", "the assigment operator =");

    // Get the expression as a node, evaluate it and save the variable value
    NodePtr node = getNextExpressionNode();
    variables_[variableName] = evalNode(node);
};

After these changes, with just a trivial change to the unit tests (they now have to call evalNode and getNextExpressionNode rather than the old evalNextExpression we see that our parser works as expected!

Conclusions

The groundwork is complete and our parser has been rewritten to use our nodes. In the next two parts, we’re going to finally let the user define new functions.