How to write a lexer and parser in JavaScript

Lexer and Parser are two important components of compiler. Lexer is responsible for tokenizing the source code, and parser will read the tokens to build an AST(Abstract Syntax Tree).

To implement Lexer and Parser in JavaScript, we need to understand how they work. The following is a simple example of Lexer and Parser:

Lexical analyzer: a function that reads the source code character by character, then decides if it has reached the end of line or not. If it has reached the end of line, then the function returns the token corresponding to that character. If not, it would go back to read another character from source code.

Syntax analyzer: A function that parses the tokens generated by lexical analyzer and builds an Abstract Syntax Tree (AST) from them.

Each programming language is defined by 3 distinct parts. These are the lexer, parser and the compiler. The Lexer breaks down a single string to individual tokens, or words. These are the grammatical words in a sentence or statement. The Parser then takes those individual words and puts them back together as statements, expressions or something else that can be executed by the computer. Finally, the compiler takes those statements and creates an object for a program to use later on. Today, I’m going to show you how to do that in JavaScript

Lexer

The lexer class has a method called getNextToken which is responsible for returning one token at a time.

const INTEGER = "INTEGER";
const OPERATOR = "OPERATOR";
const EOF = "EOF";
class Lexer{
    private input: string;
    private position: number;
    private currentChar: string|null;

    constructor(input: string){
        this.input = input;
        this.position = 0;
        this.currentChar = this.input[this.position];
    }

    private error(){
        throw new Error("Invalid character");
    }

    private advance(){
        this.position++;
        if(this.position > this.input.length - 1){
            this.currentChar = null;
        }else{
            this.currentChar = this.input[this.position];
        }
    }

    private skipWhitespace(){
        while(this.currentChar != null && this.currentChar == " "){
            this.advance();
        }
    }

    private integer(){
        let result = "";
        while(this.currentChar != null && this.isDigit(this.currentChar)){
            result += this.currentChar;
            this.advance();
        }
        return parseInt(result);
    }

    private isDigit(char: string){
        return char >= "0" && char <= "9";
    }

    private isOperator(char: string){
        return char == "+" || char == "-" || char == "*" || char == "/";
    }

    public getNextToken(){
        while(this.currentChar != null){
            if(this.currentChar == " "){
                this.skipWhitespace();
                continue;
            }
            if(this.isDigit(this.currentChar)){
                return new Token(INTEGER, this.integer());
            }
            if(this.isOperator(this.currentChar)){
                let token = new Token(OPERATOR, this.currentChar);
                this.advance();
                return token;
            }
            this.error();
        }
        return new Token(EOF, null);
 
    }

}

Token

The token class is responsible for creating tokens. It has a type and a value. The type can be INTEGER, OPERATOR or EOF (end-of-file).The value is the actual value of the token. If the type is INTEGER then the value is an integer, if the type is OPERATOR then the value is an operator and if the type is EOF then the value is null.

class Token{
    type: string;
    value: any;

    constructor(type: string, value: any){
        this.type = type;
        this.value = value;
    }

    toString(){
        return `Token(${this.type}, ${this.value})`;
    }
}

Interpreter


class Interpreter
{
    private lexer: Lexer;
    private currentToken: Token;

    constructor(lexer: Lexer){
        this.lexer = lexer;
        this.currentToken = this.lexer.getNextToken();
    }

    private error(){
        throw new Error("Invalid syntax");
    }

    private eat(tokenType: string){
        if(this.currentToken.type == tokenType){
            this.currentToken = this.lexer.getNextToken();
        }else{
            this.error();
        }
    }

    private factor(){
        let token = this.currentToken;
        this.eat(INTEGER);
        return token.value;
    }

    private term(){
        let result = this.factor();
        while(this.currentToken.type == OPERATOR && (this.currentToken.value == "*" || this.currentToken.value == "/")){
            let token = this.currentToken;
            if(token.value == "*"){
                this.eat(OPERATOR);
                result *= this.factor();
            }else if(token.value == "/"){
                this.eat(OPERATOR);
                result /= this.factor();
            }
        }
        return result;
    }

    public expr(){
        let result = this.term();
        while(this.currentToken.type == OPERATOR && (this.currentToken.value == "+" || this.currentToken.value == "-")){
            let token = this.currentToken;
            if(token.value == "+"){
                this.eat(OPERATOR);
                result += this.term();
            }else if(token.value == "-"){
                this.eat(OPERATOR);
                result -= this.term();
            }
        }
        return result;
    }
}

The interpreter class has a method called

  • expr which is responsible for parsing and interpreting the input. It calls the term method which is responsible for parsing and interpreting the multiplication and division. It calls the factor method which is responsible for parsing and interpreting the integers. The eat method is responsible for comparing the current token with the passed token and if they match then “eating” the current token and assigning the next token to the current token, otherwise raising an exception.

  • the interpreter class has a method called factor which is responsible for parsing and interpreting the integers. It gets the current token and checks if its type is INTEGER. If it is then it “eats” the current token and assigns the next token to the current token and returns the value of the current token. If the current token’s type is not INTEGER then it raises an exception

  • The interpreter class has a method called term which is responsible for parsing and interpreting the multiplication and division. It calls the factor method and gets the result. Then it checks if the current token’s type is OPERATOR and if its value is either * or /. If it is then it “eats” the current token and assigns the next token to the current token. Then it calls the factor method again and gets the result. Then it checks if the current token’s type is OPERATOR and if its value is either * or /. If it is then it "eats" the current token and assigns the next token to the current token. Then it calls the factor method again and gets the result.
    Then it checks if the current token’s type is OPERATOR and if its value is either * or /. If it is then it "eats" the current token and assigns the next token to the current token.

Main

The main function creates a lexer object and passes the input to it. Then it creates an interpreter object and passes the lexer object to it.
Then it calls the expr method of the interpreter object and prints the result.

function main(){
    let lexer = new Lexer("1+2*3");
    let interpreter = new Interpreter(lexer);
    let result = interpreter.expr();
    console.log(result); //7
}

main();

Post a Comment

Please do not post any spam link in the comment box😊

Previous Post Next Post

Blog ads

CodeGuru