Create a Text Parser in C# with ANTLR

This post will show you how to write a simple DSL in C# using ANTLR.

What is a DSL?

A domain-specific language is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language, which is broadly applicable across domains.

What is ANTLR?

ANTLR, or ANother Tool for Language Recognition, is a parser generator that uses LL-based parsing (LL(*) in ANTLR 3, adaptive LL(*) in ANTLR 4). ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development.

What are we going to build?

In this article, I will show you how to create a DSL that converts a sentence into C# POCO classes that you can serialize and save into the database.

book a hotel room for Santosh Singh and 3 guests on 23-SEP-2013 00:12

We are going to parse the above statement into C# POCO classes using ANTLR. Serialized to JSON, the result will look like this:

{"Time":"2013-09-23T00:12:00","Person":{"FirstName":"Santosh","LastName":"Singh","NumberOfGuests":3}}

Prerequisites

  • ANTLR
  • Visual Studio Code with ANTLR extensions

Before writing the grammar, let’s first create the POCO classes that we want to populate from our DSL.

Booking.cs

using System;

public partial class Booking
{
    public DateTime Time { get; set; }
    public Person Person { get; set; }
}

Person.cs

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int NumberOfGuests { get; set; }

    public override string ToString()
    {
        return $"{FirstName}-{LastName}-{NumberOfGuests}";
    }
}
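
To make the target concrete, here is a minimal sketch of the object graph the parser should eventually produce for the sample sentence, built by hand and serialized to JSON. It assumes the built-in System.Text.Json serializer instead of Newtonsoft.Json (which Program.cs uses later in this post) so that it needs no extra package. PocoDemo and BuildSampleBooking are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.Json;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int NumberOfGuests { get; set; }
}

public class Booking
{
    public DateTime Time { get; set; }
    public Person Person { get; set; }
}

public static class PocoDemo
{
    // Hypothetical helper: builds by hand what the parser is expected to return.
    public static Booking BuildSampleBooking()
    {
        return new Booking
        {
            Time = new DateTime(2013, 9, 23, 0, 12, 0),
            Person = new Person
            {
                FirstName = "Santosh",
                LastName = "Singh",
                NumberOfGuests = 3
            }
        };
    }

    public static void Main()
    {
        // Serialize the POCO graph; property names become the JSON keys.
        string json = JsonSerializer.Serialize(BuildSampleBooking());
        Console.WriteLine(json);
    }
}
```

Because the classes are plain property bags, the parser only has to fill in the properties. Serialization and storage stay outside the grammar.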

ANTLR Grammar

The first thing we will do is write the grammar file for the above statement. Below is the grammar for our DSL. An ANTLR grammar consists of two parts:

  • Parser
  • Lexer

The main parser rules in our grammar are:

  • booking
  • time
  • person

The booking rule is the start rule, where parsing begins.

Here I am using actions and attributes instead of a visitor. Actions are blocks of text written in the target language and enclosed in curly braces. The recognizer triggers them according to their locations within the grammar. Inside an action, $b refers to the rule's own return value, while $time.t and $person.p read the values returned by the time and person rules. For example, the following rule builds a Booking model after the parser has seen a valid booking sentence:

booking
	returns[Booking b]:
	'book' 'a' 'hotel' 'room' 'for' person 'guests' 'on' time {
      $b=new Booking();
      $b.Time=$time.t;
      $b.Person=$person.p;

    };

Grammar (Booking.g4)

grammar Booking;

@lexer::header {
    using System;
}

booking
	returns[Booking b]:
	'book' 'a' 'hotel' 'room' 'for' person 'guests' 'on' time {
      $b=new Booking();
      $b.Time=$time.t;
      $b.Person=$person.p;

    };
time
	returns[DateTime t]:
	d = datetime {
      // Debug output: echoes the matched date text to the console.
      Console.WriteLine($d.text);
      $t=DateTime.Parse($d.text);

    };
person
	returns[Person p]:
	f = firstName l = lastName 'and' n = numberOfGuests {
      $p=new Person();
      $p.FirstName=$f.text;
      $p.LastName=$l.text;
      $p.NumberOfGuests=int.Parse($n.text);

    };

firstName: STRING;
lastName: STRING;
numberOfGuests: NUMBER;
datetime:
	NUMBER NUMBER SEPARATOR MONTH SEPARATOR YEAR NUMBER NUMBER COLON NUMBER NUMBER;
YEAR: NUMBER NUMBER NUMBER NUMBER;

NUMBER: [0-9];
MONTH:
	JAN
	| FEB
	| MAR
	| APR
	| MAY
	| JUN
	| JUL
	| AUG
	| SEP
	| OCT
	| NOV
	| DEC;

STRING: [a-zA-Z][a-zA-Z]+;
JAN: [Jj][Aa][Nn];
FEB: [Ff][Ee][Bb];
MAR: [Mm][Aa][Rr];
APR: [Aa][Pp][Rr];
MAY: [Mm][Aa][Yy];
JUN: [Jj][Uu][Nn];
JUL: [Jj][Uu][Ll];
AUG: [Aa][Uu][Gg];
SEP: [Ss][Ee][Pp];
OCT: [Oo][Cc][Tt];
NOV: [Nn][Oo][Vv];
DEC: [Dd][Ee][Cc];
SEPARATOR: '-';
COLON: ':';

WS: (' ' | '\r' | '\n' | '\t') -> channel(HIDDEN);
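
The time rule's action calls DateTime.Parse, whose result depends on the current culture of the machine running the parser. As a hedged alternative, the sketch below parses the same text with DateTime.ParseExact and the invariant culture. The format string dd-MMM-yyyy HH:mm is my reading of the datetime rule, and TimeParsing and ParseBookingTime are hypothetical names that are not part of the generated parser.

```csharp
using System;
using System.Globalization;

public static class TimeParsing
{
    // Assumed format, mirroring the datetime rule:
    // two-digit day, month abbreviation, four-digit year, 24-hour time.
    private const string Format = "dd-MMM-yyyy HH:mm";

    // Hypothetical helper: culture-independent parse of the matched date text.
    public static DateTime ParseBookingTime(string text)
    {
        return DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        DateTime t = ParseBookingTime("23-SEP-2013 00:12");
        // "s" is the sortable ISO 8601 format specifier.
        Console.WriteLine(t.ToString("s"));
    }
}
```

If such a helper were compiled into the same project, the action could call it in place of DateTime.Parse. One caveat: $d.text includes the whitespace between the date and the time exactly as typed, so an exact format like this one expects a single space there.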

Parse Tree

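Assuming you have generated the lexer and parser code from the grammar with ANTLR (for example, by running the ANTLR tool with the -Dlanguage=CSharp option), write the following code inside the Main method. The generated booking() method returns a context object whose b property holds the Booking built by the rule's action.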

Program.cs

using System;
using Antlr4.Runtime;
using Newtonsoft.Json;

namespace antlraction
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = "book a hotel room  for Santosh Singh and 3 guests on 23-SEP-2013 00:12";

            var charStream = new AntlrInputStream(input);
            var lexer = new BookingLexer(charStream);
            var tokens = new CommonTokenStream(lexer);
            var parser = new BookingParser(tokens);

            var t = parser.booking().b;
            System.Console.WriteLine(JsonConvert.SerializeObject(t));
        }
    }
}

Output

{"Time":"2013-09-23T00:12:00","Person":{"FirstName":"Santosh","LastName":"Singh","NumberOfGuests":3}}
