Amoghhosamane/ANTLRparse

ANTLR HTML Structure Parser - Project Report

This report documents the design, grammar definition, custom visitor implementation, and verified execution of the ANTLR HTML Structure Parser project. The parser's input is modeled on a responsive Student Management System interface.


1. Project Overview & Structure Analysis

The target application is Stellar, a modern, responsive Student Management System dashboard.

HTML Interface Architecture (student_app.html)

The page uses a glassmorphism theme (blurred backgrounds via the CSS backdrop-filter property) and is organized into three regions:

  1. Header (#dashboard-header): Contains the brand identifier (h1) and a search bar that filters results in real time.
  2. Stats Grid (#statsGrid): Displays key performance indicators using three glass-styled cards:
    • Total Students (#cardTotal)
    • Average GPA (#cardGpa)
    • Honors Students (#cardHonors)
  3. Content Grid (.content-layout): A responsive two-column grid layout containing:
    • Add Student Form Panel (#addStudentSection): A form with standard text, email, and numeric inputs to register students.
    • Student Table Panel (#studentListSection): A tabular representation of registered student profiles, including avatar circles, GPA color badges, and action buttons (Edit/Delete).

2. ANTLR Grammar Design (HTMLStructure.g4)

To capture this hierarchical structure, we defined a custom domain-specific language (DSL) that represents HTML elements in a clean, selector-like syntax.

Below is the complete ANTLR v4 grammar:

grammar HTMLStructure;

// Parser Rules
htmlDoc: element* EOF;

element: tagElement
       | textElement
       ;

tagElement: IDENTIFIER 
            ( '#' id=IDENTIFIER )? 
            ( '.' className+=IDENTIFIER )* 
            ( '[' attributeList ']' )? 
            ( '{' element* '}' )? ;

attributeList: attribute ( ',' attribute )* ;

attribute: name=IDENTIFIER '=' value=STRING ;

textElement: value=STRING ;

// Lexer Rules
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9\-_]* ;

STRING: '"' ( '\\"' | ~[\\"\r\n] )* '"'
      | '\'' ( '\\\'' | ~[\\'\r\n] )* '\''
      ;

WS: [ \t\r\n]+ -> skip ;

LINE_COMMENT: '//' ~[\r\n]* -> skip ;
BLOCK_COMMENT: '/*' .*? '*/' -> skip ;

Key Grammar Design Choices

  • CSS-Style Selectors: The tagElement rule matches selectors like div#container.card[attr="val"] { ... }.
  • Labeled Identifiers: We labeled the elements inside the parser rules (id=IDENTIFIER, className+=IDENTIFIER, name=IDENTIFIER, value=STRING). Labels make ANTLR expose dedicated fields on the generated context classes (e.g. ctx.id as a Token and ctx.className as a List<Token>), so the id and class names are never confused with the tag name, which is also an IDENTIFIER.
  • Optional Sub-blocks: Both the attributes list [...] and the children block {...} are optional, allowing leaf tags like i.fa-solid to be cleanly declared without brackets or braces.
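
To make the selector shape concrete, the sketch below decomposes a tag header (tag name, optional #id, zero or more .class parts) with java.util.regex. It is a hand-written approximation for illustration only, not the ANTLR-generated parser: the class name SelectorSketch and its fields are hypothetical, it ignores attribute lists and child blocks, and unlike the real lexer it does not skip whitespace between tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: approximates the header part of tagElement
// (tag name, optional #id, zero or more .class) with a regular expression.
class SelectorSketch {
    // Same character set as the IDENTIFIER lexer rule.
    private static final String IDENT = "[a-zA-Z_][a-zA-Z0-9\\-_]*";
    private static final Pattern HEADER = Pattern.compile(
            "(" + IDENT + ")(?:#(" + IDENT + "))?((?:\\." + IDENT + ")*)");

    final String tag;
    final String id; // null when no #id is present
    final List<String> classes = new ArrayList<>();

    SelectorSketch(String header) {
        Matcher m = HEADER.matcher(header);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a tag header: " + header);
        }
        tag = m.group(1);
        id = m.group(2);
        // group(3) looks like ".a.b"; split it into individual class names.
        for (String c : m.group(3).split("\\.")) {
            if (!c.isEmpty()) {
                classes.add(c);
            }
        }
    }

    public static void main(String[] args) {
        SelectorSketch s = new SelectorSketch("section#statsGrid.stats-grid");
        // prints: section id=statsGrid classes=[stats-grid]
        System.out.println(s.tag + " id=" + s.id + " classes=" + s.classes);
    }
}
```

The three captured parts correspond to the unlabeled IDENTIFIER, the id label, and the className list label in the grammar rule.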

3. Custom Input Representation (sample_input.txt)

We converted the structural model of the student_app.html mockup into a DSL document matching the grammar. The excerpt below is abridged; each ... line marks elements omitted here:

div#dashboard-container.dashboard-container {
    header#dashboard-header {
        div.brand-section {
            h1 {
                i.fa-solid.fa-graduation-cap
                "Stellar"
            }
            p { "Student Performance Management Dashboard" }
        }
        div.search-bar {
            i.fa-solid.fa-magnifying-glass
            input#searchInput[placeholder="Search students by name or email...", type="text"]
        }
    }
    
    section#statsGrid.stats-grid {
        ...
    }
    
    div.content-layout {
        section#addStudentSection.panel {
            form#studentForm {
                div.form-group {
                    label[for="studentName"] { "Full Name" }
                    input#studentName.form-control[placeholder="e.g. Alice Smith", required="true"]
                }
                ...
            }
        }
        ...
    }
}

4. Java Implementation

The Java implementation parses the DSL input into a parse tree, walks the tree with the Visitor pattern, prints the element hierarchy to the console, and counts the nodes it visits.

A. Driver Class (Main.java)

The driver creates the ANTLR input stream, initializes the lexer and parser, replaces the parser's default error listener with one that reports the line and column of each syntax error, and starts the tree walk.

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        String inputFile = "sample_input.txt";
        if (args.length > 0) {
            inputFile = args[0];
        }

        System.out.println("Processing input file: " + inputFile);

        try {
            CharStream input = CharStreams.fromFileName(inputFile);
            HTMLStructureLexer lexer = new HTMLStructureLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            HTMLStructureParser parser = new HTMLStructureParser(tokens);

            parser.removeErrorListeners();
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                                        int line, int charPositionInLine, String msg,
                                        RecognitionException e) {
                    System.err.println("Syntax Error at line " + line + ":" + charPositionInLine + " - " + msg);
                }
            });

            ParseTree tree = parser.htmlDoc();

            if (parser.getNumberOfSyntaxErrors() > 0) {
                System.err.println("Parsing finished with errors. Exiting.");
                System.exit(1);
            }

            HTMLStructureBaseVisitorExtended visitor = new HTMLStructureBaseVisitorExtended();
            visitor.visit(tree);

        } catch (IOException e) {
            System.err.println("Error reading input file: " + e.getMessage());
        }
    }
}

B. Custom Visitor (HTMLStructureBaseVisitorExtended.java)

This visitor extends the generated HTMLStructureBaseVisitor<Void> class to print an indented tree and to count tag and text nodes.

import java.util.List;
import java.util.ArrayList;
import java.util.stream.Collectors;
import org.antlr.v4.runtime.Token;

public class HTMLStructureBaseVisitorExtended extends HTMLStructureBaseVisitor<Void> {
    private int indentLevel = 0;
    private int totalNodes = 0;
    private int tagElementCount = 0;
    private int textElementCount = 0;

    private String getIndent() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    @Override
    public Void visitHtmlDoc(HTMLStructureParser.HtmlDocContext ctx) {
        System.out.println("--- Starting Parse Tree Walk ---");
        Void result = super.visitHtmlDoc(ctx);
        System.out.println("--- Parse Tree Walk Completed ---");
        System.out.println("\nStatistics Summary:");
        System.out.println("  Total Nodes Processed: " + totalNodes);
        System.out.println("  Tag Elements Count: " + tagElementCount);
        System.out.println("  Text Elements Count: " + textElementCount);
        return result;
    }

    @Override
    public Void visitTagElement(HTMLStructureParser.TagElementContext ctx) {
        totalNodes++;
        tagElementCount++;

        String tagName = ctx.IDENTIFIER(0).getText();
        String idName = ctx.id != null ? "#" + ctx.id.getText() : "";

        // Collect class names
        String classStr = "";
        if (ctx.className != null && !ctx.className.isEmpty()) {
            classStr = ctx.className.stream()
                    .map(token -> "." + token.getText())
                    .collect(Collectors.joining());
        }

        // Collect attributes
        String attrStr = "";
        if (ctx.attributeList() != null && ctx.attributeList().attribute() != null) {
            List<String> attrs = new ArrayList<>();
            for (HTMLStructureParser.AttributeContext attrCtx : ctx.attributeList().attribute()) {
                attrs.add(attrCtx.name.getText() + "=" + attrCtx.value.getText());
            }
            attrStr = "[" + String.join(", ", attrs) + "]";
        }

        System.out.println(getIndent() + "[Tag] " + tagName + idName + classStr + (attrStr.isEmpty() ? "" : " " + attrStr));

        indentLevel++;
        // Visit children
        if (ctx.element() != null) {
            for (HTMLStructureParser.ElementContext child : ctx.element()) {
                visit(child);
            }
        }
        indentLevel--;

        return null;
    }

    @Override
    public Void visitTextElement(HTMLStructureParser.TextElementContext ctx) {
        totalNodes++;
        textElementCount++;
        String text = ctx.value.getText();
        // Remove surrounding quotes (either "..." or '...')
        boolean doubleQuoted = text.startsWith("\"") && text.endsWith("\"");
        boolean singleQuoted = text.startsWith("'") && text.endsWith("'");
        if (text.length() >= 2 && (doubleQuoted || singleQuoted)) {
            text = text.substring(1, text.length() - 1);
        }
        System.out.println(getIndent() + "[Text] \"" + text + "\"");
        return null;
    }
}
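
The visitor above strips the surrounding quotes from a text element but leaves escape sequences untouched, although the STRING rule admits an escaped quote (\" inside double quotes, \' inside single quotes). A minimal sketch of a helper that also unescapes that quote could look like this; StringLiterals and unquote are hypothetical names, not part of the project.

```java
// Hypothetical helper, not part of the project: strips the surrounding quotes
// from a STRING token's text and unescapes the one escape the grammar allows
// (a backslash followed by the same quote character).
class StringLiterals {
    static String unquote(String raw) {
        if (raw == null || raw.length() < 2) {
            return raw;
        }
        char quote = raw.charAt(0);
        boolean quoted = (quote == '"' || quote == '\'')
                && raw.charAt(raw.length() - 1) == quote;
        if (!quoted) {
            return raw; // leave anything unexpected untouched
        }
        String body = raw.substring(1, raw.length() - 1);
        // Replace \" with " (or \' with ') inside the body.
        return body.replace("\\" + quote, String.valueOf(quote));
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"Stellar\""));       // prints: Stellar
        System.out.println(unquote("'say \\'hi\\''"));    // prints: say 'hi'
    }
}
```

In the visitor, such a helper would be called on ctx.value.getText() in place of the inline quote stripping.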

5. Build, Execution, and Verification

Follow these steps to run the parser:

Step 1: Generate Parser Files

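Use the downloaded ANTLR jar to generate the Java classes. The -visitor flag makes ANTLR generate the HTMLStructureVisitor interface and the HTMLStructureBaseVisitor class in addition to the default listener classes: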

java -jar antlr-4.13.2-complete.jar -visitor HTMLStructure.g4

Step 2: Compile All Sources

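Compile the driver, custom visitor, and all ANTLR-generated source files. The commands in Steps 2 and 3 use the Windows classpath separator (;); on Linux or macOS, use a colon (:) instead: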

javac -cp ".;antlr-4.13.2-complete.jar" *.java

Step 3: Run the Application

Run the parser, passing the sample input file as an argument:

java -cp ".;antlr-4.13.2-complete.jar" Main sample_input.txt

Verification Output

The parser runs and walks the parse tree without syntax errors, producing the following output (abridged):

Processing input file: sample_input.txt
--- Starting Parse Tree Walk ---
[Tag] div#dashboard-container.dashboard-container
  [Tag] header#dashboard-header
    [Tag] div.brand-section
      [Tag] h1
        [Tag] i.fa-solid.fa-graduation-cap
        [Text] "Stellar"
      [Tag] p
        [Text] "Student Performance Management Dashboard"
    [Tag] div.search-bar
      [Tag] i.fa-solid.fa-magnifying-glass
      [Tag] input#searchInput [placeholder="Search students by name or email...", type="text"]
  [Tag] section#statsGrid.stats-grid
    [Tag] div#cardTotal.stat-card
      [Tag] div.stat-info
        [Tag] h3
          [Text] "Total Students"
        [Tag] div#statTotalCount.value
          [Text] "3"
        [Tag] div.trend.up
          [Tag] i.fa-solid.fa-arrow-up
          [Text] "+1 New class"
...
--- Parse Tree Walk Completed ---

Statistics Summary:
  Total Nodes Processed: 121
  Tag Elements Count: 92
  Text Elements Count: 29

6. Lessons Learned & Best Practices

  1. Resolving List vs Element in Generated Contexts: Because IDENTIFIER appears more than once in the tagElement rule, the generated accessor ctx.IDENTIFIER() returns a List<TerminalNode> containing every matched identifier (tag name, id, and class names), not the single tag name token. The tag name is always the first entry, so it must be read explicitly with ctx.IDENTIFIER(0). Separately, a list label such as className+=IDENTIFIER makes the context class expose a List<Token> field.

  2. Context Labels (id=IDENTIFIER): Labeling an optional element with = (id=IDENTIFIER) yields a single Token field that is null when the element is absent, while += (className+=IDENTIFIER) yields a list. Using both keeps the custom visitor short and readable.

  3. Visitor vs Listener Pattern: The Visitor pattern is well suited to printing a tree because the visitor decides when child nodes are visited. Indentation can be tracked by incrementing a counter before visiting children and decrementing it afterwards. The Listener pattern is driven by ParseTreeWalker, which visits every node automatically; indentation and other layout state then have to be maintained across paired enter/exit callbacks.
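
The difference between the two traversal styles can be sketched in plain Java on a tiny hand-built tree. Everything here is a hypothetical stand-in (Node, Printer, walk); the real project walks ANTLR context objects, and its visitor keeps the depth in a field rather than passing it as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parse tree; the real project walks ANTLR contexts.
class IndentSketch {
    static class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Visitor style: the method itself decides when to descend,
    // so depth is just a parameter of the recursion.
    static void visit(Node n, int depth, List<String> out) {
        out.add(indent(depth) + n.label);
        for (Node child : n.children) {
            visit(child, depth + 1, out);
        }
    }

    // Listener style: a generic walker drives the traversal and only calls
    // enter/exit hooks, so depth must live in mutable state.
    static class Printer {
        int depth = 0;
        final List<String> out = new ArrayList<>();
        void enter(Node n) { out.add(indent(depth) + n.label); depth++; }
        void exit(Node n) { depth--; }
    }

    static void walk(Node n, Printer p) {
        p.enter(n);
        for (Node child : n.children) {
            walk(child, p);
        }
        p.exit(n);
    }

    static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node tree = new Node("h1", new Node("i.fa-solid"), new Node("\"Stellar\""));
        List<String> viaVisitor = new ArrayList<>();
        visit(tree, 0, viaVisitor);
        Printer p = new Printer();
        walk(tree, p);
        viaVisitor.forEach(System.out::println);
        System.out.println(viaVisitor.equals(p.out)); // prints: true
    }
}
```

Both styles produce the same lines; the listener variant simply has to keep its enter and exit hooks balanced.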
