This report documents the design, grammar definition, custom visitor implementation, and execution verification of the ANTLR HTML Structure Parser project, modeled after a responsive Student Management System interface.
The target application is Stellar, a modern, responsive Student Management System dashboard.
The page uses a glassmorphism theme (blurred translucent panels built with the CSS backdrop-filter property) and is organized into three main regions:
- Header (#dashboard-header): Contains the brand identifier (h1) and a real-time filter search bar.
- Stats Grid (#statsGrid): Displays key performance indicators using three glass-styled cards:
  - Total Students (#cardTotal)
  - Average GPA (#cardGpa)
  - Honors Students (#cardHonors)
- Content Grid (.content-layout): A responsive two-column grid layout containing:
  - Add Student Form Panel (#addStudentSection): A form with standard text, email, and numeric inputs to register students.
  - Student Table Panel (#studentListSection): A tabular representation of registered student profiles, including avatar circles, GPA color badges, and action buttons (Edit/Delete).
To capture this hierarchical structure, we defined a custom domain-specific language (DSL) that represents HTML elements in a clean, selector-like syntax.
Below is the complete ANTLR v4 grammar:
grammar HTMLStructure;
// Parser Rules
htmlDoc: element* EOF;
element: tagElement
| textElement
;
tagElement: IDENTIFIER
( '#' id=IDENTIFIER )?
( '.' className+=IDENTIFIER )*
( '[' attributeList ']' )?
( '{' element* '}' )? ;
attributeList: attribute ( ',' attribute )* ;
attribute: name=IDENTIFIER '=' value=STRING ;
textElement: value=STRING ;
// Lexer Rules
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9\-_]* ;
STRING: '"' ( '\\"' | ~[\\"\r\n] )* '"'
| '\'' ( '\\\'' | ~[\\'\r\n] )* '\''
;
WS: [ \t\r\n]+ -> skip ;
LINE_COMMENT: '//' ~[\r\n]* -> skip ;
BLOCK_COMMENT: '/*' .*? '*/' -> skip ;
- CSS-Style Selectors: The tagElement rule matches selectors like div#container.card[attr="val"] { ... }.
- Labeled Identifiers: We labeled the parser elements (id=IDENTIFIER, className+=IDENTIFIER, name=IDENTIFIER, value=STRING). This instructs ANTLR to expose dedicated fields in the generated context classes (e.g. ctx.id as a single Token and ctx.className as a List<Token>), avoiding ambiguity with the tag name.
- Optional Sub-blocks: Both the attribute list [...] and the children block {...} are optional, allowing leaf tags like i.fa-solid to be declared without brackets or braces.
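The shape that tagElement accepts for the selector head (a tag name, an optional #id, then any number of .class parts) can be illustrated without ANTLR. The following is a minimal sketch, assuming a hypothetical SelectorHead helper that is not part of the project or of the generated parser. It uses a plain java.util.regex pattern built from the same character set as the IDENTIFIER lexer rule, and it deliberately ignores attribute lists and child blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not ANTLR-generated code. */
class SelectorHead {
    // Same character set as the grammar's IDENTIFIER lexer rule.
    private static final String IDENT = "[a-zA-Z_][a-zA-Z0-9\\-_]*";

    // Group 1: tag name, group 2: optional id, group 3: all ".class" parts.
    private static final Pattern HEAD = Pattern.compile(
        "(" + IDENT + ")(?:#(" + IDENT + "))?((?:\\." + IDENT + ")*)");

    final String tag;
    final String id;             // null when no '#' part is present
    final List<String> classes;  // empty when no '.' parts are present

    private SelectorHead(String tag, String id, List<String> classes) {
        this.tag = tag;
        this.id = id;
        this.classes = classes;
    }

    static SelectorHead parse(String text) {
        Matcher m = HEAD.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a selector head: " + text);
        }
        List<String> classes = new ArrayList<>();
        String classPart = m.group(3);
        if (!classPart.isEmpty()) {
            // Drop the leading '.' and split the rest on the remaining dots.
            for (String name : classPart.substring(1).split("\\.")) {
                classes.add(name);
            }
        }
        return new SelectorHead(m.group(1), m.group(2), classes);
    }

    public static void main(String[] args) {
        SelectorHead head = parse("div#container.card");
        System.out.println(head.tag + " " + head.id + " " + head.classes);
    }
}
```

Running main prints div container [card]. The three fields play the same roles as the tag-name token, the id label, and the className list label in the grammar. Unlike the grammar, this sketch does not tolerate whitespace between the parts, because the regex has no equivalent of the skipped WS rule.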
We converted the structural model of the student_app.html mockup into a DSL document matching the grammar:
div#dashboard-container.dashboard-container {
  header#dashboard-header {
    div.brand-section {
      h1 {
        i.fa-solid.fa-graduation-cap
        "Stellar"
      }
      p { "Student Performance Management Dashboard" }
    }
    div.search-bar {
      i.fa-solid.fa-magnifying-glass
      input#searchInput[placeholder="Search students by name or email...", type="text"]
    }
  }
  section#statsGrid.stats-grid {
    ...
  }
  div.content-layout {
    section#addStudentSection.panel {
      form#studentForm {
        div.form-group {
          label[for="studentName"] { "Full Name" }
          input#studentName.form-control[placeholder="e.g. Alice Smith", required="true"]
        }
        ...
      }
    }
    ...
  }
}
The Java implementation parses the DSL input, constructs a parse tree, walks it using the Visitor pattern, prints the hierarchical tree to the console, and collects node stats.
The driver configures the ANTLR input stream, initializes the lexer and parser, registers custom error handling, and kicks off the tree walk.
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        // Default input file; the first command-line argument overrides it.
        String inputFile = "sample_input.txt";
        if (args.length > 0) {
            inputFile = args[0];
        }
        System.out.println("Processing input file: " + inputFile);
        try {
            // Standard ANTLR pipeline: characters -> tokens -> parse tree.
            CharStream input = CharStreams.fromFileName(inputFile);
            HTMLStructureLexer lexer = new HTMLStructureLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            HTMLStructureParser parser = new HTMLStructureParser(tokens);

            // Replace the default console listener with a custom message format.
            parser.removeErrorListeners();
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                                        int line, int charPositionInLine, String msg,
                                        RecognitionException e) {
                    System.err.println("Syntax Error at line " + line + ":" + charPositionInLine + " - " + msg);
                }
            });

            ParseTree tree = parser.htmlDoc();
            // Do not walk a tree that was produced from invalid input.
            if (parser.getNumberOfSyntaxErrors() > 0) {
                System.err.println("Parsing finished with errors. Exiting.");
                System.exit(1);
            }

            HTMLStructureBaseVisitorExtended visitor = new HTMLStructureBaseVisitorExtended();
            visitor.visit(tree);
        } catch (IOException e) {
            System.err.println("Error reading input file: " + e.getMessage());
            // Signal failure to the caller, consistent with the syntax-error path.
            System.exit(1);
        }
    }
}
This visitor extends the generated HTMLStructureBaseVisitor<Void> class to print the formatted tree and track node counts.
import java.util.List;
import java.util.ArrayList;
import java.util.stream.Collectors;

public class HTMLStructureBaseVisitorExtended extends HTMLStructureBaseVisitor<Void> {
    private int indentLevel = 0;
    private int totalNodes = 0;
    private int tagElementCount = 0;
    private int textElementCount = 0;

    // Builds the indentation prefix for the current nesting depth.
    private String getIndent() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append(" ");
        }
        return sb.toString();
    }

    @Override
    public Void visitHtmlDoc(HTMLStructureParser.HtmlDocContext ctx) {
        System.out.println("--- Starting Parse Tree Walk ---");
        Void result = super.visitHtmlDoc(ctx);
        System.out.println("--- Parse Tree Walk Completed ---");
        System.out.println("\nStatistics Summary:");
        System.out.println(" Total Nodes Processed: " + totalNodes);
        System.out.println(" Tag Elements Count: " + tagElementCount);
        System.out.println(" Text Elements Count: " + textElementCount);
        return result;
    }

    @Override
    public Void visitTagElement(HTMLStructureParser.TagElementContext ctx) {
        totalNodes++;
        tagElementCount++;
        // The tag name is always the first IDENTIFIER matched by the rule.
        String tagName = ctx.IDENTIFIER(0).getText();
        String idName = ctx.id != null ? "#" + ctx.id.getText() : "";

        // Collect class names
        String classStr = "";
        if (ctx.className != null && !ctx.className.isEmpty()) {
            classStr = ctx.className.stream()
                    .map(token -> "." + token.getText())
                    .collect(Collectors.joining());
        }

        // Collect attributes
        String attrStr = "";
        if (ctx.attributeList() != null && ctx.attributeList().attribute() != null) {
            List<String> attrs = new ArrayList<>();
            for (HTMLStructureParser.AttributeContext attrCtx : ctx.attributeList().attribute()) {
                attrs.add(attrCtx.name.getText() + "=" + attrCtx.value.getText());
            }
            attrStr = "[" + String.join(", ", attrs) + "]";
        }

        System.out.println(getIndent() + "[Tag] " + tagName + idName + classStr + (attrStr.isEmpty() ? "" : " " + attrStr));

        // Visit children one level deeper, then restore the depth.
        indentLevel++;
        if (ctx.element() != null) {
            for (HTMLStructureParser.ElementContext child : ctx.element()) {
                visit(child);
            }
        }
        indentLevel--;
        return null;
    }

    @Override
    public Void visitTextElement(HTMLStructureParser.TextElementContext ctx) {
        totalNodes++;
        textElementCount++;
        String text = ctx.value.getText();
        // Remove surrounding quotes (escape sequences inside the string are left as written)
        if (text.startsWith("\"") && text.endsWith("\"") && text.length() >= 2) {
            text = text.substring(1, text.length() - 1);
        } else if (text.startsWith("'") && text.endsWith("'") && text.length() >= 2) {
            text = text.substring(1, text.length() - 1);
        }
        System.out.println(getIndent() + "[Text] \"" + text + "\"");
        return null;
    }
}
Follow these steps to run the parser:
Use the downloaded ANTLR jar to generate Java classes:
java -jar antlr-4.13.2-complete.jar -visitor HTMLStructure.g4
Compile the driver, custom visitor, and all ANTLR-generated source files (the ";" classpath separator shown here is for Windows; on Linux and macOS use ":" instead):
javac -cp ".;antlr-4.13.2-complete.jar" *.java
Run the parser program, passing the sample input document:
java -cp ".;antlr-4.13.2-complete.jar" Main sample_input.txt
The parser runs and walks the parse tree successfully, producing:
Processing input file: sample_input.txt
--- Starting Parse Tree Walk ---
[Tag] div#dashboard-container.dashboard-container
[Tag] header#dashboard-header
[Tag] div.brand-section
[Tag] h1
[Tag] i.fa-solid.fa-graduation-cap
[Text] "Stellar"
[Tag] p
[Text] "Student Performance Management Dashboard"
[Tag] div.search-bar
[Tag] i.fa-solid.fa-magnifying-glass
[Tag] input#searchInput [placeholder="Search students by name or email...", type="text"]
[Tag] section#statsGrid.stats-grid
[Tag] div#cardTotal.stat-card
[Tag] div.stat-info
[Tag] h3
[Text] "Total Students"
[Tag] div#statTotalCount.value
[Text] "3"
[Tag] div.trend.up
[Tag] i.fa-solid.fa-arrow-up
[Text] "+1 New class"
...
--- Parse Tree Walk Completed ---
Statistics Summary:
Total Nodes Processed: 121
Tag Elements Count: 92
Text Elements Count: 29
- Resolving List vs Element in Generated Contexts: Because IDENTIFIER appears several times in the tagElement rule (tag name, id, and class names), the generated ctx.IDENTIFIER() accessor returns the whole list of matched identifier nodes rather than the single tag-name token. The tag name is always the first identifier, so it must be indexed explicitly via ctx.IDENTIFIER(0). Separately, a list label such as className+=IDENTIFIER makes ANTLR generate a List<Token> field on the context class.
- Context Labels (id=IDENTIFIER): Explicitly labeling optional parser elements (e.g. id=IDENTIFIER) distinguishes them from list labels (className+=IDENTIFIER) and keeps custom visitors readable.
- Visitor vs Listener Pattern: The Visitor pattern suits tree printing because it gives full control over when child nodes are visited. Indentation is tracked by incrementing a counter before visiting children and decrementing it afterwards. A Listener is driven automatically by ParseTreeWalker, so the same bookkeeping has to be split across separate enter and exit callbacks.
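The indentation bookkeeping described in the last point can be sketched independently of ANTLR. The example below is a minimal illustration, assuming a hypothetical Node class as a stand-in for parse-tree contexts and a hypothetical IndentingWalker in place of the project's visitor. It shows only the increment-before-children, decrement-after pattern, with two spaces per level chosen arbitrarily.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walker for illustration; the real project uses the ANTLR visitor. */
class IndentingWalker {

    /** Hypothetical stand-in for a parse-tree node; not an ANTLR type. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        // Returns this node so that children can be added fluently.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    private int indentLevel = 0;
    private final StringBuilder out = new StringBuilder();

    /** Renders the tree rooted at the given node, one label per line. */
    String render(Node root) {
        out.setLength(0);
        indentLevel = 0;
        visit(root);
        return out.toString();
    }

    private void visit(Node node) {
        // Print the node at the current depth.
        out.append("  ".repeat(indentLevel)).append(node.label).append('\n');
        // Go one level deeper for the children, then restore the depth.
        indentLevel++;
        for (Node child : node.children) {
            visit(child);
        }
        indentLevel--;
    }

    public static void main(String[] args) {
        Node h1 = new Node("[Tag] h1")
            .add(new Node("[Tag] i.fa-solid.fa-graduation-cap"))
            .add(new Node("[Text] \"Stellar\""));
        Node root = new Node("[Tag] div.brand-section").add(h1);
        System.out.print(new IndentingWalker().render(root));
    }
}
```

Because the walker decides when to recurse, the counter is always balanced when visit returns, which is the property the visitor in this project relies on. With a listener, the increment would go in an enter callback and the decrement in the matching exit callback.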