A Common Lisp XML reader, writer, and custom parser.

    (ql:quickload "io.github.cl-sdk.xml")

`parse-xml` accepts a string and an optional `:handler` keyword argument.

- Default behaviour: when no handler is given, `parse-xml` returns an `xml-document` built by the built-in `dom-builder` handler (fully backward-compatible).
- SAX behaviour: when a custom handler is supplied, the parser fires events on it and returns whatever `end-document` returns.

Parsing a document with the default handler:

    (defvar *doc*
      (io.github.cl-sdk.xml:parse-xml "<?xml version=\"1.0\"?>
    <!-- preamble -->
    <root>
      <item id=\"1\">hello &amp; world</item>
      <!-- note -->
      <![CDATA[literal <text>]]>
      <?app instruction?>
    </root>"))

To handle events yourself, define a subclass of `sax-handler` and pass an instance as `:handler` to `parse-xml`. Specialize only the event methods you care about; unspecialized methods are no-ops.

    (defclass my-handler (io.github.cl-sdk.xml:sax-handler) ())

    (defmethod io.github.cl-sdk.xml:start-element ((h my-handler) tag attributes)
      (format t "open ~a ~a~%" tag attributes))

    (defmethod io.github.cl-sdk.xml:end-element ((h my-handler) tag)
      (format t "close ~a~%" tag))

    (defmethod io.github.cl-sdk.xml:end-document ((h my-handler))
      :done)

    (io.github.cl-sdk.xml:parse-xml "<root><child /></root>" :handler (make-instance 'my-handler))
    ;; open root NIL
    ;; open child NIL
    ;; close child
    ;; close root
    ;; => :done

| Generic function | When called |
|---|---|
| `(start-document handler)` | once, before any other event |
| `(end-document handler)` | once, after all events; return value is `parse-xml`'s result |
| `(start-element handler tag attributes)` | opening / self-closing tag |
| `(end-element handler tag)` | closing / self-closing tag |
| `(characters handler text)` | character data (entity refs already expanded) |
| `(comment handler data)` | `<!-- … -->` comment |
| `(processing-instruction handler target data)` | `<?target data?>` PI |
| `(cdata-section handler data)` | `<![CDATA[…]]>` section |
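
Since `end-document` supplies the return value of `parse-xml`, a handler can accumulate state during the parse and hand it back at the end. The sketch below counts elements. `counting-handler`, its `element-count` slot, and `count-elements` are hypothetical names chosen for illustration; only `sax-handler`, `start-element`, `end-document`, and `parse-xml` come from the library, and the system is assumed to be loaded already.

```lisp
;; Assumes (ql:quickload "io.github.cl-sdk.xml") has already been evaluated.
;; COUNTING-HANDLER and ELEMENT-COUNT are illustrative names, not library API.
(defclass counting-handler (io.github.cl-sdk.xml:sax-handler)
  ((element-count :initform 0 :accessor element-count)))

;; Fired once per opening or self-closing tag, so each element is counted once.
(defmethod io.github.cl-sdk.xml:start-element ((h counting-handler) tag attributes)
  (declare (ignore tag attributes))
  (incf (element-count h)))

;; Whatever this method returns becomes the result of PARSE-XML.
(defmethod io.github.cl-sdk.xml:end-document ((h counting-handler))
  (element-count h))

(defun count-elements (xml-string)
  "Parse XML-STRING and return the number of elements seen."
  (io.github.cl-sdk.xml:parse-xml xml-string
                                  :handler (make-instance 'counting-handler)))
```

Given the event sequence shown for `my-handler`, `(count-elements "<root><child /></root>")` should return 2, one for each `start-element` call.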

`xml-document` is the top-level result of `parse-xml`.

| Accessor | Returns |
|---|---|
| `xml-document-prolog` | list of `xml-comment` / `xml-pi` nodes before the root element |
| `xml-document-root` | the root `xml-node` |

    (io.github.cl-sdk.xml:xml-document-prolog *doc*)
    ;; => (#<xml-pi "xml" …> #<xml-comment " preamble ">)

    (io.github.cl-sdk.xml:xml-node-tag (io.github.cl-sdk.xml:xml-document-root *doc*))
    ;; => "root"

`xml-node` represents an element.

| Accessor | Returns |
|---|---|
| `xml-node-tag` | element name as a string |
| `xml-node-attributes` | alist of `(name . value)` string pairs |
| `xml-node-children` | list of child nodes (see node types below) |
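
Because attributes come back as an alist of string pairs, a lookup needs a string-aware test: the default `eql` test of `assoc` does not match two distinct string objects with the same contents. A minimal sketch, where `attribute-value` is a hypothetical helper and not part of the library:

```lisp
;; Hypothetical helper, not part of io.github.cl-sdk.xml.
(defun attribute-value (name attributes &optional default)
  "Return the value paired with NAME in the alist ATTRIBUTES, or DEFAULT."
  (let ((pair (assoc name attributes :test #'string=)))
    (if pair
        (cdr pair)
        default)))

(attribute-value "id" '(("id" . "1") ("class" . "x")))  ; => "1"
(attribute-value "lang" '(("id" . "1")) "en")           ; => "en"
```

The helper takes the alist itself, so it works on the value of `xml-node-attributes` as well as on the `attributes` argument of `start-element`.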

    (let* ((root (io.github.cl-sdk.xml:xml-document-root *doc*))
           (item (first (io.github.cl-sdk.xml:xml-node-children root))))
      (io.github.cl-sdk.xml:xml-node-tag item)        ; => "item"
      (io.github.cl-sdk.xml:xml-node-attributes item) ; => (("id" . "1"))
      (io.github.cl-sdk.xml:xml-node-children item))  ; => ("hello & world")

`xml-comment` represents a `<!-- … -->` comment.

| Accessor | Returns |
|---|---|
| `xml-comment-data` | comment body as a string |

`xml-pi` represents a `<?target data?>` processing instruction.

| Accessor | Returns |
|---|---|
| `xml-pi-target` | target name as a string |
| `xml-pi-data` | data string (may be empty) |

`xml-cdata` represents a `<![CDATA[…]]>` section.

| Accessor | Returns |
|---|---|
| `xml-cdata-data` | literal content as a string |

Each child of an `xml-node` is one of:

| Type | Produced by |
|---|---|
| `xml-node` | `<child …>` / `<child />` |
| `xml-comment` | `<!-- … -->` |
| `xml-pi` | `<?target data?>` |
| `xml-cdata` | `<![CDATA[…]]>` |
| `string` | character data / entity references |

Whitespace-only character data between elements is discarded.
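
Because children are a mix of node objects and plain strings, code that walks a tree usually dispatches on type. The sketch below gathers the text under a node, assuming `xml-node` and `xml-cdata` are exported as type names (as the table suggests) and that the library is already loaded. `text-content` is a hypothetical helper, not part of the library.

```lisp
;; Assumes (ql:quickload "io.github.cl-sdk.xml") has already been evaluated.
;; TEXT-CONTENT is an illustrative helper, not library API.
(defun text-content (node)
  "Concatenate character data and CDATA found under NODE, depth-first."
  (with-output-to-string (out)
    (labels ((walk (child)
               (typecase child
                 ;; Character data arrives as plain strings.
                 (string (write-string child out))
                 ;; CDATA content is taken literally.
                 (io.github.cl-sdk.xml:xml-cdata
                  (write-string (io.github.cl-sdk.xml:xml-cdata-data child) out))
                 ;; Elements are descended into recursively.
                 (io.github.cl-sdk.xml:xml-node
                  (mapc #'walk (io.github.cl-sdk.xml:xml-node-children child)))
                 ;; Comments and processing instructions contribute no text.
                 (t nil))))
      (walk node))))
```

A call such as `(text-content (io.github.cl-sdk.xml:xml-document-root *doc*))` would then return the element text and CDATA content joined in document order.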

Coverage of the XML 1.0 specification, by section:

- §2.3 Names: `NameStartChar` / `NameChar` Unicode ranges enforced
- §2.3 / §3.3.3 Attribute values: bare `<` is an error; entity/character references expanded
- §2.5 Comments: `--` inside a comment body is an error
- §2.7 CDATA sections: content is literal (markup characters not interpreted)
- §2.8 Prolog: XML declaration and DOCTYPE handled; prolog comments/PIs preserved
- §3.1 Attributes: duplicate attribute names are an error
- §4.6 References: `&amp;` `&lt;` `&gt;` `&quot;` `&apos;` `&#N;` `&#xN;` expanded

io.github.cl-sdk.xml is a hand-written recursive-descent parser implemented in Common Lisp. It targets the specifications listed below.
- Extensible Markup Language (XML) 1.0 — the core grammar and well-formedness rules that govern parsing, character data, entity references, comments, CDATA sections, processing instructions, and the document prolog.
- XML Schema Part 1: Structures — the schema-definition language used as a reference for element and attribute declarations, content models, and type hierarchies.
Unlicense