Description
Currently the <?xml ... ?> prolog declaration is not used by the parser (see #774).
This is a problem because certain control characters (U+001B, for example) can only be written in XML 1.1 as numeric character references (NCRs). In XML 1.0 they are invalid, even as NCRs.
I can set htmlEntities: true to allow parsing NCRs, but that also automatically parses other HTML entities, which is non-conformant and undesired.
I'd also need to manually extract the version first, before parsing the XML string, which I believe is something that the library should do for me.
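For reference, this is roughly the manual workaround I mean. It's only a sketch: extractXmlVersion and optionsFor are hypothetical helper names (not part of fast-xml-parser), and I'm assuming a document without a declaration should be treated as version 1.0.

```javascript
// Hypothetical helper, not part of fast-xml-parser.
// Returns the version from the XML declaration, or '1.0' if there is none.
function extractXmlVersion(xml) {
  // The declaration, if present, must be the first thing in the document
  // (an optional byte order mark is tolerated here).
  const decl = /^\uFEFF?<\?xml\s+([^?]*)\?>/.exec(xml);
  if (!decl) return '1.0';
  const version = /(?:^|\s)version\s*=\s*(["'])(1\.[0-9]+)\1/.exec(decl[1]);
  return version ? version[2] : '1.0';
}

// Hypothetical helper: builds parser options based on the declared version.
// htmlEntities: true is the only existing switch that decodes NCRs,
// so it also enables the unwanted HTML entity parsing.
function optionsFor(xml) {
  return {
    ignoreAttributes: false,
    htmlEntities: extractXmlVersion(xml) === '1.1',
  };
}
```

The result of optionsFor(xml) would then be passed to new XMLParser(...) before calling parse(xml), which is exactly the extra step I'd like the library to handle.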
My proposal:
- A new config flag, parseDeclaration (default: true), which reads the XML declaration if it exists and uses it to automatically set some parser options.
- A new possible value for htmlEntities: 'ncr', which enables only NCR parsing. This is what parseDeclaration would use when it encounters version 1.1 XML.
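To illustrate what I have in mind for the 'ncr' mode, here is a minimal sketch. decodeNcr is a hypothetical name, not an existing fast-xml-parser function, and leaving unrepresentable references untouched is my own assumption about the desired behavior.

```javascript
// Hypothetical sketch of NCR-only decoding; not the library's implementation.
// Decodes &#NNN; and &#xHHH; and leaves named entities such as &nbsp; alone.
function decodeNcr(text) {
  return text.replace(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Assumption: references to U+0000 or beyond U+10FFFF are left as-is
    // instead of throwing.
    if (codePoint === 0 || codePoint > 0x10FFFF) return match;
    return String.fromCodePoint(codePoint);
  });
}
```

With this, decodeNcr('&#x1B;&#x1B;') gives the two escape characters, while a string like '&nbsp;' passes through unchanged. A real implementation would presumably also need to reject control-character NCRs when the declared version is 1.0.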
Input
<?xml version="1.1" encoding="UTF-8" ?>
<root>&#x1B;&#x1B;</root>
Code
const xml = `...`;
const fxp = require('fast-xml-parser');
const parser = new fxp.XMLParser({
ignoreAttributes: false,
});
const result = parser.parse(xml);
console.log(result);
Output
{
'?xml': { '@_version': '1.1', '@_encoding': 'UTF-8' },
root: ''
}
Expected data
{
'?xml': { '@_version': '1.1', '@_encoding': 'UTF-8' },
root: '\x1B\x1B'
}
Example in Java: change version="1.1" to version="1.0" and you'll see a parser error.
Would you like to work on this issue?