HTMLSerializer and HTML entities #197

@danielknell

>>> import html5lib
>>> html5lib.serializer.serialize(html5lib.parse('<p>&nbsp;</p>'))
'<p>\xa0'

At the moment, parsing and then serialising a document converts entities into the literal characters they represent (as with &nbsp; becoming \xa0 above), including things like #00, and there is no way to pass additional entities to xml.sax.saxutils.escape.
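
For reference, here is a minimal sketch of what passing extra entities to xml.sax.saxutils.escape looks like when it is called directly. The EXTRA_ENTITIES name and its contents are only an illustration, not something html5lib defines:

```python
from xml.sax.saxutils import escape

# Illustrative mapping only (an assumption, not part of html5lib):
# characters that should be written out as named entities.
EXTRA_ENTITIES = {"\xa0": "&nbsp;"}

# With no mapping, only &, < and > are escaped, so the
# non-breaking space passes through as a literal character.
plain = escape("a & b\xa0")

# With the mapping, the non-breaking space is written as &nbsp;.
named = escape("a & b\xa0", EXTRA_ENTITIES)

print(repr(plain))
print(repr(named))
```

The second argument is what the serialiser currently gives no way to supply.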

I looked into subclassing the serialiser, but the escaping happens in the middle of the serialize() method, so it cannot be overridden on its own:

https://github.com/html5lib/html5lib-python/blob/master/html5lib/serializer/htmlserializer.py#L223

Perhaps the class should define an entities dict that is passed through to escape, covering the standard HTML5 entities and special characters, or do the escaping in a separate method that subclasses can override?
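
A rough sketch of the shape I have in mind, combining both ideas. EntityEscaper, NamedEntityEscaper and escape_text are hypothetical names used as a stand-in for the serialiser, not html5lib's actual classes or methods. The mapping is built from the standard library's html.entities.codepoint2name, which only covers the HTML 4 names, so it is an approximation of the full HTML5 list:

```python
from html.entities import codepoint2name
from xml.sax.saxutils import escape


class EntityEscaper:
    """Hypothetical stand-in for the serialiser, not html5lib's class."""

    # Class-level mapping that subclasses can replace or extend.
    entities = {}

    def escape_text(self, text):
        # Escaping lives in its own method so it can be overridden
        # without reimplementing the whole serialize() loop.
        return escape(text, self.entities)


class NamedEntityEscaper(EntityEscaper):
    # Only map non-ASCII characters: &, < and > are already handled
    # by escape(), and mapping "&" again would double-escape it.
    entities = {
        chr(codepoint): "&%s;" % name
        for codepoint, name in codepoint2name.items()
        if codepoint > 127
    }


default_output = EntityEscaper().escape_text("<p>\xa0")
named_output = NamedEntityEscaper().escape_text("<p>\xa0")
print(repr(default_output))
print(repr(named_output))
```

With something like this, keeping &nbsp; in the output would only need a subclass that sets entities, and anyone needing different behaviour could override escape_text instead.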
