Redundant schema compilation causes significant resource overhead #175

@uCantHim

Description

Schema compilations are not de-duplicated, even though scenarios can easily reference the same files multiple times. For example, the Validator Configuration for XRechnung compiles 6 unique .xsl files a total of 34 times. Both the computational and the memory overhead are significant, because every redundantly compiled document is kept in memory.

I have implemented a small fix (~8 lines of code) that caches compiled schemas in ContentRepository. Here are my measurement results for cold starts on the same machine with the default usage example from https://github.com/itplr-kosit/validator-configuration-xrechnung (/usr/bin/time -v for measurement):

| Version | Time | Peak Memory |
|---|---|---|
| v1.6.2 | ~11 s | ~800 MB |
| v1.6.2 patched | ~7.3 s | ~400 MB |

Note that the Saxon documentation explicitly states that XsltExecutable is thread-safe by design, so a cached instance can be shared safely across threads.
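To illustrate the general idea, here is a minimal sketch of such a cache. It is not the actual patch: the class name `CompiledSchemaCache`, the `compiler` function, and the use of a normalized `URI` as the key are all assumptions made for this example, and the type parameter `T` merely stands in for the compiled artifact (for instance Saxon's `XsltExecutable`). It uses only the JDK.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one compiled artifact per schema URI.
 * T is a placeholder for the compiled type (e.g. an XsltExecutable);
 * none of these names come from the validator code base.
 */
final class CompiledSchemaCache<T> {

    private final Map<URI, T> cache = new ConcurrentHashMap<>();
    private final Function<URI, T> compiler;

    CompiledSchemaCache(Function<URI, T> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached artifact, compiling it only on first request. */
    T get(URI source) {
        // Normalize so that equivalent paths (e.g. "a/./b.xsl") share a key.
        // computeIfAbsent invokes the compiler at most once per key.
        return cache.computeIfAbsent(source.normalize(), compiler);
    }

    /** Number of distinct schemas compiled so far. */
    int size() {
        return cache.size();
    }
}
```

With a cache like this, a scenario that references the same file repeatedly receives the same compiled instance each time. Sharing that instance between threads is only safe if the compiled type itself is thread-safe, which is the property noted above for XsltExecutable.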
