
Url extra escaping #533

@andrewQwer


Hi, I'm using HtmlSanitizer to sanitize markup, and since updating the library from 5.x to 8.x the & sign in URLs gets escaped.
The problem is that I can't find where the escaping happens.

I have the following code:

using System;
using AngleSharp.Html;
using Ganss.Xss;

var gj = new HtmlSanitizer
{
    OutputFormatter = HtmlMarkupFormatter.Instance,
    AllowDataAttributes = true
};
Console.WriteLine(gj.Sanitize("<img src='http://foobar.com?x=5&y-6'>"));

The output is <img src="http://foobar.com?x=5&amp;y-6">: the & has been replaced by &amp;.

First, I tried hooking the FilterUrl event:

gj.FilterUrl += (object o, FilterUrlEventArgs e) =>
{
    Console.WriteLine(e.OriginalUrl);  // prints http://foobar.com?x=5&y-6
    Console.WriteLine(e.SanitizedUrl); // prints http://foobar.com?x=5&y-6
};

In this event both properties hold the same unescaped value, so there is nothing to fix at this stage.

Next, I tried the PostProcessDom event:

gj.PostProcessDom += (sender, args) =>
{
    var doc = args.Document;
    var imgNodes = doc.QuerySelectorAll("img");
    foreach (var imgNode in imgNodes)
    {
        // prints SRC in DOC: http://foobar.com?x=5&y-6
        Console.WriteLine("SRC in DOC: " + imgNode.GetAttribute("src"));
    }
};

So even in the PostProcessDom event the src attribute is not escaped. The same is true for the PostProcessNode event.
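A likely explanation for why FilterUrl and PostProcessDom only ever see the unescaped value: those events work on the DOM, and escaping is applied only when the DOM is serialized back to markup (presumably by the configured OutputFormatter). The HTML serialization rules write & inside an attribute value as &amp;, and any HTML parser, a browser included, decodes it back to &, so the URL actually requested stays the same. The sketch below illustrates that round trip under one assumption: System.Net.WebUtility stands in for the serializer and the parser. It is not the code path HtmlSanitizer or AngleSharp use internally, and the class and method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical demo class; WebUtility is only a stand-in for a real
// HTML serializer/parser, not what HtmlSanitizer calls internally.
public static class AmpEscapingDemo
{
    // The attribute value as it sits in the DOM (what the events report).
    public const string DomValue = "http://foobar.com?x=5&y-6";

    // Serialization step: '&' in the value is written out as "&amp;".
    public static string Serialize(string attributeValue) =>
        WebUtility.HtmlEncode(attributeValue);

    // Parsing step: "&amp;" is decoded back to '&' when the markup is read.
    public static string Parse(string serializedValue) =>
        WebUtility.HtmlDecode(serializedValue);

    public static void Main()
    {
        string serialized = Serialize(DomValue);
        string roundTripped = Parse(serialized);

        Console.WriteLine(serialized);               // http://foobar.com?x=5&amp;y-6
        Console.WriteLine(roundTripped == DomValue); // True
    }
}
```

If the round trip holds, the &amp; in the sanitized output changes only how the URL is written in the markup, not the URL a browser resolves.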

What else can I do to get the URLs in src/href attributes back to their original, unescaped values?
