Currently, scraper doesn't work very well with normal XML documents, specifically RSS.
There are a few ways to solve this:
- The included CSS selector engine (https://github.com/PuerkitoBio/goquery) doesn't seem to parse XML properly. We could modify the HTML to make it conform. OR
- Each configured path could have a `mode`, where:
  - `html` implies `selector` is a CSS selector (and is the default `mode`)
  - `xml` implies `selector` is an XPath selector (eww) OR
  - `xml` implies `selector` is a new format: `foo bar`, which simply traverses into `<foo>` then into `<bar>` OR
  - `xml` removes all other settings and simply converts XML into JSON directly (this is probably the easiest)