diff --git a/Gemfile.lock b/Gemfile.lock index 28e67380..da9cd629 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -3,15 +3,15 @@ PATH specs: sablon (0.0.21) nokogiri (>= 1.6.0) - rubyzip (>= 1.1) + rubyzip (>= 1.1.1) GEM remote: https://rubygems.org/ specs: - mini_portile2 (2.1.0) + mini_portile2 (2.2.0) minitest (5.8.0) - nokogiri (1.7.1) - mini_portile2 (~> 2.1.0) + nokogiri (1.8.0) + mini_portile2 (~> 2.2.0) rake (10.4.2) rubyzip (1.2.1) xml-simple (1.1.5) @@ -27,4 +27,4 @@ DEPENDENCIES xml-simple BUNDLED WITH - 1.14.5 + 1.14.6 diff --git a/README.md b/README.md index 62e12229..9c300958 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,28 @@ and efficient. *Note: Sablon is still in early development. Please report if you encounter any issues along the way.* +#### Table of Contents +* [Installation](#installation) +* [Usage](#usage) + * [Writing Templates](#writing-templates) + * [Content Insertion](#content-insertion) + * [WordProcessingML](#wordprocessingml) + * [HTML](#html) + * [Conditionals](#conditionals) + * [Loops](#loops) + * [Nesting](#nesting) + * [Comments](#comments) + * [Configuration (Beta)](#configuration-beta) + * [Customizing HTML Tag Conversion](#customizing-html-tag-conversion) + * [Customizing CSS Style Conversion](#customizing-css-style-conversion) + * [Executable](#executable) + * [Examples](#examples) + * [Using a Ruby script](#using-a-ruby-script) + * [Using the sablon executable](#using-the-sablon-executable) +* [Contributing](#contributing) +* [Inspiration](#inspiration) + + ## Installation Add this line to your application's Gemfile: @@ -126,14 +148,13 @@ For a complete example see the test file located on `test/image_test.rb`. This functionality was inspired in the [kubido fork](https://github.com/kubido/sablon) for this project - kubido/sablon -##### HTML [experimental] +##### HTML -Similar to WordProcessingML it's possible to use html as input while processing the -tempalte. 
You don't need to modify your templates, a simple insertion operation +Similar to WordProcessingML, it's possible to use HTML as input while processing the template. You don't need to modify your templates; a simple insertion operation is sufficient: ``` -«=article.body» +«=article» ``` To use HTML insertion prepare the context like so: @@ -142,24 +163,40 @@ To use HTML insertion prepare the context like so: html_body = <<-HTML
This text can contain additional formatting according to the HTML specification.
+

Right aligned +content with a yellow background color

+
Inline styles are possible as well
HTML context = { - article: { html_body: Sablon.content(:html, html_body) } + article: Sablon.content(:html, html_body) + # alternative method using special key format + # 'html:article' => html_body } template.render_to_file File.expand_path("~/Desktop/output.docx"), context ``` -Currently HTML insertion is very limited and strongly focused on the HTML -generated by [Trix editor](https://github.com/basecamp/trix). +Currently, HTML insertion is somewhat limited. It is recommended that block level tags such as `p` and `div` are not nested within each other; otherwise the final document may not generate as anticipated. List tags (`ul` and `ol`) and inline tags (`span`, `b`, `em`, etc.) can be nested as deeply as needed. -IMPORTANT: This feature is very much *experimental*. Currently, the insertion - will replace the containing paragraph. This means that other content in the same - paragraph is discarded. +Not all tags are supported. Currently supported tags are defined in [configuration.rb](lib/sablon/configuration/configuration.rb) in method `initialize_html_tags`. + +Basic conversion of CSS inline styles into matching WordML properties is supported through the `style=" ... "` attribute in the HTML markup. Not all possible styles are supported and only a small subset of CSS styles have a direct WordML equivalent. Styles are passed onto nested elements. The currently supported styles are also defined in [configuration.rb](lib/sablon/configuration/configuration.rb) in method `initialize_css_style_conversion`. Simple toggle properties that aren't directly supported can be added using the `text-decoration: ` style attribute with the proper WordML tag name as the value.
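As a rough illustration of the style conversion described above, the sketch below parses a `style` attribute into a hash and maps two properties onto WordML names. It does not call Sablon: `parse_style`, `convert` and the `CONVERTERS` table are hypothetical names used only for this example, and the two lambdas merely mirror the shape of the converters in `configuration.rb` (one argument in, a property name and value out).

```ruby
# Hypothetical stand-alone sketch; none of these names are part of Sablon's API.

# Split a style attribute such as "color: #FF0000; text-align: right"
# into a hash of property name => value.
def parse_style(style)
  pairs = style.split(';').map { |prop| prop.split(':', 2).map(&:strip) }
  pairs.reject { |pair| pair.length != 2 }.to_h
end

# Each converter takes the CSS value and returns [wordml_name, wordml_value].
CONVERTERS = {
  'text-align' => ->(v) { ['jc', v] },
  'color' => ->(v) { ['color', v.delete('#')] }
}.freeze

# Convert known properties; unknown ones are passed through unchanged.
def convert(styles)
  styles.each_with_object({}) do |(name, value), result|
    key, val = CONVERTERS.key?(name) ? CONVERTERS[name].call(value) : [name, value]
    result[key] = val
  end
end

styles = parse_style('color: #FF0000; text-align: right')
wordml = convert(styles)
```

In the gem itself the equivalent table lives on the configuration singleton, so the lookup is per AST class rather than one flat hash as assumed here.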
Paragraph and Run property references can be found at: + * http://officeopenxml.com/WPparagraphProperties.php + * http://officeopenxml.com/WPtextFormatting.php + +If you wish to write out your HTML code in an indented, human-readable fashion, or you are pulling content from the ERB templating engine in Rails, the following regular expressions can help eliminate extraneous whitespace in the final document. +```ruby +# combine all white space +html_str = html_str.gsub(/\s+/, ' ') +# clear any white space between block level tags and other content +html_str = html_str.gsub(%r{\s*<(/?(?:h\d|div|p|br|ul|ol|li).*?)>\s*}, '<\1>') +``` + +IMPORTANT: Currently, the insertion will replace the containing paragraph. This means that other content in the same paragraph is discarded. #### Conditionals -Sablon can render parts of the template conditonally based on the value of a +Sablon can render parts of the template conditionally based on the value of a context variable. Conditional fields are inserted around the content. ``` @@ -213,6 +250,78 @@ styles for HTML insertion. «endComment» ``` +### Configuration (Beta) + +The Sablon::Configuration singleton is a new feature that allows the end user to customize HTML parsing to their needs without needing to fork and edit the source code of the gem. This API is still in a beta state and may be subject to change as future needs are identified beyond HTML conversion. + +The example below shows how to expose the configuration instance: +```ruby +Sablon.configure do |config| + # manipulate config object +end +``` + +The default set of registered HTML tags and CSS property conversions is defined in [configuration.rb](lib/sablon/configuration/configuration.rb). + +#### Customizing HTML Tag Conversion + +Any HTML tag can be added using the configuration object even if it needs a custom AST class to handle conversion logic. Simple inline tags that only modify the style of text (e.g. 
the already supported `<b>` tag) can be added without an AST class as shown below: +```ruby +Sablon.configure do |config| + config.register_html_tag(:bgcyan, :inline, properties: { highlight: 'cyan' }) +end +``` +The above tag simply adds a background color to text using the `w:highlight` property. + + +More complex business logic can be supported by adding a new class under the `Sablon::HTMLConverter` namespace. The new class will likely subclass `Sablon::HTMLConverter::Node` or `Sablon::HTMLConverter::Collection` depending on the needed behavior. The current AST classes serve as additional examples and can be found in [ast.rb](/lib/sablon/html/ast.rb). When registering a new HTML tag that uses a custom AST class, the class must be passed in either by name, using a lowercased and underscored symbol, or as the class object itself. + +The block below shows how to register a new HTML tag that adds the following AST class: `Sablon::HTMLConverter::InstrText`. +```ruby +module Sablon + class HTMLConverter + class InstrText < Node + # implementation details ... + end + end +end +# register tag +Sablon.configure do |config| + config.register_html_tag(:bgcyan, :inline, ast_class: :instr_text) +end +``` + +Existing tags can be overwritten using the `config.register_html_tag` method or removed entirely using `config.remove_html_tag`. +```ruby +# remove tag +Sablon.configure do |config| + # remove support for the span tag + config.remove_html_tag(:span) +end +``` + + +#### Customizing CSS Style Conversion + +The conversion of CSS stored in an element's `style="..."` attribute can be customized using the configuration object as well. Adding a new style conversion or overriding an existing one is done using the `config.register_style_converter` method. It accepts three arguments: the name of the AST node the style applies to (as a lowercased and underscored symbol), the name of the CSS property (needs to be a string in most cases) and a lambda that accepts a single argument, the property value. 
The example below shows how to add a new style that sets the `w:highlight` property. +```ruby +# add style conversion +Sablon.configure do |config| + # register new conversion for the Sablon::HTMLConverter::Run AST class. + converter = lambda { |v| return 'highlight', v } + config.register_style_converter(:run, 'custom-highlight', converter) +end +``` + +Existing conversions can be overwritten using the `config.register_style_converter` method or removed entirely using `config.remove_style_converter`. +```ruby +# remove style conversion +Sablon.configure do |config| + # remove support for conversion of font-size for the Run AST class + config.remove_style_converter(:run, 'font-size') +end +``` + ### Executable The `sablon` executable can be used to process templates on the command-line. diff --git a/lib/sablon.rb b/lib/sablon.rb index fc5464f3..c4db06f2 100644 --- a/lib/sablon.rb +++ b/lib/sablon.rb @@ -1,7 +1,10 @@ require 'zip' require 'nokogiri' +require 'open-uri' require "sablon/version" +require "sablon/configuration/configuration" + require "sablon/numbering" require "sablon/images" require "sablon/context" @@ -21,6 +24,10 @@ module Sablon class TemplateError < ArgumentError; end class ContextError < ArgumentError; end + def self.configure + yield(Configuration.instance) if block_given? 
+ end + def self.template(path) Template.new(path) end diff --git a/lib/sablon/configuration/configuration.rb b/lib/sablon/configuration/configuration.rb new file mode 100644 index 00000000..3df2f331 --- /dev/null +++ b/lib/sablon/configuration/configuration.rb @@ -0,0 +1,165 @@ +require 'singleton' +require 'sablon/configuration/html_tag' + +module Sablon + # Handles storing configuration data for the sablon module + class Configuration + include Singleton + + attr_accessor :permitted_html_tags, :defined_style_conversions + + def initialize + initialize_html_tags + initialize_css_style_conversion + end + + # Adds a new tag to the permitted tags hash or replaces an existing one + def register_html_tag(tag_name, type = :inline, **options) + tag = HTMLTag.new(tag_name, type, **options) + @permitted_html_tags[tag.name] = tag + end + + # Removes a tag from the permitted tags hash, returning it + def remove_html_tag(tag_name) + @permitted_html_tags.delete(tag_name) + end + + # Adds a new style property converter for the specified ast class and + # CSS property name. The ast_node argument should be the class name + # in lowercased snakecase as a symbol, i.e. MyClass -> :my_class. + # The converter passed in must be a proc that accepts + # a single argument (the value) and returns two values: the WordML property + # name and its value. The converted property value can be a string, hash + # or array. + def register_style_converter(ast_node, prop_name, converter) + # create a new ast node hash if needed + unless @defined_style_conversions[ast_node] + @defined_style_conversions[ast_node] = {} + end + # add the style converter to the node's hash + @defined_style_conversions[ast_node][prop_name] = converter + end + + # Deletes a CSS converter from the hash by specifying the AST class + # in lowercased snake case and the property name. 
+ def remove_style_converter(ast_node, prop_name) + @defined_style_conversions[ast_node].delete(prop_name) + end + + private + + # Defines all of the initial HTML tags to be used by HTMLconverter + def initialize_html_tags + @permitted_html_tags = {} + tags = { + # special tag used for elements with no parent, i.e. top level + '#document-fragment' => { type: :block, ast_class: :root, allowed_children: :_block }, + + # block level tags + div: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Normal' }, allowed_children: :_inline }, + p: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Paragraph' }, allowed_children: :_inline }, + h1: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading1' }, allowed_children: :_inline }, + h2: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading2' }, allowed_children: :_inline }, + h3: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading3' }, allowed_children: :_inline }, + h4: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading4' }, allowed_children: :_inline }, + h5: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading5' }, allowed_children: :_inline }, + h6: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading6' }, allowed_children: :_inline }, + ol: { type: :block, ast_class: :list, properties: { pStyle: 'ListNumber' }, allowed_children: %i[ol li] }, + ul: { type: :block, ast_class: :list, properties: { pStyle: 'ListBullet' }, allowed_children: %i[ul li] }, + li: { type: :block, ast_class: :list_paragraph }, + + # inline style tags + span: { type: :inline, ast_class: nil, properties: {} }, + strong: { type: :inline, ast_class: nil, properties: { b: nil } }, + b: { type: :inline, ast_class: nil, properties: { b: nil } }, + em: { type: :inline, ast_class: nil, properties: { i: nil } }, + i: { type: :inline, ast_class: nil, properties: { i: nil } }, + u: { type: :inline, ast_class: nil, properties: { u: 
'single' } }, + s: { type: :inline, ast_class: nil, properties: { strike: 'true' } }, + sub: { type: :inline, ast_class: nil, properties: { vertAlign: 'subscript' } }, + sup: { type: :inline, ast_class: nil, properties: { vertAlign: 'superscript' } }, + + # inline content tags + text: { type: :inline, ast_class: :run, properties: {}, allowed_children: [] }, + br: { type: :inline, ast_class: :newline, properties: {}, allowed_children: [] } + } + # add all tags to the config object + tags.each do |tag_name, settings| + type = settings.delete(:type) + register_html_tag(tag_name, type, **settings) + end + end + + # Defines an initial set of CSS -> WordML conversion lambdas stored in + # a nested hash structure where the first key is the AST class and the + # second is the conversion lambda + def initialize_css_style_conversion + @defined_style_conversions = { + # styles shared or common logic across all node types go here. + # Special conversion lambdas such as :_border can be + # defined here for reuse across several AST nodes. Care must + # be taken to avoid possible naming conflicts, hence the underscore. + # AST class keys should be stored with their names converted from + # camelcase to lowercased snakecase, i.e. 
TestCase = test_case + node: { + 'background-color' => lambda { |v| + return 'shd', { val: 'clear', fill: v.delete('#') } + }, + _border: lambda { |v| + props = { sz: 2, val: 'single', color: '000000' } + vals = v.split + vals[1] = 'single' if vals[1] == 'solid' + # + props[:sz] = @defined_style_conversions[:node][:_sz].call(vals[0]) + props[:val] = vals[1] if vals[1] + props[:color] = vals[2].delete('#') if vals[2] + # + return props + }, + _sz: lambda { |v| + return nil unless v + (2 * Float(v.gsub(/[^\d.]/, '')).ceil).to_s + }, + 'text-align' => ->(v) { return 'jc', v } + }, + # Styles specific to the Paragraph AST class + paragraph: { + 'border' => lambda { |v| + props = @defined_style_conversions[:node][:_border].call(v) + # + return 'pBdr', [ + { top: props }, { bottom: props }, + { left: props }, { right: props } + ] + }, + 'vertical-align' => ->(v) { return 'textAlignment', v } + }, + # Styles specific to a run of text + run: { + 'color' => ->(v) { return 'color', v.delete('#') }, + 'font-size' => lambda { |v| + return 'sz', @defined_style_conversions[:node][:_sz].call(v) + }, + 'font-style' => lambda { |v| + return 'b', nil if v =~ /bold/ + return 'i', nil if v =~ /italic/ + }, + 'font-weight' => ->(v) { return 'b', nil if v =~ /bold/ }, + 'text-decoration' => lambda { |v| + supported = %w[line-through underline] + props = v.split + return props[0], 'true' unless supported.include? 
props[0] + return 'strike', 'true' if props[0] == 'line-through' + return 'u', 'single' if props.length == 1 + return 'u', { val: props[1], color: 'auto' } if props.length == 2 + return 'u', { val: props[1], color: props[2].delete('#') } + }, + 'vertical-align' => lambda { |v| + return 'vertAlign', 'subscript' if v =~ /sub/ + return 'vertAlign', 'superscript' if v =~ /super/ + } + } + } + end + end +end diff --git a/lib/sablon/configuration/html_tag.rb b/lib/sablon/configuration/html_tag.rb new file mode 100644 index 00000000..86f610f6 --- /dev/null +++ b/lib/sablon/configuration/html_tag.rb @@ -0,0 +1,97 @@ +module Sablon + class Configuration + # Stores the information for a single HTML tag. This information + # is used by the HTMLConverter. An optional AST class can be defined, + # and if so conversion stops there and it is assumed the AST class + # will handle any child nodes unless the element is a block level tag. + # In the case of a block level tag the child nodes are processed by the + # AST builder again. If the AST class is omitted it is assumed the node + # should be "passed through" only transferring its properties onto + # children. A block level tag must have an AST class associated with + # it. The block and inline status of tags is not affected by CSS. + # Permitted child tags are specified using the :allowed_children optional + # arg. The default value is [:_inline, :ul, :ol]. :_inline is a special + # reference to all inline type tags, :_block is equivalent for block + # type tags. + # + # == Parameters + # * name - symbol or string of the HTML element tag name + # * type - The type of HTML tag needs to be :inline or :block + # * ast_class - class instance or symbol, the AST class or its name + # used to process the HTML node + # * options - collects all other keyword arguments, current kwargs are + # `:properties`, `:attributes` and `:allowed_children`. 
+ # + # Example + # HTMLTag.new(:div, :block, ast_class: Sablon::HTMLConverter::Paragraph, + # properties: { pStyle: 'Normal' }) + class HTMLTag + attr_reader :name, :type, :ast_class, :attributes, :properties, + :allowed_children + + # Set up HTML tag information + def initialize(name, type, ast_class: nil, **options) + # Set basic params converting some args to symbols for consistency + @name = name.to_sym + @type = type.to_sym + self.ast_class = ast_class if ast_class + + # Ensure block level tags have an AST class + if @type == :block && @ast_class.nil? + raise ArgumentError, "Block level tag #{name} must have an AST class." + end + + # Set attributes from options hash, currently unused during AST generation + @attributes = options.fetch(:attributes, {}) + # WordML properties defined by the tag, i.e. <w:b /> for the <b> tag, + # etc. All the keys need to be symbols to avoid getting reparsed + # with the element's CSS attributes. + @properties = options.fetch(:properties, {}) + @properties = Hash[@properties.map { |k, v| [k.to_sym, v] }] + # Set permitted child tags or tag groups + self.allowed_children = options[:allowed_children] + end + + # checks if the given tag is a permitted child element + def allowed_child?(tag) + if @allowed_children.include?(tag.name) + true + elsif @allowed_children.include?(:_inline) && tag.type == :inline + true + elsif @allowed_children.include?(:_block) && tag.type == :block + true + else + false + end + end + + private + + def allowed_children=(value) + if value.nil? + @allowed_children = %i[_inline ol ul] + return + else + value = [value] unless value.is_a? Array + end + @allowed_children = value.map(&:to_sym) + end + + # converts a string or symbol to a class defined under + # Sablon::HTMLConverter + def ast_class=(value) + if value.is_a? 
Class + @ast_class = value + return + else + value = value.to_s + end + # camel case the word and get class, similar logic to + # ActiveSupport::Inflector.constantize but refactored to be specific + # to the HTMLConverter class + value.gsub!(/(?:^|_)([a-z])/) { Regexp.last_match[1].capitalize } + @ast_class = Sablon::HTMLConverter.const_get(value) + end + end + end +end diff --git a/lib/sablon/content.rb b/lib/sablon/content.rb index bdec9b57..eff2aaf1 100644 --- a/lib/sablon/content.rb +++ b/lib/sablon/content.rb @@ -50,7 +50,8 @@ def inspect end def initialize(path) - super "#{Integer(rand * 1e9)}-#{File.basename(path)}", IO.binread(path) + # Links from Amazon S3 might have ?1498548740 part + super "#{Integer(rand * 1e9)}-#{File.basename(path).split('?').first}", open(path).read end def append_to(paragraph, display_node, env) end diff --git a/lib/sablon/html/ast.rb b/lib/sablon/html/ast.rb index 6327da91..13ab94b6 100644 --- a/lib/sablon/html/ast.rb +++ b/lib/sablon/html/ast.rb @@ -1,15 +1,171 @@ +require "sablon/html/ast_builder" + module Sablon class HTMLConverter + # A top level abstract class to handle common logic for all AST nodes class Node + PROPERTIES = [].freeze + + def self.node_name + @node_name ||= name.split('::').last + end + + # Returns a hash defined on the configuration object by default. However, + # this method can be overridden by subclasses to return a different + # node's style conversion config (i.e. :run) or a hash unrelated to the + # config itself. The config object is used for all built-in classes to + # allow for end-user customization via the configuration object + def self.style_conversion + # converts camelcase to underscored + key = node_name.gsub(/([a-z])([A-Z])/, '\1_\2').downcase.to_sym + Sablon::Configuration.instance.defined_style_conversions.fetch(key, {}) + end + + # maps the CSS style property to it's OpenXML equivalent. 
Not all CSS + # properties have an equivalent, nor share the same behavior when + # defined on different node types (Paragraph, Table and Run). + def self.process_properties(properties) + # process the styles as a hash and store values + style_attrs = {} + properties.each do |key, value| + unless key.is_a? Symbol + key, value = *convert_style_property(key.strip, value.strip) + end + style_attrs[key] = value if key + end + style_attrs + end + + # handles conversion of a single attribute allowing recursion through + # super classes. If the key exists and conversion is successful a + # symbol is returned to avoid conflicts with a CSS prop sharing the + # same name. Keys without a conversion class are returned as is + def self.convert_style_property(key, value) + if style_conversion.key?(key) + key, value = style_conversion[key].call(value) + key = key.to_sym if key + [key, value] + elsif self == Node + [key, value] + else + superclass.convert_style_property(key, value) + end + end + + def initialize(_env, _node, _properties) + @attributes ||= {} + end + def accept(visitor) visitor.visit(self) end - def self.node_name - @node_name ||= name.split('::').last + # Simplifies usage at call sites by only requiring them to supply + # the tag name to use and any child AST nodes to render + def to_docx(tag) + prop_str = @properties.to_docx if @properties + # + "<#{tag}#{attributes_to_docx}>#{prop_str}#{children_to_docx}</#{tag}>" + end + + private + + # Simplifies usage at call sites + def transferred_properties + @properties.transferred_properties + end + + # Gracefully handles conversion of an attributes hash into a + # string + def attributes_to_docx + return '' if @attributes.nil? || @attributes.empty? + ' ' + @attributes.map { |k, v| %(#{k}="#{v}") }.join(' ') + end + + # Acts like an abstract method allowing subclasses full flexibility to + # define any content inside the tags. 
+ def children_to_docx + '' end end + # Manages the properties for an AST node + class NodeProperties + attr_reader :transferred_properties + + def self.paragraph(properties) + new('w:pPr', properties, Paragraph::PROPERTIES) + end + + def self.run(properties) + new('w:rPr', properties, Run::PROPERTIES) + end + + def initialize(tagname, properties, whitelist) + @tagname = tagname + filter_properties(properties, whitelist) + end + + def inspect + @properties.map { |k, v| v ? "#{k}=#{v}" : k }.join(';') + end + + def [](key) + @properties[key] + end + + def []=(key, value) + @properties[key] = value + end + + def to_docx + "<#{@tagname}>#{process}</#{@tagname}>" unless @properties.empty? + end + + private + + # processes properties adding those on the whitelist to the + # properties instance variable and those not to the transferred_properties + # instance variable + def filter_properties(properties, whitelist) + @transferred_properties = {} + @properties = {} + # + properties.each do |key, value| + if whitelist.include? key.to_s + @properties[key] = value + else + @transferred_properties[key] = value + end + end + end + + # processes attributes defined on the node into wordML property syntax + def process + @properties.map { |k, v| transform_attr(k, v) }.join + end + + # properties that have a list as the value get nested in tags and + # each entry in the list is transformed. When a value is a hash the + # keys in the hash are used to explicitly build the XML tag attributes. + def transform_attr(key, value) + if value.is_a? Array + sub_attrs = value.map do |sub_prop| + sub_prop.map { |k, v| transform_attr(k, v) } + end + "<w:#{key}>#{sub_attrs.join}</w:#{key}>" + elsif value.is_a? 
Hash + props = value.map { |k, v| format('w:%s="%s"', k, v) if v } + "" + else + value = format('w:val="%s" ', value) if value + "" + end + end + end + + # A container for an array of AST nodes with convenience methods to + # work with the internal array as if it were a regular node class Collection < Node attr_reader :nodes def initialize(nodes) @@ -32,7 +188,18 @@ def inspect end end + # Stores all of the AST nodes from the current fragment of HTML being + # parsed class Root < Collection + def initialize(env, node) + # strip text nodes from the root level element, these are typically + # extra whitespace from indenting the markup + node.search('./text()').remove + + # convert children from HTML to AST nodes + super(ASTBuilder.html_to_ast(env, node.children, {})) + end + def grep(pattern) visitor = GrepVisitor.new(pattern) accept(visitor) @@ -44,24 +211,26 @@ def inspect end end + # An AST node representing the top level content container for a word + # document. These cannot be nested within other paragraph elements class Paragraph < Node - attr_accessor :style, :runs - def initialize(style, runs) - @style, @runs = style, runs - end + PROPERTIES = %w[framePr ind jc keepLines keepNext numPr + outlineLvl pBdr pStyle rPr sectPr shd spacing + tabs textAlignment].freeze + attr_accessor :runs - PATTERN = <<-XML.gsub("\n", "") - - - -%s - -%s - -XML + def initialize(env, node, properties) + super + properties = self.class.process_properties(properties) + @properties = NodeProperties.paragraph(properties) + # + trans_props = transferred_properties + @runs = ASTBuilder.html_to_ast(env, node.children, trans_props) + @runs = Collection.new(@runs) + end def to_docx - PATTERN % [style, ppr_docx, runs.to_docx] + super('w:p') end def accept(visitor) @@ -70,107 +239,146 @@ def accept(visitor) end def inspect - "" + "" end private - def ppr_docx + + def children_to_docx + runs.to_docx end end - class ListParagraph < Paragraph - LIST_STYLE = <<-XML.gsub("\n", "") - - - - -XML - 
attr_accessor :numid, :ilvl - def initialize(style, runs, numid, ilvl) - super style, runs - @numid = numid - @ilvl = ilvl - end + # Manages the child nodes of a list type tag + class List < Collection + def initialize(env, node, properties) + # initialize values + @list_tag = node.name + # + if node.ancestors(".//#{@list_tag}").length.zero? + # Only register a definition upon the first list tag encountered + @definition = env.numbering.register(properties[:pStyle]) + end - private - def ppr_docx - LIST_STYLE % [@ilvl, numid] - end - end + # update attributes of all child nodes + transfer_node_attributes(node.children, node.attributes) + + # Move any list tags that are a child of a list item up one level + process_child_nodes(node) - class TextFormat - def initialize(bold, italic, underline) - @bold = bold - @italic = italic - @underline = underline + # strip text nodes from the list level element, this is typically + # extra whitespace from indenting the markup + node.search('./text()').remove + + # convert children from HTML to AST nodes + super(ASTBuilder.html_to_ast(env, node.children, properties)) end def inspect - parts = [] - parts << 'bold' if @bold - parts << 'italic' if @italic - parts << 'underline' if @underline - parts.join('|') + "" end - def to_docx - styles = [] - styles << '' if @bold - styles << '' if @italic - styles << '' if @underline - if styles.any? 
- "#{styles.join}" - else - '' + private + + # handles passing all attributes on the parent down to children + def transfer_node_attributes(nodes, attributes) + nodes.each do |child| + # update all attributes + merge_attributes(child, attributes) + + # set attributes specific to list items + if @definition + child['pStyle'] = @definition.style + child['numId'] = @definition.numid + end + child['ilvl'] = child.ancestors(".//#{@list_tag}").length - 1 end end - def self.default - @default ||= new(false, false, false) + # merges parent and child attributes together, prepending the parent's + # values to allow the child node to override it if the value is already + # defined on the child node. + def merge_attributes(child, parent_attributes) + parent_attributes.each do |name, par_attr| + child_attr = child[name] ? child[name].split(';') : [] + child[name] = par_attr.value.split(';').concat(child_attr).join('; ') + end end - def with_bold - TextFormat.new(true, @italic, @underline) + # moves any list tags that are a child of a list item tag up one level + # so they become a sibling instead of a child + def process_child_nodes(node) + node.xpath("./li/#{@list_tag}").each do |list| + # transfer attributes from parent now because the list tag will + # no longer be a child and won't inherit them as usual + transfer_node_attributes(list.children, list.parent.attributes) + list.parent.add_next_sibling(list) + end end + end - def with_italic - TextFormat.new(@bold, true, @underline) + # Sets list item specific attributes registered on the node to properly + # generate a list paragraph + class ListParagraph < Paragraph + def initialize(env, node, properties) + list_props = { + pStyle: node['pStyle'], + numPr: [{ ilvl: node['ilvl'] }, { numId: node['numId'] }] + } + properties = properties.merge(list_props) + super end - def with_underline - TextFormat.new(@bold, @italic, true) + private + + def transferred_properties + super end end - class Text < Node - attr_reader :string - 
def initialize(string, format) - @string = string - @format = format + # Create a run of text in the document, runs cannot be nested within + # each other + class Run < Node + PROPERTIES = %w[b i caps color dstrike emboss imprint highlight outline + rStyle shadow shd smallCaps strike sz u vanish + vertAlign].freeze + + def initialize(_env, node, properties) + super + properties = self.class.process_properties(properties) + @properties = NodeProperties.run(properties) + @string = node.text end def to_docx - "#{@format.to_docx}#{normalized_string}" + super('w:r') end def inspect - "" + "" end private - def normalized_string - string.tr("\u00A0", ' ') + + def children_to_docx + content = @string.tr("\u00A0", ' ') + "#{content}" end end - class Newline < Node - def to_docx - "" - end + # Creates a blank line in the word document + class Newline < Run + def initialize(*); end def inspect "" end + + private + + def children_to_docx + "" + end end end end diff --git a/lib/sablon/html/ast_builder.rb b/lib/sablon/html/ast_builder.rb new file mode 100644 index 00000000..a0e4130e --- /dev/null +++ b/lib/sablon/html/ast_builder.rb @@ -0,0 +1,90 @@ +module Sablon + class HTMLConverter + # Converts a nokogiri HTML fragment into an equivalent AST structure + class ASTBuilder + attr_reader :nodes + + def self.html_to_ast(env, nodes, properties) + builder = new(env, nodes, properties) + builder.nodes + end + + private + + def initialize(env, nodes, properties) + @env = env + @nodes = process_nodes(nodes, properties).compact + end + + # Loops over HTML nodes converting them to their configured AST class + def process_nodes(html_nodes, properties) + html_nodes.flat_map do |node| + # get tags from config + parent_tag = fetch_tag(node.parent.name) if node.parent.name + tag = fetch_tag(node.name) + + # remove all text nodes if the tag doesn't accept them + node.search('./text()').remove if drop_text?(tag) + + # check node hierarchy + validate_structure(parent_tag, tag) + + # merge 
properties + local_props = merge_node_properties(node, tag, properties) + if tag.ast_class + tag.ast_class.new(@env, node, local_props) + else + process_nodes(node.children, local_props) + end + end + end + + # retrieves a HTMLTag instance from the permitted_html_tags hash or + # raises an ArgumentError if the tag is not registered in the hash + def fetch_tag(tag_name) + tag_name = tag_name.to_sym + unless Sablon::Configuration.instance.permitted_html_tags[tag_name] + raise ArgumentError, "Don't know how to handle HTML tag: #{tag_name}" + end + Sablon::Configuration.instance.permitted_html_tags[tag_name] + end + + # Checking that the current tag is an allowed child of the parent_tag. + # If the parent tag is nil then a block level tag is required. + def validate_structure(parent, child) + if parent.ast_class == Root && child.type == :inline + msg = "#{child.name} needs to be wrapped in a block level tag." + elsif parent && !parent.allowed_child?(child) + msg = "#{child.name} is not a valid child element of #{parent.name}." + else + return + end + raise ContextError, "Invalid HTML structure: #{msg}" + end + + # If the node doesn't allow inline elements, or text specifically, + # drop all text nodes. This is largely meant to prevent whitespace + # between tags from raising an invalid structure error, although it + # will purge the text nodes whether they contain nonblank characters or not.
+ def drop_text?(child) + text = fetch_tag(:text) + !child.allowed_child?(text) + end + + # Merges node properties in a specific order: parent, then tag, then inline styles + def merge_node_properties(node, tag, parent_properties) + # Process any styles defined on the node into a hash + if node['style'] + style_props = node['style'].split(';').map do |prop| + prop.split(':').map(&:strip) + end + style_props = Hash[style_props] + else + style_props = {} + end + # allow inline styles to override parent styles passed down + parent_properties.merge(tag.properties).merge(style_props) + end + end + end +end diff --git a/lib/sablon/html/converter.rb b/lib/sablon/html/converter.rb index 05cc48b0..7e5d6f05 100644 --- a/lib/sablon/html/converter.rb +++ b/lib/sablon/html/converter.rb @@ -3,69 +3,8 @@ module Sablon class HTMLConverter - class ASTBuilder - Layer = Struct.new(:items, :ilvl) - - def initialize(nodes) - @layers = [Layer.new(nodes, false)] - @root = Root.new([]) - end - - def to_ast - @root - end - - def new_layer(ilvl: false) - @layers.push Layer.new([], ilvl) - end - - def next - current_layer.items.shift - end - - def push(node) - @layers.last.items.push node - end - - def push_all(nodes) - nodes.each(&method(:push)) - end - - def done? - !current_layer.items.any? - end - - def nested? - ilvl > 0 - end - - def ilvl - @layers.select { |layer| layer.ilvl }.size - 1 - end - - def emit(node) - @root.nodes << node - end - - private - - def current_layer - if @layers.any? - last_layer = @layers.last - if last_layer.items.any? - last_layer - else - @layers.pop - current_layer - end - else - Layer.new([], false) - end - end - end - def process(input, env) - @numbering = env.numbering + @env = env processed_ast(input).to_docx end @@ -77,75 +16,7 @@ def processed_ast(input) def build_ast(input) doc = Nokogiri::HTML.fragment(input) - @builder = ASTBuilder.new(doc.children) - - while !@builder.done?
- ast_next_paragraph - end - @builder.to_ast - end - - private - - def initialize - @numbering = nil - end - - def ast_next_paragraph - node = @builder.next - if node.name == 'div' - @builder.new_layer - @builder.emit Paragraph.new('Normal', ast_text(node.children)) - elsif node.name == 'p' - @builder.new_layer - @builder.emit Paragraph.new('Paragraph', ast_text(node.children)) - elsif node.name =~ /h(\d+)/ - @builder.new_layer - @builder.emit Paragraph.new("Heading#{$1}", ast_text(node.children)) - elsif node.name == 'ul' - @builder.new_layer ilvl: true - unless @builder.nested? - @definition = @numbering.register('ListBullet') - end - @builder.push_all(node.children) - elsif node.name == 'ol' - @builder.new_layer ilvl: true - unless @builder.nested? - @definition = @numbering.register('ListNumber') - end - @builder.push_all(node.children) - elsif node.name == 'li' - @builder.new_layer - @builder.emit ListParagraph.new(@definition.style, ast_text(node.children), @definition.numid, @builder.ilvl) - elsif node.text? - # SKIP? - else - raise ArgumentError, "Don't know how to handle node: #{node.inspect}" - end - end - - def ast_text(nodes, format: TextFormat.default) - runs = nodes.flat_map do |node| - if node.text? 
- Text.new(node.text, format) - elsif node.name == 'br' - Newline.new - elsif node.name == 'span' - ast_text(node.children).nodes - elsif node.name == 'strong' || node.name == 'b' - ast_text(node.children, format: format.with_bold).nodes - elsif node.name == 'em' || node.name == 'i' - ast_text(node.children, format: format.with_italic).nodes - elsif node.name == 'u' - ast_text(node.children, format: format.with_underline).nodes - elsif ['ul', 'ol', 'p', 'div'].include?(node.name) - @builder.push(node) - nil - else - raise ArgumentError, "Don't know how to handle node: #{node.inspect}" - end - end - Collection.new(runs.compact) + Root.new(@env, doc) end end end diff --git a/lib/sablon/processor/document.rb b/lib/sablon/processor/document.rb index 3d5e8fc4..91f56eac 100644 --- a/lib/sablon/processor/document.rb +++ b/lib/sablon/processor/document.rb @@ -118,8 +118,8 @@ def self.encloses?(start_field, end_field) end class ImageBlock < ParagraphBlock - def self.parent(node) - node.ancestors + def self.placeholder(node) + parent(node).xpath('following-sibling::w:p') end def self.encloses?(start_field, end_field) @@ -133,9 +133,9 @@ def replace(content) return end - pic_prop = self.class.parent(start_field).at_xpath('.//pic:cNvPr', pic: Sablon::Processor::Relationships::PICTURE_NS_URI) + pic_prop = self.class.placeholder(start_field).at_xpath('.//pic:cNvPr', pic: Sablon::Processor::Relationships::PICTURE_NS_URI) pic_prop.attributes['name'].value = content.first.name - blip = self.class.parent(start_field).at_xpath('.//a:blip', a: Sablon::Processor::Relationships::MAIN_NS_URI) + blip = self.class.placeholder(start_field).at_xpath('.//a:blip', a: Sablon::Processor::Relationships::MAIN_NS_URI) blip.attributes['embed'].value = content.first.rid start_field.remove end_field.remove @@ -196,7 +196,7 @@ def consume(allow_insertion) when /([^ ]+):if/ block = consume_block("#{$1}:endIf") Statement::Condition.new(Expression.parse($1), block) - when /comment/ + when /^comment$/ 
block = consume_block("endComment") Statement::Comment.new(block) when /^@([^ ]+):start/ diff --git a/sablon.gemspec b/sablon.gemspec index b5fc8532..0a26932b 100644 --- a/sablon.gemspec +++ b/sablon.gemspec @@ -20,7 +20,7 @@ Gem::Specification.new do |spec| spec.require_paths = ["lib"] spec.add_runtime_dependency 'nokogiri', ">= 1.6.0" - spec.add_runtime_dependency 'rubyzip', ">= 1.1" + spec.add_runtime_dependency 'rubyzip', ">= 1.1.1" spec.add_development_dependency "bundler", ">= 1.6" spec.add_development_dependency "rake", "~> 10.0" diff --git a/test/configuration_test.rb b/test/configuration_test.rb new file mode 100644 index 00000000..d62a659d --- /dev/null +++ b/test/configuration_test.rb @@ -0,0 +1,122 @@ +# -*- coding: utf-8 -*- +require "test_helper" + +class ConfigurationTest < Sablon::TestCase + def setup + super + @config = Sablon::Configuration.send(:new) + end + + def test_register_tag + options = { + ast_class: :paragraph, + attributes: { dummy: 'value' }, + properties: { pstyle: 'ListBullet' }, + allowed_children: %i[_inline ol ul li] + } + # test initialization without type + tag = @config.register_html_tag(:test_tag, **options) + assert_equal @config.permitted_html_tags[:test_tag], tag + assert_equal tag.name, :test_tag + assert_equal tag.type, :inline + assert_equal tag.ast_class, Sablon::HTMLConverter::Paragraph + assert_equal tag.attributes, dummy: 'value' + assert_equal tag.properties, pstyle: 'ListBullet' + assert_equal tag.allowed_children, %i[_inline ol ul li] + + # test initialization with type + tag = @config.register_html_tag('test_tag2', :block, **options) + assert_equal @config.permitted_html_tags[:test_tag2], tag + assert_equal tag.name, :test_tag2 + assert_equal tag.type, :block + end + + def test_remove_tag + tag = @config.register_html_tag(:test) + assert_equal @config.remove_html_tag(:test), tag + assert_nil @config.permitted_html_tags[:test] + end + + def test_register_style_converter_on_existing_ast_class + converter = ->(v) { 
return "test-attr-#{v}" } + @config.register_style_converter(:run, 'my-test-attr', converter) + # + assert @config.defined_style_conversions[:run]['my-test-attr'], 'converter should be stored in hash' + assert_equal 'test-attr-123', @config.defined_style_conversions[:run]['my-test-attr'].call(123) + end + + def test_register_style_converter_on_newast_class + converter = ->(v) { return "test-attr-#{v}" } + @config.register_style_converter(:unset_ast_class, 'my-test-attr', converter) + # + assert @config.defined_style_conversions[:unset_ast_class]['my-test-attr'], 'converter should be stored in hash' + end + + def test_remove_style_converter + converter = ->(v) { return "test-attr-#{v}" } + converter = @config.register_style_converter(:run, 'my-test-attr', converter) + # + assert_equal converter, @config.remove_style_converter(:run, 'my-test-attr') + assert_nil @config.defined_style_conversions[:run]['my-test-attr'] + end +end + +class ConfigurationHTMLTagTest < Sablon::TestCase + # test basic instantiation of an HTMLTag + def test_html_tag_defaults + tag = Sablon::Configuration::HTMLTag.new(:a, :inline) + assert_equal tag.name, :a + assert_equal tag.type, :inline + assert_nil tag.ast_class + assert_equal tag.attributes, {} + assert_equal tag.properties, {} + assert_equal tag.allowed_children, %i[_inline ol ul] + end + + # Exercising more of the logic used to conform args into valid + def test_html_tag_full_init + args = ['a', 'inline', ast_class: Sablon::HTMLConverter::Run] + tag = Sablon::Configuration::HTMLTag.new(*args) + assert_equal tag.name, :a + assert_equal tag.type, :inline + assert_equal tag.ast_class, Sablon::HTMLConverter::Run + # + options = { + ast_class: :run, + attributes: { dummy: 'value1' }, + properties: { dummy2: 'value2' }, + allowed_children: 'text' + } + tag = Sablon::Configuration::HTMLTag.new('a', 'inline', **options) + # + assert_equal tag.name, :a + assert_equal tag.type, :inline + assert_equal tag.ast_class, Sablon::HTMLConverter::Run + 
assert_equal tag.attributes, dummy: 'value1' + assert_equal tag.properties, dummy2: 'value2' + assert_equal tag.allowed_children, [:text] + end + + def test_html_tag_init_block_without_class + e = assert_raises ArgumentError do + Sablon::Configuration::HTMLTag.new(:form, :block) + end + assert_equal "Block level tag form must have an AST class.", e.message + end + + def test_html_tag_allowed_children + # define different tags for testing + text = Sablon::Configuration::HTMLTag.new(:text, :inline) + div = Sablon::Configuration::HTMLTag.new(:div, :block, ast_class: :paragraph) + olist = Sablon::Configuration::HTMLTag.new(:ol, :block, ast_class: :paragraph, allowed_children: %i[_block]) + + # test default allowances + assert div.allowed_child?(text) # all inline elements allowed + assert div.allowed_child?(olist) # tag name is included even though it is block level + assert_equal div.allowed_child?(div), false # other block elements are not allowed + + # test olist with allowances for all blocks but no inline + assert olist.allowed_child?(div) # all block elements allowed + assert_equal olist.allowed_child?(text), false # no inline elements + end +end diff --git a/test/fixtures/html/html_test_content.html b/test/fixtures/html/html_test_content.html new file mode 100644 index 00000000..6c580071 --- /dev/null +++ b/test/fixtures/html/html_test_content.html @@ -0,0 +1,164 @@ +

Sablon HTML insertion

+ +

Text

+ +
+ Lorem ipsum dolor sit +  ametconsectetur adipiscing elit. +  Suspendisse a tempus turpis. Duis urna justo, + vehicula vitae ultricies vel, congue at sem. Fusce turpis + turpis, aliquet id pulvinar aliquam, iaculis non elit. Nulla feugiat + lectus nulla, in dictum ipsum cursus ac. Quisque at odio neque. + Sed ac tortor iaculis, bibendum leo ut, malesuada velit. Donec iaculis + sed urna eget pharetra. Praesent ornare fermentum turpis, placerat + iaculis urna bibendum vitae. Nunc in quam consequat, tristique tellus in, + commodo turpis. Curabitur ullamcorper odio purus, lobortis egestas magna + laoreet vitae. Nunc fringilla velit ante, eu aliquam nisi cursus vitae. + Suspendisse sit amet dui egestas, volutpat + nisi vel, mattis justo. Nullam pellentesque, ipsum eget blandit pharetra, + augue elit aliquam mauris, vel mollis nisl augue ut + ipsum. +
+ +

Lists

+ +
    +
  1. + Vestibulum  +
      +
    1. ante ipsum primis 
    2. +
    +
  2. +
  3. + in faucibus orci luctus  +
      +
    1. et ultrices posuere cubilia Curae;  +
        +
      1. Aliquam vel dolor 
      2. +
      3. sed sem maximus 
      4. +
      +
    2. +
    3. + fermentum in non odio.  +
        +
      1. Fusce hendrerit ornare mollis. 
      2. +
      +
    4. +
    5. Nunc scelerisque nibh nec turpis tempor pulvinar. 
    6. +
    +
  4. +
  5. Donec eros turpis, 
  6. +
  7. + aliquet vel volutpat sit amet,  +
      +
    1. semper eu purus. 
    2. +
    3. + Proin ac erat nec urna efficitur vulputate.  +
        +
      1. Quisque varius convallis ultricies. 
      2. +
      3. Nullam vel fermentum eros. 
      4. +
      +
    4. +
    +
  8. +
+ +
+ Pellentesque nulla leo, auctor ornare erat sed, rhoncus congue diam. + Duis non porttitor nulla, ut eleifend enim. Pellentesque non tempor sem. +
+ +
Mauris auctor egestas arcu, 
+ +
    +
  1. id venenatis nibh dignissim id. 
  2. +
  3. In non placerat metus. 
  4. +
+ +
    +
  • Nunc sed consequat metus. 
  • +
  • Nulla consectetur lorem consequat, 
  • +
  • malesuada dui at, lacinia lectus. 
  • +
+ +
    +
  1. Aliquam efficitur 
  2. +
  3. lorem a mauris feugiat, 
  4. +
  5. at semper eros pellentesque. 
  6. +
+ +
+ Nunc lacus diam, consectetur ut odio sit amet, placerat pharetra erat. + Sed commodo ut sem id congue. Sed eget neque elit. Curabitur at erat tortor. + Maecenas eget sapien vitae est sagittis accumsan et nec orci. Integer + luctus at nisl eget venenatis. Nunc nunc eros, consectetur at tortor et, + tristique ultrices elit. Nulla in turpis nibh. +
+ +
    +
  • + Nam consectetur  +
      +
    • venenatis tempor. 
    • +
    +
  • +
  • + Aenean  +
      +
    • blandit +
        +
      • porttitor massa,  +
          +
        • non efficitur  +
            +
          • metus. 
          • +
          +
        • +
        +
      • +
      +
    • +
    +
  • +
  • Duis faucibus nunc nec venenatis faucibus. 
  • +
  • Aliquam erat volutpat. 
  • +
+
+ Quisque non neque ut lacus eleifend volutpat quis sed lacus. +
Praesent ultrices purus eu quam elementum, sit amet faucibus elit + interdum. In lectus orci,
elementum quis dictum ac, porta ac ante. + Fusce tempus ac mauris id cursus. Phasellus a erat nulla. Mauris dolor orci, + malesuada auctor dignissim non, posuere nec odio. Etiam hendrerit + justo nec diam ullamcorper, nec blandit elit sodales.
+
+ + +
+ Ut eget auctor enim. + Quisque id + neque eu nibh feugiat imperdiet + id ut dui. Ut auctor libero eget + massa tristique pharetra. Cras tincidunt finibus sapien, ut maximus + tortor tempor at. Proin pulvinar + pretium justo vitae malesuada. Suspendisse porta purus eget tortor + tincidunt vestibulum. Maecenas id egestas purus, quis vulputate + lacus. Quisque non + eleifend est. +
+ +
    +
  • Item 1
  • +
  • Item 2
  • +
      +
    • Nested 1
    • +
    • + Nested 2 +
        +
      • Nested 2.1
      • +
      • Nested 2.2
      • +
      • Nested 2.3
      • +
      +
    • +
    +
  • Item 3
  • +
diff --git a/test/fixtures/html_sample.docx b/test/fixtures/html_sample.docx index 2a7b8879..abe2b44c 100644 Binary files a/test/fixtures/html_sample.docx and b/test/fixtures/html_sample.docx differ diff --git a/test/fixtures/xml/comment_block_and_comment_as_key.xml b/test/fixtures/xml/comment_block_and_comment_as_key.xml new file mode 100644 index 00000000..0eae8a5d --- /dev/null +++ b/test/fixtures/xml/comment_block_and_comment_as_key.xml @@ -0,0 +1,31 @@ +Before + + + + + «comment» + + + + + + Inside Comment! + + + + + + + «endComment» + + + + + + + + «=comment» + + + +After \ No newline at end of file diff --git a/test/html/ast_builder_test.rb b/test/html/ast_builder_test.rb new file mode 100644 index 00000000..021622db --- /dev/null +++ b/test/html/ast_builder_test.rb @@ -0,0 +1,65 @@ +# -*- coding: utf-8 -*- +require "test_helper" + +# Tests some low level private methods in the ASTBuilder class. #process_nodes +# and self.html_to_ast are covered extensively in converter_test.rb +class HTMLConverterASTBuilderTest < Sablon::TestCase + def setup + super + @env = Sablon::Environment.new(nil) + end + + def test_fetch_tag + @bulider = new_builder + tag = Sablon::Configuration.instance.permitted_html_tags[:span] + assert_equal @bulider.send(:fetch_tag, :span), tag + # check that strings are converted into symbols + assert_equal @bulider.send(:fetch_tag, 'span'), tag + # test uknown tag raises error + e = assert_raises ArgumentError do + @bulider.send(:fetch_tag, :unknown_tag) + end + assert_equal "Don't know how to handle HTML tag: unknown_tag", e.message + end + + def test_validate_structure + @bulider = new_builder + root = Sablon::Configuration.instance.permitted_html_tags['#document-fragment'.to_sym] + div = Sablon::Configuration.instance.permitted_html_tags[:div] + span = Sablon::Configuration.instance.permitted_html_tags[:span] + # test valid relationship + assert_nil @bulider.send(:validate_structure, div, span) + # test inverted relationship + e = 
assert_raises ArgumentError do + @bulider.send(:validate_structure, span, div) + end + assert_equal "Invalid HTML structure: div is not a valid child element of span.", e.message + # test inline tag with no parent + e = assert_raises ArgumentError do + @bulider.send(:validate_structure, root, span) + end + assert_equal "Invalid HTML structure: span needs to be wrapped in a block level tag.", e.message + end + + def test_merge_properties + @builder = new_builder + node = Nokogiri::HTML.fragment('Test').children[0] + tag = Struct.new(:properties).new(rStyle: 'Normal') + # test that properties are merged across all three arguments + props = @builder.send(:merge_node_properties, node, tag, 'background-color' => '#00F') + assert_equal({ 'background-color' => '#00F', rStyle: 'Normal', 'color' => '#F00', 'text-decoration' => 'underline wavy' }, props) + # test that parent properties are overriden by tag properties + props = @builder.send(:merge_node_properties, node, tag, rStyle: 'Citation', 'background-color' => '#00F') + assert_equal({ 'background-color' => '#00F', rStyle: 'Normal', 'color' => '#F00', 'text-decoration' => 'underline wavy' }, props) + # test that inline properties override parent styles + node = Nokogiri::HTML.fragment('Test').children[0] + props = @builder.send(:merge_node_properties, node, tag, 'color' => '#00F') + assert_equal({ rStyle: 'Normal', 'color' => '#F00' }, props) + end + + private + + def new_builder(nodes = [], properties = {}) + Sablon::HTMLConverter::ASTBuilder.new(@env, nodes, properties) + end +end diff --git a/test/html/ast_test.rb b/test/html/ast_test.rb new file mode 100644 index 00000000..c232ec67 --- /dev/null +++ b/test/html/ast_test.rb @@ -0,0 +1,117 @@ +# -*- coding: utf-8 -*- +require "test_helper" + +class HTMLConverterASTTest < Sablon::TestCase + def setup + super + @converter = Sablon::HTMLConverter.new + @converter.instance_variable_set(:@env, Sablon::Environment.new(nil)) + end + + def test_div + input = '
Lorem ipsum dolor sit amet
' + ast = @converter.processed_ast(input) + assert_equal ']>]>', ast.inspect + end + + def test_p + input = '

Lorem ipsum dolor sit amet

' + ast = @converter.processed_ast(input) + assert_equal ']>]>', ast.inspect + end + + def test_b + input = '

Lorem ipsum dolor sit amet

' + ast = @converter.processed_ast(input) + assert_equal ', ]>]>', ast.inspect + end + + def test_i + input = '

Lorem ipsum dolor sit amet

' + ast = @converter.processed_ast(input) + assert_equal ', ]>]>', ast.inspect + end + + def test_br_in_strong + input = '
Lorem
ipsum
dolor
' + par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first + assert_equal "[, , , , ]", par.runs.inspect + end + + def test_br_in_em + input = '
Lorem
ipsum
dolor
' + par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first + assert_equal "[, , , , ]", par.runs.inspect + end + + def test_nested_strong_and_em + input = '
Lorem ipsum dolor
' + par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first + assert_equal "[, , ]", par.runs.inspect + end + + def test_ignore_last_br_in_div + input = '
Lorem ipsum dolor sit amet
' + par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first + assert_equal "[]", par.runs.inspect + end + + def test_ignore_br_in_blank_div + input = '

' + par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first + assert_equal "[]", par.runs.inspect + end + + def test_headings + input = '

First

Second

Third

' + ast = @converter.processed_ast(input) + assert_equal "]>, ]>, ]>]>", ast.inspect + end + + def test_h_with_formatting + input = '

Lorem ipsum dolor sit amet

' + ast = @converter.processed_ast(input) + assert_equal ", , , ]>]>", ast.inspect + end + + def test_ul + input = '
  • Lorem
  • ipsum
' + ast = @converter.processed_ast(input) + assert_equal "]>, ]>]>]>", ast.inspect + end + + def test_ol + input = '
  1. Lorem
  2. ipsum
' + ast = @converter.processed_ast(input) + assert_equal "]>, ]>]>]>", ast.inspect + end + + def test_num_id + ast = @converter.processed_ast('
  1. Some
  2. Lorem
  • ipsum
  1. dolor
  2. sit
') + assert_equal %w[1001 1001 1002 1003 1003], get_numpr_prop_from_ast(ast, :numId) + end + + def test_nested_lists_have_the_same_numid + ast = @converter.processed_ast('
  • Lorem
    • ipsum
      • dolor
') + assert_equal %w[1001 1001 1001], get_numpr_prop_from_ast(ast, :numId) + end + + def test_keep_nested_list_order + input = '
  • 1
    • 1.1
      • 1.1.1
    • 1.2
  • 2
    • 1.3
      • 1.3.1
' + ast = @converter.processed_ast(input) + assert_equal %w[1001], get_numpr_prop_from_ast(ast, :numId).uniq + assert_equal %w[0 1 2 1 0 1 2], get_numpr_prop_from_ast(ast, :ilvl) + end + + private + + # returns the numid attribute from paragraphs + def get_numpr_prop_from_ast(ast, key) + values = [] + ast.grep(Sablon::HTMLConverter::ListParagraph).each do |para| + numpr = para.instance_variable_get('@properties')[:numPr] + numpr.each { |val| values.push(val[key]) if val[key] } + end + values + end +end diff --git a/test/html/converter_test.rb b/test/html/converter_test.rb index d2d619c5..b0193192 100644 --- a/test/html/converter_test.rb +++ b/test/html/converter_test.rb @@ -92,7 +92,7 @@ def test_convert_u_tags_inside_p Lorem - + ipsum dolor sit amet @@ -114,6 +114,54 @@ def test_convert_em_tags_inside_div assert_equal normalize_wordml(expected_output), process(input) end + def test_convert_s_tags_inside_p + input = '

Lorem ipsum dolor sit amet

' + expected_output = <<-DOCX.strip + + + Lorem + + + ipsum dolor + + sit amet + + DOCX + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_convert_sub_tags_inside_p + input = '

Lorem ipsum dolor sit amet

' + expected_output = <<-DOCX.strip + + + Lorem + + + ipsum dolor + + sit amet + + DOCX + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_convert_sup_tags_inside_p + input = '

Lorem ipsum dolor sit amet

' + expected_output = <<-DOCX.strip + + + Lorem + + + ipsum dolor + + sit amet + + DOCX + assert_equal normalize_wordml(expected_output), process(input) + end + def test_convert_br_tags_inside_strong input = '

Lorem ipsum
dolor sit amet
' expected_output = <<-DOCX @@ -310,6 +358,13 @@ def test_nested_unordered_lists assert_equal [Sablon::Numbering::Definition.new(1001, 'ListBullet')], @numbering.definitions end + def test_unknown_tag + e = assert_raises ArgumentError do + process('') + end + assert_match(/Don't know how to handle HTML tag:/, e.message) + end + private def process(input) @@ -321,106 +376,329 @@ def normalize_wordml(wordml) end end -class HTMLConverterASTTest < Sablon::TestCase +class HTMLConverterStyleTest < Sablon::TestCase def setup super + @env = Sablon::Environment.new(nil) @converter = Sablon::HTMLConverter.new - @converter.instance_variable_set(:@numbering, Sablon::Environment.new(nil).numbering) end - def test_div - input = '
Lorem ipsum dolor sit amet
' - ast = @converter.processed_ast(input) - assert_equal ']>]>', ast.inspect + # testing direct CSS style -> WordML conversion for paragraphs + + def test_paragraph_with_background_color + input = '

' + expected_output = para_with_ppr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_p - input = '

Lorem ipsum dolor sit amet

' - ast = @converter.processed_ast(input) - assert_equal ']>]>', ast.inspect + def test_paragraph_with_borders + # Basic single line black border + input = '

' + ppr = <<-DOCX.strip + + + + + + + DOCX + expected_output = para_with_ppr(ppr) + assert_equal normalize_wordml(expected_output), process(input) + # border with a line style + input = '

' + ppr = <<-DOCX.strip + + + + + + + DOCX + expected_output = para_with_ppr(ppr) + assert_equal normalize_wordml(expected_output), process(input) + # border with line style and color + input = '

' + ppr = <<-DOCX.strip + + + + + + + DOCX + expected_output = para_with_ppr(ppr) + assert_equal normalize_wordml(expected_output), process(input) end - def test_b - input = '

Lorem ipsum dolor sit amet

' - ast = @converter.processed_ast(input) - assert_equal ', ]>]>', ast.inspect + def test_paragraph_with_text_align + input = '

' + expected_output = para_with_ppr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_i - input = '

Lorem ipsum dolor sit amet

' - ast = @converter.processed_ast(input) - assert_equal ', ]>]>', ast.inspect + def test_paragraph_with_vertical_align + input = '

' + expected_output = para_with_ppr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_br_in_strong - input = '
Lorem
ipsum
dolor
' - par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first - assert_equal "[, , , , ]", par.runs.inspect + def test_paragraph_with_unsupported_property + input = '

' + expected_output = para_with_ppr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_br_in_em - input = '
Lorem
ipsum
dolor
' - par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first - assert_equal "[, , , , ]", par.runs.inspect + def test_run_with_background_color + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_nested_strong_and_em - input = '
Lorem ipsum dolor
' - par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first - assert_equal "[, , ]", par.runs.inspect + def test_run_with_color + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_ignore_last_br_in_div - input = '
Lorem ipsum dolor sit amet
' - par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first - assert_equal "[]", par.runs.inspect + def test_run_with_font_size + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + # test that non-numeric are ignored + input = '

test

' + assert_equal normalize_wordml(expected_output), process(input) + + # test that floats round up + input = '

test

' + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_run_with_font_style + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + # test that non-numeric are ignored + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_run_with_font_wieght + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_run_with_text_decoration + # testing underline configurations + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + # testing line-through + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + # testing that unsupported values are passed through as a toggle + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_run_with_vertical_align + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_ignore_br_in_blank_div - input = '

' - par = @converter.processed_ast(input).grep(Sablon::HTMLConverter::Paragraph).first - assert_equal "[]", par.runs.inspect + def test_run_with_unsupported_property + input = '

test

' + expected_output = 'test' + assert_equal normalize_wordml(expected_output), process(input) + end + + # tests with nested runs and styles + + def test_paragraph_props_passed_to_runs + input = '

Loremipsum

' + expected_output = <<-DOCX.strip + + + + + + + + + + Lorem + + + + + + ipsum + + + DOCX + assert_equal normalize_wordml(expected_output), process(input) + end + + def test_run_prop_override_paragraph_prop + input = '

Loremipsum

' + expected_output = <<-DOCX.strip + + + + + + + + + + Lorem + + + + + + ipsum + + + DOCX + assert_equal normalize_wordml(expected_output), process(input) end - def test_headings - input = '

First

Second

Third

' - ast = @converter.processed_ast(input) - assert_equal "]>, ]>, ]>]>", ast.inspect + def test_inline_style_overrides_tag_style + # Note: a toggle property can not be removed once it becomes a symbol + # unless there is a specific CSS style that will set it to false. This + # is because CSS styles can only override parent properties not remove them. + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) end - def test_h_with_formatting - input = '

Lorem ipsum dolor sit amet

' - ast = @converter.processed_ast(input) - assert_equal ", , , ]>]>", ast.inspect + def test_conversion_of_a_registered_tag_without_ast_class + # This registers a new tag with the configuration object and then trys + # to convert it + Sablon.configure do |config| + config.register_html_tag(:bgcyan, :inline, properties: { 'highlight' => { val: 'cyan' } }) + end + # + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + + # remove the tag to avoid any accidental side effects + Sablon.configure do |config| + config.remove_html_tag(:bgcyan) + end + end + + def test_conversion_of_a_registered_tag_with_ast_class + Sablon.configure do |config| + # create the AST class and then pass it onto the register tag method + ast_class = Class.new(Sablon::HTMLConverter::Node) do + def self.name + 'TestInstr' + end + + def initialize(_env, node, _properties) + @content = node.text + end + + def inspect + @content + end + + def to_docx + " #{@content} " + end + end + # + config.register_html_tag(:test_instr, :inline, ast_class: ast_class) + end + # + input = '

test

' + expected_output = <<-DOCX.strip + + + + + test + + DOCX + assert_equal normalize_wordml(expected_output), process(input) + + # remove the tag to avoid any accidental side effects + Sablon.configure do |config| + config.remove_html_tag(:test_instr) + end end - def test_ul - input = '
  • Lorem
  • ipsum
' - ast = @converter.processed_ast(input) - assert_equal "]>, ]>]>", ast.inspect + def test_conversion_of_registered_style_attribute + Sablon.configure do |config| + converter = ->(v) { return :highlight, v } + config.register_style_converter(:run, 'test-highlight', converter) + end + # + input = '

test

' + expected_output = run_with_rpr('') + assert_equal normalize_wordml(expected_output), process(input) + # + Sablon.configure do |config| + config.remove_style_converter(:run, 'test-highlight') + end end - def test_ol - input = '
  1. Lorem
  2. ipsum
' - ast = @converter.processed_ast(input) - assert_equal "]>, ]>]>", ast.inspect + private + + def process(input) + @converter.process(input, @env) end - def test_num_id - ast = @converter.processed_ast('
  1. Some
  2. Lorem
  • ipsum
  1. dolor
  2. sit
') - assert_equal [1001, 1001, 1002, 1003, 1003], ast.grep(Sablon::HTMLConverter::ListParagraph).map(&:numid) + def para_with_ppr(ppr_str) + para_str = '%s' + format(para_str, ppr_str) end - def test_nested_lists_have_the_same_numid - ast = @converter.processed_ast('
  • Lorem
    • ipsum
      • dolor
') - assert_equal [1001, 1001, 1001], ast.grep(Sablon::HTMLConverter::ListParagraph).map(&:numid) + def run_with_rpr(rpr_str) + para_str = <<-DOCX.strip + + + + + + + %s + + test + + + DOCX + format(para_str, rpr_str) end - def test_keep_nested_list_order - input = '
  • 1
    • 1.1
      • 1.1.1
    • 1.2
  • 2
    • 1.3
      • 1.3.1
' - ast = @converter.processed_ast(input) - list_p = ast.grep(Sablon::HTMLConverter::ListParagraph) - assert_equal [1001], list_p.map(&:numid).uniq - assert_equal [0, 1, 2, 1, 0, 1, 2], list_p.map(&:ilvl) + def normalize_wordml(wordml) + wordml.gsub(/^\s+/, '').tr("\n", '') end end diff --git a/test/html/node_properties_test.rb b/test/html/node_properties_test.rb new file mode 100644 index 00000000..2f9b48cc --- /dev/null +++ b/test/html/node_properties_test.rb @@ -0,0 +1,113 @@ +# -*- coding: utf-8 -*- +require "test_helper" + +class NodePropertiesTest < Sablon::TestCase + def setup + # struct to simplify prop whitelisting during tests + @inc_props = Struct.new(:props) do + def include?(*) + true + end + end + end + + def test_empty_node_properties_converison + # test empty properties + props = Sablon::HTMLConverter::NodeProperties.new('w:pPr', {}, @inc_props.new) + assert_equal props.inspect, '' + assert_nil props.to_docx + end + + def test_simple_node_property_converison + props = { 'pStyle' => 'Paragraph' } + props = Sablon::HTMLConverter::NodeProperties.new('w:pPr', props, @inc_props.new) + assert_equal props.inspect, 'pStyle=Paragraph' + assert_equal props.to_docx, '' + end + + def test_node_property_with_nil_value_converison + props = { 'b' => nil } + props = Sablon::HTMLConverter::NodeProperties.new('w:rPr', props, @inc_props.new) + assert_equal props.inspect, 'b' + assert_equal props.to_docx, '' + end + + def test_node_property_with_hash_value_converison + props = { 'shd' => { color: 'clear', fill: '123456', test: nil } } + props = Sablon::HTMLConverter::NodeProperties.new('w:rPr', props, @inc_props.new) + assert_equal props.inspect, 'shd={:color=>"clear", :fill=>"123456", :test=>nil}' + assert_equal props.to_docx, '' + end + + def test_node_property_with_array_value_converison + props = { 'numPr' => [{ 'ilvl' => 1 }, { 'numId' => 34 }] } + props = Sablon::HTMLConverter::NodeProperties.new('w:pPr', props, @inc_props.new) + assert_equal props.inspect, 
'numPr=[{"ilvl"=>1}, {"numId"=>34}]' + assert_equal props.to_docx, '' + end + + def test_complex_node_properties_conversion + props = { + 'top1' => 'val1', + 'top2' => [ + { 'mid0' => nil }, + { 'mid1' => [ + { 'bottom1' => { key1: 'abc' } }, + { 'bottom2' => 'xyz' } + ] }, + { 'mid2' => 'val2' } + ], + 'top3' => { key1: 1, key2: '2', key3: nil, key4: true, key5: false } + } + output = <<-DOCX.gsub(/^\s*/, '').delete("\n") + + + + + + + + + + + + + DOCX + props = Sablon::HTMLConverter::NodeProperties.new('w:pPr', props, @inc_props.new) + assert_equal props.to_docx, output + end + + def test_setting_property_value + props = {} + props = Sablon::HTMLConverter::NodeProperties.new('w:pPr', props, @inc_props.new) + props['rStyle'] = 'FootnoteText' + assert_equal({ 'rStyle' => 'FootnoteText' }, props.instance_variable_get(:@properties)) + end + + def test_properties_filtered_on_init + props = { 'pStyle' => 'Paragraph', 'rStyle' => 'EndnoteText' } + props = Sablon::HTMLConverter::NodeProperties.new('w:rPr', props, %w[rStyle]) + assert_equal({ 'rStyle' => 'EndnoteText' }, props.instance_variable_get(:@properties)) + end + + def test_transferred_properties + props = { 'pStyle' => 'Paragraph', 'rStyle' => 'EndnoteText' } + props = Sablon::HTMLConverter::NodeProperties.new(nil, props, %w[pStyle]) + trans = props.transferred_properties + assert_equal({ 'rStyle' => 'EndnoteText' }, trans) + end + + def test_node_properties_paragraph_factory + props = { 'pStyle' => 'Paragraph' } + props = Sablon::HTMLConverter::NodeProperties.paragraph(props) + assert_equal 'pStyle=Paragraph', props.inspect + assert_equal props.to_docx, '' + end + + def test_node_properties_run_factory + props = { 'color' => 'FF00FF' } + props = Sablon::HTMLConverter::NodeProperties.run(props) + assert_equal 'color=FF00FF', props.inspect + assert_equal '', props.to_docx + end +end diff --git a/test/html_test.rb b/test/html_test.rb index 258890e7..afd89ee4 100644 --- a/test/html_test.rb +++ b/test/html_test.rb @@ 
-1,9 +1,10 @@ # -*- coding: utf-8 -*- require "test_helper" -require "support/xml_snippets" +require "support/html_snippets" class SablonHTMLTest < Sablon::TestCase include Sablon::Test::Assertions + include HTMLSnippets def setup super @@ -16,7 +17,7 @@ def test_generate_document_from_template_with_styles_and_html template_path = @base_path + "fixtures/insertion_template.docx" output_path = @base_path + "sandbox/html.docx" template = Sablon.template template_path - context = {'html:content' => content} + context = { 'html:content' => content } template.render_to_file output_path, context assert_docx_equal @sample_path, output_path @@ -26,7 +27,7 @@ def test_generate_document_from_template_without_styles_and_html template_path = @base_path + "fixtures/insertion_template_no_styles.docx" output_path = @base_path + "sandbox/html_no_styles.docx" template = Sablon.template template_path - context = {'html:content' => content} + context = { 'html:content' => content } e = assert_raises(ArgumentError) do template.render_to_file output_path, context @@ -37,13 +38,12 @@ def test_generate_document_from_template_without_styles_and_html end private + def content - <<-HTML -

Sablon HTML insertion

-

Text

-
Lorem ipsum dolor sit ametconsectetur adipiscing elitSuspendisse a tempus turpis. Duis urna justo, vehicula vitae ultricies vel, congue at sem. Fusce turpis turpis, aliquet id pulvinar aliquam, iaculis non elit. Nulla feugiat lectus nulla, in dictum ipsum cursus ac. Quisque at odio neque. Sed ac tortor iaculis, bibendum leo ut, malesuada velit. Donec iaculis sed urna eget pharetra. Praesent ornare fermentum turpis, placerat iaculis urna bibendum vitae. Nunc in quam consequat, tristique tellus in, commodo turpis. Curabitur ullamcorper odio purus, lobortis egestas magna laoreet vitae. Nunc fringilla velit ante, eu aliquam nisi cursus vitae. Suspendisse sit amet dui egestas, volutpat nisi vel, mattis justo. Nullam pellentesque, ipsum eget blandit pharetra, augue elit aliquam mauris, vel mollis nisl augue ut ipsum.
-

Lists

-
  1. Vestibulum 
    1. ante ipsum primis 
  2. in faucibus orci luctus 
    1. et ultrices posuere cubilia Curae; 
      1. Aliquam vel dolor 
      2. sed sem maximus 
    2. fermentum in non odio. 
      1. Fusce hendrerit ornare mollis. 
    3. Nunc scelerisque nibh nec turpis tempor pulvinar. 
  3. Donec eros turpis, 
  4. aliquet vel volutpat sit amet, 
    1. semper eu purus. 
    2. Proin ac erat nec urna efficitur vulputate. 
      1. Quisque varius convallis ultricies. 
      2. Nullam vel fermentum eros. 
Pellentesque nulla leo, auctor ornare erat sed, rhoncus congue diam. Duis non porttitor nulla, ut eleifend enim. Pellentesque non tempor sem.
Mauris auctor egestas arcu, 
  1. id venenatis nibh dignissim id. 
  2. In non placerat metus. 
  • Nunc sed consequat metus. 
  • Nulla consectetur lorem consequat, 
  • malesuada dui at, lacinia lectus. 
  1. Aliquam efficitur 
  2. lorem a mauris feugiat, 
  3. at semper eros pellentesque. 
Nunc lacus diam, consectetur ut odio sit amet, placerat pharetra erat. Sed commodo ut sem id congue. Sed eget neque elit. Curabitur at erat tortor. Maecenas eget sapien vitae est sagittis accumsan et nec orci. Integer luctus at nisl eget venenatis. Nunc nunc eros, consectetur at tortor et, tristique ultrices elit. Nulla in turpis nibh.
  • Nam consectetur 
    • venenatis tempor. 
  • Aenean 
    • blandit
      • porttitor massa, 
        • non efficitur 
          • metus. 
  • Duis faucibus nunc nec venenatis faucibus. 
  • Aliquam erat volutpat. 
Quisque non neque ut lacus eleifend volutpat quis sed lacus.
Praesent ultrices purus eu quam elementum, sit amet faucibus elit interdum. In lectus orci,
elementum quis dictum ac, porta ac ante. Fusce tempus ac mauris id cursus. Phasellus a erat nulla. Mauris dolor orci, malesuada auctor dignissim non, posuere nec odio. Etiam hendrerit justo nec diam ullamcorper, nec blandit elit sodales.
-HTML + html_str = snippet('html_test_content') + # combine all white space + html_str = html_str.gsub(/\s+/, ' ') + # clear any white space between block level tags and other content + html_str.gsub(%r{\s*<(/?(?:h\d|div|p|br|ul|ol|li).*?)>\s*}, '<\1>') end end diff --git a/test/processor/document_test.rb b/test/processor/document_test.rb index 9222a47a..a649c239 100644 --- a/test/processor/document_test.rb +++ b/test/processor/document_test.rb @@ -502,6 +502,21 @@ def test_image_replacement document end + def test_comment_block_and_comment_as_key + result = process(snippet("comment_block_and_comment_as_key"), {comment: 'Contents of comment key'}) + + assert_xml_equal <<-document, result + Before + After + + + + Contents of comment key + + + document + end + private def process(document, context) diff --git a/test/support/html_snippets.rb b/test/support/html_snippets.rb new file mode 100644 index 00000000..a36615d2 --- /dev/null +++ b/test/support/html_snippets.rb @@ -0,0 +1,9 @@ +module HTMLSnippets + def snippet(name) + File.read(File.expand_path("#{name}.html", snippet_path)) + end + + def snippet_path + @snippet_path ||= File.expand_path("../../fixtures/html", __FILE__) + end +end
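Note on the whitespace clean-up in the new `content` helper of test/html_test.rb: the two `gsub` calls can be tried in isolation with the sketch below. The method name `squish_html` and the sample string are hypothetical and are not part of this change; only the two regular expressions are taken from the diff.

```ruby
# Hypothetical standalone wrapper around the two gsub calls used in
# SablonHTMLTest#content; the name squish_html is an assumption.
def squish_html(html_str)
  # combine all white space into single spaces
  html_str = html_str.gsub(/\s+/, ' ')
  # clear any white space between block level tags and other content
  html_str.gsub(%r{\s*<(/?(?:h\d|div|p|br|ul|ol|li).*?)>\s*}, '<\1>')
end

# Made-up sample input, indented the way a fixture file might be.
sample = "<ul>\n  <li>Lorem <b>ipsum</b> </li>\n  <li>dolor</li>\n</ul>\n"
puts squish_html(sample)
```

Inline tags such as `b` are left alone, so the space before an inline tag survives while the indentation around `ul` and `li` is removed. The alternation is prefix-based, so tag names that merely start with `p` or `li` would be treated as block level as well.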