README.adoc in html2doc-0.6.9 vs README.adoc in html2doc-0.7.0
- old
+ new
@@ -24,22 +24,18 @@
* Inject Microsoft Word-specific CSS into the HTML document. If a CSS file is not supplied, the CSS file used is at `lib/html2doc/wordstyle.css` is used by default. Microsoft Word HTML has particular requirements from its CSS, and you should review the sample CSS before replacing it with your own. (This generic CSS can be overridden by CSS already in the HTML document, since the generic CSS is injected at the top of the document.)
* Bundle up the images, the HTML file of the document proper, and the `header.html` file representing header/footer information, into a MIME file, and save that file to disk (so that Microsoft Word can deal with it as a Word file.)
For a representative generator of HTML that uses this gem in postprocessing, see https://github.com/riboseinc/asciidoctor-iso
-Work to be done:
-
-* Render (editorial) comments
-
== Constraints
-This generates .doc documents. Future versions will upgrade the output to docx.
+This generates .doc documents. Future versions may upgrade the output to docx.
There there are two other Microsoft Word vendors in the Ruby ecosystem.
* https://github.com/jetruby/puredocx generate Word documents from a ruby struct as a DSL, rather than converting a preexisting html document. That constrains it's coverage to what is explicitly catered for in the DSL.
-* https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around html: it does not do any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML), though it does already generate docx.
+* https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around html: it does not do any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML). However it does already generate docx, which involves many more auxiliary files than the .doc format. (Any attempt to generate docx through this gem will likely involve Html2Docx.)
== Usage
[source,ruby]
--
@@ -56,10 +52,17 @@
asciimathdelims:: are the AsciiMath delimiters used in the text (an array of an opening and a closing delimiter). If none are provided, no AsciiMath conversion is attempted.
liststyles:: a hash of list style labels in Word CSS, which are used to define the behaviour of list item labels (e.g. _i)_ vs _i._). The gem recognises the hash keys `ul`, `ol`. So if the appearance of an ordered list's item labels in the supplied stylesheet is governed by style `@list l1` (e.g. `@list l1:level1 {mso-level-text:"%1\)";}` appears in the stylesheet), call the method with `liststyles:{ol: "l1"}`.
Note that the local CSS stylesheet file contains a variable `FILENAME` for the location of footnote/endnote separators and headers/footers, which are provided in the header HTML file. The gem replaces `FILENAME` with the file name that the document will be saved as. If you supply your own stylesheet and also wish to use separators or headers/footers, you will likewise need to replace the document name mentioned in your stylesheet with a `FILENAME` string.
+We include a script in this distribution that processes files from the command line, optionally including header and stylesheet:
+
+[source,console]
+--
+$ bin/html2doc --header header.html --stylesheet stylesheet.css filename.html
+--
+
== Caveats
=== HTML
The good news is that Word understands HTML.
@@ -77,9 +80,15 @@
This gem is published with an early draft of the XSLT stylesheet transforming MathML into OOXML, `mml2omml.xsl`, that has published for several years now as part of the https://github.com/TEIC/Stylesheets[TEI stylesheet set]. (We have made some further minor edits to the stylesheet.) The stylesheets have been published under a dual Creative Commons Sharealike/BSD licence.
The good news is that the stylesheet is not identical to the stylesheet `mathml2omml.xsl` that is published with Microsoft Word, so it can and has been redistributed.
The bad news is that the stylesheet is not identical to the stylesheet `mathml2omml.xsl` that is published with Microsoft Word, so it isn't guaranteed to have identical output. If you want to make sure that your MathML import is identical to what Word currently uses, replace `mml2omml.xsl` with `mathml2omml.xsl`, and edit the gem accordingly for your local installation. On Windows, you will find the stylesheet in the same directory as the `winword.exe` executable. On Mac, right-click on the Word application, and select "Show Package Contents"; you will find the stylesheet under `Contents/Resources`.
+
+=== Lists
+Natively, Word does not use `<ol>`, `<ul>`, or `<dl>` lists in its HTML exports at all: it uses paragraphs styled with list styles. If you save a Word document as HTML in order to use its CSS for Word documents generated by HTML, those styles will still work (with the caveat that you will need to extract the `@list` style specific to ordered and unordered lists, and pass it as a `liststyles` parameter to the conversion). *However*, Word applies a default indentation to all instances of `<ol>`, `<ul>` and `<dl>`, which the CSS stylesheet of a Word HTML will not have accounted for (because the Word HTML does not use lists at all.) If you are going to reuse that CSS for generating new documents using lists, you will need to add the rule `margin-left:0pt` to `ul`, `ol`, `dl` in the CSS stylesheet you supply, so that the margins in the Word-exported CSS remain correct.
+
+=== Math Positioning
+By default, mathematical formulas that are the only content of their paragraph are rendered as centered in Word. If you want your AsciiMath or MathML to be left-aligned or right-aligned, add `style="text-align:left"` or `style="text-align:right"` to its ancestor `div`, `p` or `td` node in HTML.
== Example
The `spec/examples` directory includes `rice.doc` and its source files: this Word document has been generated from `rice.html` through a call to html2doc from https://github.com/riboseinc/asciidoctor-iso. (The source document `rice.html` was itself generated from Asciidoc, rather than being hand-crafted.)