README in tartan-0.1.0 vs README in tartan-0.1.1

- old
+ new

@@ -12,23 +12,24 @@
 1. separates the specific wiki syntax specification from the implementation
 2. allows layering and extension of parsing rules
 3. allows multiple output formats from the same syntax specification

-The current implementation of Tartan is in Ruby and includes a full Markdown
+The current implementation of Tartan is in Ruby and includes a full Markdown[http://daringfireball.net/projects/markdown/]
 parser (described in YAML). The format of the parsing specification has been created with an eye to having a language-independent definition of wiki (and possibly other) mark-ups. That's a lofty goal, and Tartan hasn't quite gotten there yet, but we think there's a clear path. In any case, even if it is only available in Ruby it will hopefully be helpful for projects needing to do something more than just convert wiki text directly into HTML.

 == Usage

-So, really all you want to do is generate HTML from Markdown text. Here's
+So, really all you want to do is generate HTML from Markdown[http://daringfireball.net/projects/markdown/] text. Here's
 how you do it:

+  # require 'rubygems' # if you are pulling Tartan in as a gem
   require 'tartan_markdown'
   html = TartanMarkdown.new("* howdy\n* doody").to_html
   # => "<ul>\n<li>howdy</li>\n<li>doody</li>\n</ul>"

@@ -41,13 +42,13 @@
 === Layering Parsers

 You can add parsing syntax to existing parsers. This is done by building up a set of parser specifications that work together.

-In the Tartan distribution you have a specification for Markdown and you also
+In the Tartan distribution you have a specification for Markdown[http://daringfireball.net/projects/markdown/] and you also
 have a specification for table mark-up.
 You can combine them by creating a new
-class that layers the tables onto the Markdown definition as follows in a file
+class that layers the tables onto the Markdown[http://daringfireball.net/projects/markdown/] definition as follows in a file
 called <tt>tartan_markdown_tables.rb</tt>:

   require 'tartan_markdown_def'
   require 'tartan_table_def'

@@ -56,22 +57,21 @@
     include TartanTableDef
   end

 In another file you could use this new parser:

-
   require 'tartan_markdown_tables'
   html = TartanMarkdownTables.new("[|*happy*||**days**|]").to_html
   # => "<table class=\"\"> <tr><td><em>happy</em></td><td><strong>days</strong></td></tr> </table>"

 == The Parsing Specification

+Each specific parser (Markdown[http://daringfireball.net/projects/markdown/] to HTML, Textile to HTML, your wiki to XML, etc.) needs a parsing specification to tell Tartan how to convert the text into HTML (or whatever other format you need).
-

 === Overall Structure

 Each parser is made up of a parsing definition and optional helper methods. The specification is defined in YAML and the helper methods are defined in a parser definition class.

 The parsing definition in YAML has the following general structure:

@@ -89,42 +89,175 @@
 ==== Parsing Rules

 The following is a simple parsing rule to match paragraphs and mark them up in HTML:

   title: paragraph
-  match: "/(^[^\n]+$\n)+^[^\n]+$/m"
+  match: /(^[^\n]+$\n)*^[^\n]+$/m
   html:
     start_mark: <p>
     end_mark: </p>

 A paragraph, in this case, is any grouping of non-blank lines.

-The parser will repetitively apply the <tt>match</tt> regular expression and if it matches, the <tt>html</tt> output sub-rule will put <tt><p></tt> and <tt></p></tt> around the text that is matched as a paragraph.
+The parser will repetitively apply the <tt>match</tt> regular expression and if it matches, the <tt>html</tt> output sub-rule will put the <tt>start_mark</tt>, <tt><p></tt>, and the <tt>end_mark</tt>, <tt></p></tt>, around the text that is matched as a paragraph.
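
As a rough illustration of what the paragraph rule describes, the same regular expression can be tried out in plain Ruby. This is only a sketch: <tt>mark_paragraphs</tt> is a hypothetical helper, and <tt>String#gsub</tt> stands in for Tartan's rule engine, whose internals are not shown here.

```ruby
# Plain-Ruby sketch only; mark_paragraphs is a hypothetical helper and
# String#gsub stands in for Tartan's own rule engine.
def mark_paragraphs(text, start_mark = "<p>", end_mark = "</p>")
  # Same pattern as the new paragraph rule: a run of non-blank lines.
  paragraph = /(^[^\n]+$\n)*^[^\n]+$/m
  # Wrap each matched run in the start and end marks.
  text.gsub(paragraph) { |para| "#{start_mark}#{para}#{end_mark}" }
end

puts mark_paragraphs("one\ntwo\n\nthree")
# <p>one
# two</p>
#
# <p>three</p>
```

With the <tt>*</tt> quantifier, a paragraph consisting of a single line ("three" above) still matches, because the repeated group may match zero times.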
 If we wanted to also mark off blocks of code that are indented by, say, 2 or more spaces at the beginning of the line, we could use the following rule:

   title: code
-  match: "/(^[ ]{2,}\S.+?$\n)+^[ ]{2,}\S.+?$/m"
+  match: /(^[ ]{2,}\S.+?$\n)+^[ ]{2,}\S.+?$/m
   html:
     start_mark: <pre><code>
     end_mark: </code></pre>

 When we want to add the <tt>code</tt> rule, the ordering becomes important. If we put the <tt>paragraph</tt> rule first, it will gobble up both the paragraphs and the code blocks since it's just looking for groups of non-blank lines. To prevent this we need to put the <tt>code</tt> rule first. So the overall definition would be:

   block:
   - title: code
-    match: "/(^[ ]{2,}\S.+?$\n)+^[ ]{2,}\S.+?$/m"
+    match: /(^[ ]{2,}\S.+?$\n)+^[ ]{2,}\S.+?$/m
     html:
       start_mark: <pre><code>
       end_mark: </code></pre>
   - title: paragraph
     match: "/(^[^\n]+$\n)+^[^\n]+$/m"
     html:
       start_mark: <p>
       end_mark: </p>

+Now, let's say we want to be able to mark-up text with emphasis (HTML <tt><em></tt>) and strong emphasis (HTML <tt><strong></tt>) in paragraph text, but not code. We'll use an asterisk (*) around text we want to have emphasis and a double asterisk around text we want to have strong emphasis (**). Note that we don't want this to happen in text in a code block.
+
+To do this, we set up a new parsing context for paragraph body text and "point" the parser to the context when it recognizes a paragraph.
+First, we create the paragraph parsing context:
+
+  paragraph:
+  - title: strong
+    match: /\*\*(.*?)\*\*/
+    html:
+      replace: <strong>\1</strong>
+
+  - rescan
+
+  - title: emphasis
+    match: /\*(.*?)\*/
+    html:
+      replace: <em>\1</em>
+
+The <tt>rescan</tt> directive between the <tt>strong</tt> and <tt>emphasis</tt> rules tells the parser to "start over". This is needed because otherwise the <tt>strong</tt> rule would "claim" all the text it matched and the <tt>emphasis</tt> rule wouldn't have a chance to parse any of it.
+This would come into play if we had a paragraph such as:
+
+  Now listen to this **I want *you* to really hear me**.
+
+This should get marked up as:
+
+  <p>Now listen to this <strong>I want <em>you</em> to really hear me</strong>.</p>
+
+but we would get the following without the rescan:
+
+  <p>Now listen to this <strong>I want *you* to really hear me</strong>.</p>
+
+You might also note that the ordering here, again, is important. If we put the <tt>emphasis</tt> rule before the <tt>strong</tt> rule, we would get the following output instead:
+
+  <p>Now listen to this <em></em>I want <em>you</em> to really hear me<em></em>.</p>
+
+Now, we also need to modify the paragraph rule in the <tt>block</tt> context to use the new <tt>paragraph</tt> context:
+
+  # . . .
+  - title: paragraph
+    match: /(^[^\n]+$\n)*^[^\n]+$/m
+    subparse: paragraph
+    html:
+      start_mark: <p>
+      end_mark: </p>
+  # . . .
+
+To do this we use the <tt>subparse</tt> directive to tell the parser that the contents of the paragraph should be parsed by the <tt>paragraph</tt> context.
+
+==== Creating a Mix-in
+
+It's possible to mix-in or layer a parsing specification with a base parser. This allows you to add additional markup or change the markup of an existing syntax. You could use this to add table mark-up to Markdown[http://daringfireball.net/projects/markdown/] (in fact, this mix-in to Markdown is available as part of the Tartan code distribution).
+
+To show how this works, we'll look at how to specify and then add character element markup to the parser example we've been working with. We want to turn things like "<", "&" and "->" into "&lt;", "&amp;" and "&rarr;".
+
+We want these transformations to be done in the context of parsing paragraphs, so we'll only want to add to the <tt>paragraph</tt> context in our previous example.
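
The character transformations just described can be approximated in plain Ruby to see why the order of substitutions matters. This is a sketch under stated assumptions: <tt>escape_entities</tt> is a hypothetical helper built on ordered <tt>String#gsub</tt> calls, and it does not reproduce how Tartan itself applies or rescans rules.

```ruby
# Plain-Ruby sketch only; escape_entities is a hypothetical helper and
# does not show how Tartan itself applies its rules.
def escape_entities(text)
  # Ordered [pattern, replacement] pairs. "&" goes first so the entities
  # produced by later steps are not escaped again, and "->" goes before
  # ">" so the arrow is replaced as a unit.
  rules = [
    [/&/,  '&amp;'],
    [/->/, '&rarr;'],
    [/</,  '&lt;'],
    [/>/,  '&gt;']
  ]
  rules.reduce(text) { |out, (pattern, replacement)| out.gsub(pattern, replacement) }
end

puts escape_entities("a -> b & c < d")
# a &rarr; b &amp; c &lt; d
```

In this approximation, moving the <tt>&</tt> substitution to the end would turn <tt>&rarr;</tt> into <tt>&amp;rarr;</tt>, which is why it runs first.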
+
+So, to add this syntax parsing, you would create the following specification:
+
+  paragraph:
+  - rescan
+  - title: amp
+    match: /&/
+    html:
+      replace: '&amp;'
+    rescan: true
+  - title: rightArrow
+    match: /->/
+    html:
+      replace: '&rarr;'
+    rescan: true
+  - title: lessThan
+    match: /</
+    html:
+      replace: '&lt;'
+    rescan: true
+  - title: greaterThan
+    match: />/
+    html:
+      replace: '&gt;'
+
+That's it for the mix-in specification. Now we add these to the previous set. We didn't touch on file naming of specifications before, but now we need to. Let's say that we put the previous specification in a file called <tt>example-parser.yml</tt> and we put the new spec in <tt>entities.yml</tt>. To combine them, we would create a new Ruby class like this:
+
+  class ExampleParserWithEntities < Tartan
+    yaml "example-parser.yml"
+    yaml "entities.yml"
+  end
+
+By default, the rules of a mix-in are added to the end of any given context. So, the effective resulting specification once the two sets of rules are combined would be:
+  block:
+  - title: code
+    match: /(^[ ]{2,}\S.+?$\n)+^[ ]{2,}\S.+?$/m
+    html:
+      start_mark: <pre><code>
+      end_mark: </code></pre>
+  - title: paragraph
+    match: /(^[^\n]+$\n)*^[^\n]+$/m
+    subparse: paragraph
+    html:
+      start_mark: <p>
+      end_mark: </p>
+  paragraph:
+  - title: strong
+    match: /\*\*(.*?)\*\*/
+    html:
+      replace: <strong>\1</strong>
+  - rescan
+  - title: emphasis
+    match: /\*(.*?)\*/
+    html:
+      replace: <em>\1</em>
+  - rescan
+  - title: amp
+    match: /&/
+    html:
+      replace: '&amp;'
+    rescan: true
+  - title: rightArrow
+    match: /->/
+    html:
+      replace: '&rarr;'
+    rescan: true
+  - title: lessThan
+    match: /</
+    html:
+      replace: '&lt;'
+    rescan: true
+  - title: greaterThan
+    match: />/
+    html:
+      replace: '&gt;'
+
+==== Going Further
+
+Honestly, this brief tutorial just provides you with the basics of Tartan. If you want to know more, for now, the best thing is to look at the Markdown[http://daringfireball.net/projects/markdown/] and table extension specification in the code.
+That will show you a real-world example of how to create a base parser and a mix-in.
+
+There will be additional documentation to follow. In particular, a reference guide will cover all the parser rule directives one at a time.
+
+If you need some help in getting Tartan to work for your project, please don't hesitate to post to the Tartan help-forum[http://rubyforge.org/forum/forum.php?forum_id=8042] or write me directly at mailto:bitherder@rubyforge.org.
+

 == The Name

 Tartan is intended to weave together different parsing elements. It's intended
-to be an alternative to both RedCloth[http:www.redcloth.org/] and BlueCloth. Tartan is a kind of cloth
+to be an alternative to both RedCloth[http:www.redcloth.org/] and BlueCloth[http://www.deveiate.org/projects/BlueCloth]. Tartan is a kind of cloth
 that weaves different colors together in an interesting pattern.
\ No newline at end of file