Syntax for meta-data

This document describes a syntax that makes it possible to attach meta-data to block-level elements (headers, paragraphs, code blocks, ...), and to span-level elements (links, images, ...).

Last update: December 29th, 2006.

1. Attribute lists

This is an example attribute list, which shows everything you can put inside:

{key1=val key2="long val" #myid .class1 .class2 tag1 tag2}

More precisely, an attribute list is a brace-enclosed, whitespace-separated list of elements of four kinds:

  1. key/value pairs
  2. tags (tag1,tag2)
  3. id specifiers (#myid)
  4. class specifiers (.myclass)

The formal grammar is specified below.
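
As a rough illustration of the four kinds, here is a minimal Ruby sketch that splits an attribute list into its elements and labels each one. The method name classify_attribute_list is hypothetical, and the sketch assumes quoted values contain no escaped quotes; it is not the reference implementation.

```ruby
# Hypothetical helper: label every element of one attribute list.
# Assumes quoted values contain no escaped quotes.
def classify_attribute_list(text)
  inner = text.strip[/\A\{(.*)\}\z/m, 1]
  return nil if inner.nil? # not a brace-enclosed list

  # A token is either key="quoted value" or a run of non-whitespace characters.
  tokens = inner.scan(/[^\s=]+="[^"]*"|\S+/)
  tokens.map do |tok|
    case tok
    when /\A#(.+)\z/          then [:id, $1]
    when /\A\.(.+)\z/         then [:class, $1]
    when /\A([^=]+)="(.*)"\z/ then [:key_value, $1, $2]
    when /\A([^=]+)=(.*)\z/   then [:key_value, $1, $2]
    else                           [:tag, tok]
    end
  end
end
```

Applied to the example list above, this yields two key/value pairs, one id, two classes and two tags, in that order.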

1.1. id and class are special

You can attach any attribute you want to an element, but id and class are treated in a special way.

For ids and classes there are special shortcuts: #myid stands for id=myid, and .myclass stands for class=myclass.

Therefore the following attribute lists are equivalent:

{#myid .class1 .class2} 
{id=myid class=class1 class=class2}
{id=myid class="class1 class2"}
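
The equivalence of these three lists can be checked mechanically. The sketch below reduces a list to its id and its classes; the method name id_and_classes is hypothetical, and it assumes that repeated class keys accumulate rather than override each other, as the second example suggests.

```ruby
# Hypothetical normalizer: reduce an attribute list to its id and classes.
# Assumes repeated class= keys accumulate, and ignores all other elements.
def id_and_classes(text)
  id = nil
  classes = []
  inner = text.strip.sub(/\A\{/, "").sub(/\}\z/, "")
  inner.scan(/[^\s=]+="[^"]*"|\S+/).each do |tok|
    case tok
    when /\A#(.+)\z/             then id = $1
    when /\A\.(.+)\z/            then classes << $1
    when /\Aid="?([^"]*)"?\z/    then id = $1
    when /\Aclass="?([^"]*)"?\z/ then classes.concat($1.split)
    end
  end
  { id: id, classes: classes }
end
```

All three lists above normalize to the same result: id myid with classes class1 and class2.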

2. Where to put attribute lists

2.1. For block-level elements

For paragraphs and other block-level elements, attribute lists go after the element:

This is a paragraph.
Line 2 of the paragraph.
{#myid .myclass}

A quote with a citation URL:
> Who said that?
{cite=google.com}

Note: empty lines between the block and the attribute list are not tolerated. So this is not legal:

This is a paragraph.
Line 2 of the paragraph.

{#myid .myclass}

Attribute lists may be indented up to 3 spaces:

Paragraph1
¬{ok}

Paragraph2
¬¬{ok}

Paragraph3
¬¬¬{ok}
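
The two rules above (no empty line before the attribute list, at most 3 spaces of indentation) can be sketched as follows. The names ATTRIBUTE_LINE and attach_block_attributes are hypothetical, and treating a detached or over-indented list as ordinary text is my assumption about what "not legal" means in practice.

```ruby
# Assumed rule: a line holding only {...}, indented 0 to 3 spaces and placed
# directly after a non-blank line, is the attribute list of that block.
ATTRIBUTE_LINE = /\A {0,3}\{[^}]*\}\s*\z/

def attach_block_attributes(lines)
  blocks = []   # each entry: { text: [lines], attributes: String or nil }
  current = nil
  lines.each do |line|
    if line.strip.empty?
      current = nil                      # a blank line closes the block
    elsif current && line.match?(ATTRIBUTE_LINE)
      current[:attributes] = line.strip  # no blank line in between: attach
      current = nil
    else
      if current.nil?
        current = { text: [], attributes: nil }
        blocks << current
      end
      current[:text] << line             # ordinary text of the block
    end
  end
  blocks
end
```

With a blank line in between, the paragraph keeps no attributes and the list ends up as a block of its own.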

2.2. For headers

For headers, you can put attribute lists on the same line:

### Header ###     {#myid}

Header     {#myid .myclass}
------

or, as with other block-level elements, on the line after:

### Header ###     
{#myid}

Header     
------
{#myid .myclass}
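
For the same-line form, an implementation has to split the header text from a trailing attribute list. A minimal sketch, assuming the list is the last thing on the line and contains no nested braces (split_header_attributes is a hypothetical name):

```ruby
# Hypothetical: split a header line into its text and an optional trailing
# attribute list. Returns [text, attrs], where attrs is nil if absent.
def split_header_attributes(line)
  m = line.match(/\A(?<text>.*?)\s*(?<attrs>\{[^{}]*\})?\s*\z/)
  [m[:text], m[:attrs]]
end
```

The same method covers both header styles, since in the underlined style the list sits on the title line.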

2.3. For span-level elements

For span-level elements, the attribute list goes immediately after the element, in the paragraph flow.

For example, in this:

This is a *chunky paragraph*{#id1}.
{#id2}

the id of the em element is set to id1 and the id of the paragraph is set to id2.

This also works for links, like this:

This is [a link][ref]{#myid rel=abc rev=abc}

For images, this:

This is ![Alt text](url "fresh carrots")

is equivalent to:

This is ![Alt text](url){title="fresh carrots"}

3. Using "tags"

In an attribute list, you can have:

  1. key=value pairs
  2. id attributes (#myid)
  3. class attributes (.myclass)

Everything else is interpreted as a "tag". Tags let you tag an element and then specify the attributes later:

# Header #      {tag}

Blah blah blah.

{tag}: #myhead .myclass lang=fr

Tags are not unique: more than one element can be assigned the same tag.

# Header 1 #      {tag}
...
# Header 2 #      {tag}

{tag}: .myclass lang=fr

In this case, however, you should not assign the id attribute, because the same id would end up on more than one element. So this is not valid:

# Header 1 #      {tag}
...
# Header 2 #      {tag}

{tag}: #myid .myclass lang=fr

Of course, tags are valid for both block-level and span-level elements:

### My header ### {1}
This is a paragraph with an *emphasis*{2}
and the paragraph goes on.
{3}

{1}: #header_id
{2}: #emph_id
{3}: #par_id
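
Tag definitions can be collected in one pass and then checked against the rule that a tag shared by several elements must not carry an id. This is only a sketch: both method names are hypothetical, and usage_counts (how many elements use each tag) is assumed to be computed elsewhere.

```ruby
# Collect definitions of the form "{tag}: attributes" into a hash.
def collect_tag_definitions(lines)
  lines.each_with_object({}) do |line, defs|
    m = line.strip.match(/\A\{([^{}\s]+)\}:\s*(.*)\z/)
    defs[m[1]] = m[2] if m
  end
end

# Return the tags that are used by more than one element yet define an id.
# usage_counts maps each tag to the number of elements carrying it.
def tags_with_shared_id(usage_counts, defs)
  defs.select do |tag, attrs|
    usage_counts.fetch(tag, 0) > 1 &&
      attrs.split.any? { |a| a.start_with?("#", "id=") }
  end.keys
end
```

A tag used twice and defined as "#myid .myclass lang=fr" is reported; the same tag defined as ".myclass lang=fr" is not.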

4. Additional examples and corner-cases

4.1. Code blocks

Note that the attribute list for a code block should not be indented by more than 3 spaces; at 4 spaces or more it is part of the code block itself:

¬¬¬¬This¬is¬a¬code¬block.
¬¬¬¬{#myid}¬<--¬this¬is¬part¬of¬the¬block
¬¬¬{#blockid}

5. Formal grammar

In this section we define the formal grammar, AKA the big regexp.

In the spirit of HTML:

Identifiers must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (-), underscores (_), colons (:), and periods (.).

The same applies to class specifiers and to the keys in key/value pairs. Moreover, identifiers are case-sensitive.

So this is a valid attribute list:

{#my:_A123.veryspecialID .my:____:class }

The regexp for identifiers is therefore:

Identifier = [A-Za-z][A-Za-z0-9_\.\:\-]*

(This is Ruby syntax; I am told it is similar to Perl's so I guess it is generally understandable. If not, please tell me the equivalent in your language.)
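
To try the identifier rule out, the regexp can be anchored at both ends so that it must match the whole string. The names IDENTIFIER and identifier? are mine, introduced only for this sketch.

```ruby
# The identifier regexp, anchored so the whole string has to match.
IDENTIFIER = /\A[A-Za-z][A-Za-z0-9_.:-]*\z/

def identifier?(text)
  IDENTIFIER.match?(text)
end
```

Both identifiers from the example list above pass, while a string starting with a digit or containing a space does not.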

5.1. Summary

To summarize:

AttributeList =  \{ (ws (KeyValue|IdSpec|ClassSpec|Tag))* ws \}
Identifier    =  [A-Za-z][A-Za-z0-9_\.\:\-]*
Tag           =  Identifier
IdSpec        =  #Identifier
ClassSpec     =  \.Identifier
KeyValue      =  Key=(QuotedValue|UnquotedValue)
Key           =  Identifier
UnquotedValue =  [^\s\"][^\s]*
QuotedValue   =  \"[^\"]*\"            <---------- note: simplistic

Note: I am not able to write a regexp for QuotedValue that also takes the escaping of characters into account. Any regexp wizard out there?
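
One possible answer, offered as a sketch rather than as part of the grammar: a widely used pattern treats a quoted value as a run of either ordinary characters or backslash-escaped characters. It assumes the backslash is the escape character, which this document does not specify; QUOTED_VALUE and quoted_value are hypothetical names.

```ruby
# Assumption: inside a quoted value, a backslash escapes the next character.
# Each step consumes either one character that is neither a quote nor a
# backslash, or a backslash together with the character that follows it.
QUOTED_VALUE = /"((?:[^"\\]|\\.)*)"/

# Return the first quoted value in text with its escapes removed,
# or nil if text contains no quoted value.
def quoted_value(text)
  m = QUOTED_VALUE.match(text)
  return nil unless m

  m[1].gsub(/\\(.)/) { $1 } # drop the escaping backslashes
end
```

With this pattern an escaped quote no longer ends the value, so key="say \"hi\" now" gives the value say "hi" now.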

6. Things to discuss


  1. a better name for this?


Created by Maruku at 17:00 on Friday, December 29th, 2006.