xml:Proof
<a schema for the rest of us/>
v.02.06.10 Beta
 Thomas Sawyer (c)2002


Specification

  1. Prologue
    1. General comprehension of the W3C XML, Namespace, and XPath Recommendations, and the Regular Expression Specification (see 9.1) is presumed by this document.
    2. xml:Proof is an XML schema. It was designed to be easy to use and to cover a vast portion of the XML schematic problem set.
    3. A proofsheet is a valid XML document conforming to the xml:Proof specification.
    4. A target document is an XML document to which a proofsheet is intended to be applied.
    5. A proof is a parsed ordered set of proofsheets used to validate a target document.
    6. A proof-processor is a program able to parse proofsheets and validate XML documents against such proofsheets. The term processor, when unqualified, shall refer to this special case, proof-processor, in contrast to the more general case, XML processor, throughout this document.
    7. A symbol or symbolic name is a string of characters, matching against the regular expression /\w*/.
    8. For the purposes of this specification, a tag will be the symbolic name of an XML element or attribute. Element tags will be notated as <tagname> and attribute tags will be notated as tagname=.
  2. Special Tags
    1. Special tags are proofsheet tags defined by the xml:Proof specification, in contrast to general tags which instead derive from a target document.
    2. The special root tag of a proofsheet is <proofsheet>. The root tag can take the alternate form of <schema>. Both forms of the root tag serve the exact same purpose.
    3. The <arbit> tag is a special xml:Proof tag used to indicate an arbitrary location within the target document. It has a single valid attribute, xpath=, which specifies the valid XPath to be matched against in the target document.
    4. Both the root tag and the arbit tag, and its xpath attribute tag, must be prefixed in reference to the xml:Proof namespace (5.3). While any arbitrary, but valid, prefix can be used to accomplish this, it is recommended that you use xp: for consistency and clarity.
    5. All the general tags in a proofsheet are the same as those of the target document it intends to model. The hierarchy of those elements is also the same.
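
To make the distinction between special and general tags concrete, here is a minimal sketch in Python. The proofsheet models an invented <addressbook> target document; the element names, the dies, and the helper list_dies are illustrative assumptions, not part of this specification. Only the namespace URI and the special tags come from the text (sections 2 and 5), and the dies use the marker syntax of sections 3 and 4.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given in section 5 of this specification.
XP_NS = "http://www.transami.net/namespace/xmlproof"

# Hypothetical proofsheet for an invented <addressbook> target document.
PROOFSHEET = """\
<xp:proofsheet xmlns:xp="http://www.transami.net/namespace/xmlproof">
  <addressbook>+exclusive+
    <person id=":integer: #1..1#">#1..*# @tag@
      <name>/.+/ :string:</name>
      <phone>/[0-9-]+/ #0..*#</phone>
    </person>
  </addressbook>
  <xp:arbit xp:xpath="//note">:string: #0..1#</xp:arbit>
</xp:proofsheet>
"""


def list_dies(proofsheet_text):
    """Return (kind, tag, die) triples for every element and attribute die."""
    root = ET.fromstring(proofsheet_text)
    allowed_roots = ("{%s}proofsheet" % XP_NS, "{%s}schema" % XP_NS)
    if root.tag not in allowed_roots:
        raise ValueError("root must be <proofsheet> or <schema> in the xml:Proof namespace")
    dies = []
    for elem in root.iter():
        if elem is root:
            continue  # assumption: the root tag itself carries no die
        # The text before the first child is taken as the element's die.
        dies.append(("element", elem.tag, (elem.text or "").strip()))
        for attr, value in elem.attrib.items():
            if attr == "{%s}xpath" % XP_NS:
                continue  # xpath= holds an XPath, not a die
            dies.append(("attribute", attr, value.strip()))
    return dies
```

Calling list_dies(PROOFSHEET) yields one entry per general element, one for the id attribute, and one for the <arbit> element, whose tag is reported in the xml:Proof namespace.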
  3. Die
    1. A die is a syntactical construction which defines constraints on a target document.
    2. The sole text node of any proofsheet element and the value of any proofsheet attribute, with the exception of the special xpath= attribute, is a die.
    3. A die may also be referred to as a cast, and the act of writing or applying them as casting.
    4. A die consists of an unordered list of markers separated by whitespace.
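
Splitting a die into its markers could look roughly like the following Python sketch. It relies on the delimiter characters listed in the syntax lines of section 4. The function name parse_die, the rejection of duplicate marker kinds, and the restriction that no marker contains whitespace are assumptions made for illustration; the specification does not state them.

```python
# Delimiter character -> marker kind, following the syntax lines in section 4.
MARKER_KINDS = {
    "=": "name",
    "/": "regex",
    ":": "datatype",
    "@": "order",
    "+": "set",
    "#": "range",
    "?": "option",
    "!": "collection",
    "*": "track",
}


def parse_die(die):
    """Split a die into a {kind: value} dictionary.

    Assumption: markers never contain whitespace, so a regular expression
    marker with embedded spaces is not handled by this sketch.
    """
    markers = {}
    for token in die.split():
        delim = token[0]
        if len(token) < 2 or token[-1] != delim or delim not in MARKER_KINDS:
            raise ValueError("not a marker: %r" % token)
        kind = MARKER_KINDS[delim]
        if kind in markers:
            # Assumption: at most one marker of each kind per die.
            raise ValueError("duplicate %s marker" % kind)
        markers[kind] = token[1:-1]  # strip the enclosing delimiters
    return markers
```

Because the result is a dictionary keyed by marker kind, the order in which markers are written in the die does not matter, which matches the "unordered list" wording above.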
  4. Markers
    1. Name Marker
      1. (syntax) =name=
      2. A name marker is a symbol, enclosed by equal signs, which identifies the die such that it can be reused elsewhere in the proofsheet.
      3. Name markers provide a convenient means of die reuse.
    2. Regular Expression Marker
      1. (syntax) /regular expression/
      2. A regular expression marker is a syntactical structure conforming to the Regular Expression specification (see 9.1.4).
      3. A regular expression marker dictates that the content of an element or attribute of the target document must match against it.
      4. If no regular expression marker is present in a die, the die's regular expression effectively defaults to /.*/
    3. Datatype Marker
      1. (syntax) :datatype:
      2. The datatype marker is an arbitrary symbol, enclosed by colons, naming the type of data to be contained by an element or attribute of the target document.
      3. The xml:Proof specification does not dictate the selection of datatypes; this task is instead left to the processor.
      4. A datatype marker dictates that the content of an element or attribute of the target document must conform to it.
      5. Datatype markers allow an xml:Proof processor to typecast XML content into its underlying language of implementation.
      6. A sufficient xml:Proof processor should provide a means to add and alter its internal datatypes.
      7. Any datatype not recognized by the xml:Proof processor shall be considered a string.
    4. Order Marker
      1. (syntax) @order@
      2. The order marker is a symbol enclosed in at-signs, which specifies the sort order of an element's child elements.
      3. Valid values for order are tag, content-a..z, content-z..a and none.
      4. The tag value specifies that the child elements must be in the order as given within the proofsheet.
      5. The content-a..z and content-z..a values specify that the child elements must appear in alphanumerical sequence, ascending and descending, respectively, by their first text node.
      6. The none value specifies that the child elements need not appear in any particular order, and is the default setting if no order marker is specified within a die.
      7. The order marker does not specify that each of the child elements must occur, or that one and only one of each said children must appear. It only specifies that, should they appear, they do so in the given order.
      8. The order marker is only applicable to an element, not an attribute, and the element must have child elements.
    5. Set Marker
      1. (syntax) +set+
      2. The set marker is a symbol, enclosed in addition signs, which specifies the ... of an element's child elements.
      3. Valid values for set are inclusive, exclusive and none.
      4. The inclusive value indicates that all the child elements must be present as given by the proofsheet, but other elements may appear along with them.
      5. The exclusive value indicates that all the child elements must be present as given by the proofsheet, and that no other elements may appear along with them.
      6. The value none indicates no requirements for the appearance of child elements, and is the default if no set marker is specified in the die.
    6. Range Marker
      1. (syntax) #range#
      2. The range marker is a symbol, enclosed by pound signs, which specifies the minimum and maximum number of a given element or attribute that may appear within the target document.
      3. For elements, a valid range can be m..n or m...n, inclusive and exclusive of n, respectively, where m and n are unsigned integers and m < n, such that m is the minimum number and n is the maximum number.
      4. An element may also take a range marker of the form m..*, equivalent to m...*, specifying a minimum number (m) and an unbounded maximum number.
      5. The default range marker for an element, if none is specified within the die, is 0..*.
      6. For attributes, a valid range can only be 0..1 or 1..1.
      7. The default range marker for an attribute, if none is specified within the die, is 0..1.
    7. Option Marker
      1. (syntax) ?option?
      2. The option marker is an arbitrary symbol, or unordered list of symbols separated by commas, enclosed by question marks, which specifies that the element or attribute belongs to a group of similarly marked elements and attributes, such that one and only one of such elements or attributes may appear within the target document.
      3. Elements and/or attributes partaking of an identical option do not need to belong to the same parent, although this can create a contradiction should an ancestor and one of its children partake of the same option group, rendering a document invalid by definition.
    8. Collection Marker
      1. (syntax) !collection!
      2. A collection marker is an arbitrary symbol, enclosed by exclamation marks, which specifies that the element or attribute belongs to a group of similarly marked elements and attributes, such that all of the elements and/or attributes sharing the same collection marker must appear together within the target document.
      3. Any given element or attribute can only belong to a single collection group.
    9. Track Marker
      1. (syntax) *track*
      2. The track marker, which is a boolean symbol enclosed by asterisks, is a special marker which does not dictate structure or content. Rather it has a special purpose for XML datastores, specifying that the element or attribute should be specifically indexed.
      3. Valid boolean symbols for track are yes, no, true, or false, with the negative notations being the default.
      4. The tracking of particular XML elements in a datastore allows for fast search and retrieval, and fast aggregate functions to be applied to their values.
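
As one worked example of the marker rules, the range marker (4.6) can be sketched in Python as follows. The function names parse_range and count_in_range, and the use of None for an unbounded maximum, are assumptions of this sketch; the accepted forms and defaults are taken from 4.6.

```python
import re

# m..n (inclusive of n), m...n (exclusive of n), or m..* / m...* (unbounded).
RANGE_RE = re.compile(r"^([0-9]+)(\.\.\.?)([0-9]+|\*)$")


def parse_range(text=None, is_attribute=False):
    """Return (minimum, maximum); a maximum of None means unbounded."""
    if text is None:
        # Defaults from 4.6: 0..1 for attributes, 0..* for elements.
        text = "0..1" if is_attribute else "0..*"
    if is_attribute:
        if text not in ("0..1", "1..1"):
            raise ValueError("attribute range must be 0..1 or 1..1")
        return (int(text[0]), 1)
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError("malformed range: %r" % text)
    low = int(match.group(1))
    dots, high = match.group(2), match.group(3)
    if high == "*":
        return (low, None)
    high = int(high)
    if not low < high:
        raise ValueError("range requires m < n")
    if dots == "...":
        high -= 1  # three dots exclude n itself
    return (low, high)


def count_in_range(count, bounds):
    """Check an occurrence count against bounds from parse_range."""
    low, high = bounds
    return count >= low and (high is None or count <= high)
```

For instance, parse_range("1...3") gives (1, 2), so an element marked #1...3# may appear once or twice.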
  5. File Extension and Namespace
    1. The file extension for a proofsheet is .xps.
    2. xml:Proof is fully namespace aware, both in functionality and in application to an XML Document. Since namespace prefixes serve as mere proxies to actual namespaces, any arbitrary prefix can be used, but the namespace itself, i.e. the uri, must be unique and persistent.
    3. The xml:Proof namespace shall be http://www.transami.net/namespace/xmlproof.
    4. Within a proofsheet, the namespace of all of xml:Proof's special elements and attributes must belong to the xml:Proof namespace.
    5. Within a proofsheet, all general xml:Proof elements and attributes must partake of the same namespace as their counterparts within the target document.
  6. Schema Declarations
    1. A proof-processor will recognize schema declarations made via XML processing instructions within the target document.
    2. (Syntax) <?xml:schema uri="uri" url="url" segment="segment"?>
    3. The uri attribute, or its synonym space, defines the kind of schema that is being utilized. This is the specific namespace uri as defined by the schema's designers. In the case of xml:Proof, it is "http://www.transami.net/namespace/xmlproof". It would be another string for, say, RELAX-NG or Schematron.
    4. The url attribute, or its synonym source, is a path to the .xps file. The url can be a local path. The url is necessary since proofsheets cannot be embedded in the target document like DTDs can.
    5. The segment attribute, or its synonym fragment, is an optional attribute specifying an XPath which selects only a portion of the .xps file to use as the proofsheet.
    6. More than one schema can be declared within a given target document. In so doing, schema declarations appearing earlier within the document have precedence over those appearing later. This allows for a means of cast overriding.
    7. Note that this schema declaration notation slightly violates the W3C XML Recommendation, which reserves processing instruction names matching /^xml/i.
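
The schema declarations described above can be collected with a short Python sketch like the one below. It scans the raw document text with regular expressions rather than an XML parser, since a parser may reject an instruction name beginning with xml. The function name schema_declarations and the choice to normalize synonyms to uri, url, and segment are assumptions of this sketch.

```python
import re

# Matches <?xml:schema ...?> and captures the pseudo-attribute text.
PI_RE = re.compile(r"<\?xml:schema\s+(.*?)\?>", re.DOTALL)
# Matches name="value" or name='value' pairs.
ATTR_RE = re.compile(r"""([\w.-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)
# Synonyms named in section 6, mapped to their primary attribute names.
SYNONYMS = {"space": "uri", "source": "url", "fragment": "segment"}


def schema_declarations(document_text):
    """Return schema declarations in document order (earliest first).

    Earlier declarations take precedence, so the first list entry wins
    when two declarations conflict.
    """
    declarations = []
    for pi in PI_RE.finditer(document_text):
        decl = {}
        for name, _quote, value in ATTR_RE.findall(pi.group(1)):
            decl[SYNONYMS.get(name, name)] = value
        declarations.append(decl)
    return declarations
```

A processor built this way would walk the returned list from the front, applying later declarations only where earlier ones leave a cast undefined.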
  7. Namespace Declarations
    1. This xml:Proof specification offers a variant notation for namespace declarations, differing from the W3C recommendation. The W3C's recommendation is here considered somewhat nebulous and clumsy, and further, clutters and obscures the information of relevance within an XML document.
    2. A proof-processor will recognize namespace declarations made via XML processing instructions within the target document.
    3. (Syntax) <?xml:ns prefix="prefix" uri="uri"?>
    4. The prefix and uri attribute tags can also be labeled name and space, respectively.
    5. This specification recommends declaring namespaces via document-level processing instructions, instead of within general element tags as recommended by the W3C.
    6. This notation can coexist with the standard notation because, in effect, all the namespace processing instruction specifies is the insertion of a document-level ATTLIST for the namespaces thus defined.
    7.     <!DOCTYPE docname [
            <!ATTLIST docname xmlns:prefix CDATA 'uri'>
          ]>
        
    8. Obviously, many XML processors do not support this processing instruction. It is hoped that they will adopt this improved notation over time as it is a very simple and useful addition.
    9. A proof-processor will provide the means to convert between this notation and the standard notation.
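
One direction of the conversion mentioned in 7.9 might be sketched in Python as follows: each <?xml:ns ...?> instruction is removed and re-expressed as an xmlns:prefix attribute on the root element. The function name to_standard_notation is hypothetical. The sketch works on raw text and assumes a simple document: no markup-like text inside comments before the root element, and no double quotes inside namespace URIs.

```python
import re

# Matches <?xml:ns ...?> plus any whitespace that follows it.
NS_PI_RE = re.compile(r"<\?xml:ns\s+(.*?)\?>\s*", re.DOTALL)
# Matches name="value" or name='value' pairs.
ATTR_RE = re.compile(r"""([\w.-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)
# Alternate attribute labels from 7.4, mapped to prefix and uri.
SYNONYMS = {"name": "prefix", "space": "uri"}
# First start tag that is not a processing instruction, declaration, or end tag.
ROOT_TAG_RE = re.compile(r"<(?![?!/])([^\s/>]+)")


def to_standard_notation(document_text):
    """Rewrite xml:ns processing instructions as xmlns attributes on the root."""
    declarations = []

    def collect(match):
        attrs = {}
        for name, _quote, value in ATTR_RE.findall(match.group(1)):
            attrs[SYNONYMS.get(name, name)] = value
        declarations.append((attrs["prefix"], attrs["uri"]))
        return ""  # drop the processing instruction from the output

    stripped = NS_PI_RE.sub(collect, document_text)
    root = ROOT_TAG_RE.search(stripped)
    if root is None:
        raise ValueError("no root element found")
    xmlns = "".join(' xmlns:%s="%s"' % decl for decl in declarations)
    return stripped[:root.end()] + xmlns + stripped[root.end():]
```

The reverse conversion would read the xmlns attributes from the root element and emit one processing instruction per prefix; it is omitted here for brevity.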
  8. Functionality
    1. A proof-processor validates a target document by matching namespaces and XPaths between the proofsheet and the target document, such that all target document elements and attributes are validated against their corresponding proofsheet dies.
    2. Any possible absolute XPath within a proofsheet should only be accounted for once. If this is not adhered to, it is not likely to cause an error. The proof-processor should only match against the first occurrence of an absolute die within the proofsheet.
    3. The special <arbit> element overlaps in application with the general elements and attributes. In other words, a target document's element or attribute must conform to both an arbitrary die and a general die should both be applicable.
    4. The special <arbit> element overlaps in application with other arbitrary assignments. In other words, a target document's element or attribute must conform to all applicable arbitrary dies.
  9. Appendix
    1. References
      1. W3C XML Recommendation
      2. W3C Namespace Recommendation
      3. W3C XPath Recommendation
      4. Regular Expressions Specification