README.adoc in asciimath2unitsml-0.1.3 vs README.adoc in asciimath2unitsml-0.2.0

- old
+ new

@@ -2,28 +2,61 @@ Convert Units expressions via MathML to UnitsML This gem converts MathML incorporating UnitsML expressions (based on the Ascii representation provided by NIST) into MathML complying with https://www.w3.org/TR/mathml-units/[], with -UnitsML markup embedded in it, and with unique identifiers for each distinct unit and dimension. +UnitsML markup embedded in it, and with unique identifiers for each distinct unit, prefix, and dimension. +Dimensions are automatically inserted corresponding to each unit. Units expressions are identified in MathML as `<mtext>unitsml(...)</mtext>`, which in turn can be identified in AsciiMath as `"unitsml(...)"`. + The consuming document is meant to deduplicate the instances of UnitsML markup with the same identifier, and potentially remove them to elsewhere in the document or another document. +== Notation + +The `unitsml()` expression consists of a unit string. +The units used in `unitsml()` are taken from the UnitsDB database as updated by Ribose: +https://github.com/unitsml/unitsdb[]. Units are given as an ASCII based code, consisting of +multiplication or division of single units, each of which is defined as a Prefix +(taken from https://github.com/unitsml/unitsdb/blob/master/prefixes.yaml[]), +unit (taken from https://github.com/unitsml/unitsdb/blob/master/units.yaml[]), +and exponent; e.g. `mm*s^-2`. + The conventions used for writing units are: * `^` for exponents, e.g. `m^-2` -* `*` to combine two units by multiplication; e.g. `m*s^-2`. Division is not supported, use negative exponents instead +* `*` to combine two units by multiplication; e.g. `m*s^-2`. +* `/` to combine two units by division; * `u` for μ (micro-) +For more on units notation, see <<units_notation,Units Notation>>. + +The `unitsml()` can take additional optional parameters, giving further information for the UnitsML +to be generated: + +* `unitsml(unit-string, quantity: ID)` provides the UnitsDB identifier for the quantity being measured +(taken from https://github.com/unitsml/unitsdb/blob/master/quantities.yaml[]). For example, +`unitsml(s, quantity: NISTq109)` indicates that the second is used to measure period duration. +If a single quantity is associated with the unit in UnitsDB (as given in +https://github.com/unitsml/unitsdb/blob/master/units.yaml[]), that quantity is added automatically; +otherwise, no quantity is added unless explicitly nominated in this way. +* `unitsml(unit-string, name: NAME)` provides a name for the unit, if one is not already available +from UnitsDB. For example, `unitsml(cal_th/cm^2, name: langley)`. + +== Rendering + +The output of the gem is MathML, with MathML unit expressions (expressed as `<mi>`, +complying with https://www.w3.org/TR/mathml-units/[MathML Units]) cross-referenced to UnitsML +definitions embedded in the MathML. + The gem follows the MathML Units convention of inserting a spacing invisible times operator (`<mo rspace='thickmathspace'>&#x2062;</mo>`) between any numbers (`<mn>`) and unit expressions in MathML, and representing units in MathML as non-italic variables (`<mi mathvariant='normal'>`). -So: +== Example [source] ---- 9 "unitsml(C^3*A)" ---- @@ -81,23 +114,131 @@ </mrow> </math> ---- +== Usage + The converter is run as: [source,ruby] ---- c = Asciimath2UnitsML::Conv.new() -c.Asciimath2UnitsML({Asciimath string containing "unitsml()"}) -c.MathML2UnitsML({MathML document containing <mtext>unitsml()</mtext>}) -c.MathML2UnitsML({Nokogiri parse of MathML document containing <mtext>unitsml()</mtext>}) +c.Asciimath2UnitsML(1 "unitsml(mm*s^-2)") # AsciiMath string containing UnitsML +c.MathML2UnitsML("<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>7</mn>"\ + "<mtext>unitsml(kg^-2)</mtext></math>") # AsciiMath string containing <mtext>unitsml()</mtext> +c.MathML2UnitsML(Nokogiri::XML("<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>7</mn>"\ + "<mtext>unitsml(kg^-2)</mtext></math>")) # Nokogiri parse of MathML document containing <mtext>unitsml()</mtext> ---- The converter class may be initialised with options: * `multiplier` is the symbol used to represent the multiplication of units. By default, following MathML Units, the symbol is middle dot (`&#xB7`). An arbitrary UTF-8 string can be supplied instead; it will be encoded as XML entities. The value `:space` is rendered as a spacing invisible times in MathML (`<mo rspace='thickmathspace'>&#x2062;</mo>`), and as a non-breaking space in HTML. The value `:nospace` is rendered as a non-spacing invisible times in MathML (`<mo>&#x2062;</mo>`), and is not rendered in HTML. + +[[units_notation]] +== Units Notation + +The units used in `unitsml()` are taken from the UnitsDB database as updated by Ribose: +https://github.com/unitsml/unitsdb[]. Units are given as an ASCII based code, consisting of +multiplication or division of single units, each of which is defined as a Prefix +(taken from https://github.com/unitsml/unitsdb/blob/master/prefixes.yaml[]), +unit (taken from https://github.com/unitsml/unitsdb/blob/master/units.yaml[]), +and exponent; e.g. `mm*s^-2`. + +In case of ambiguity, the interpretation with no prefix is prioritised over the interpretation +as a unit; so `ct` is interpreted as _hundredweight_, rather than _centi-ton_. Exceptionally, +`kg` is decomposed into kilo-gram rather than treated as a basic unit, for consistency with +other prefixes of grams. (Prefixed units appear in UnitsDB, and are indicated as `prefixed: true`.) + +A unit may have multiple symbols; these are registered separately in +https://github.com/unitsml/unitsdb/units.yaml[units.yaml], as entries under `unit_symbols`. +These different symbols will be recognised as the same Unit in the UnitsML markup, but +the original symbol will be retained in the MathML expression. So an expression like `1 unitsml(mL)` +will be recognised as referring to microlitres; the expression will be given under its canonical +rendering `ml` in UnitsML markup, but the MathML rendering referencing that UnitsML expression +will keep the notation `mL`. + +The symbols used for units can be highly ambiguous; in order to guarantee accurate parsing, +the symbols used to data enter units are unambiguous in https://github.com/unitsml/unitsdb/units.yaml[units.yaml]. +They may be found as the entries for `unit_symbols/id` under each unit. For example, `B` is ambiguous between +_bel_ (as in decibel) and _byte_; they are kept unambiguous by using `bel_B` and `byte_B` to refer to them, +although they will still both be rendered as `B`. + +The following table is the current list of ambiguous symbols, which are disambiguated in the symbol ids used. +This table can be generated (in Asciidoc format) through `Asciimath2UnitsML::Conv.new().ambig_units`: + +[cols="7*"] +|=== +|Symbol | Unit + ID | | | | | + + +| &#8242; | minute (minute of arc): `'` | foot: `'_ft` | minute: `'_min` | minute (minute of arc): `prime` | foot: `prime_ft` | minute: `prime_min` +| &#8243; | second (second of arc): `"` | second: `"_s` | inch: `"_in` | second (second of arc): `dprime` | second: `dprime_s` | inch: `dprime_in` +| &#8243;Hg | conventional inch of mercury: `"Hg` | conventional inch of mercury: `dprime_Hg` | inch of mercury (32 degF): `"Hg_32degF` | inch of mercury (60 degF): `"Hg_60degF` | inch of mercury (32 degF): `dprime_Hg_32degF` | inch of mercury (60 degF): `dprime_Hg_60degF` +| hp | horsepower: `hp` | horsepower (UK): `hp_UK` | horsepower, water: `hp_water` | horsepower, metric: `hp_metric` | horsepower, boiler: `hp_boiler` | horsepower, electric: `hp_electric` +| Btu | British thermal unit_IT: `Btu` | British thermal unit (mean): `Btu_mean` | British thermal unit (39 degF): `Btu_39degF` | British thermal unit (59 degF): `Btu_59degF` | British thermal unit (60 degF): `Btu_60degF` | +| a | are: `a` | year (365 days): `a_year` | year, tropical: `a_tropical_year` | year, sidereal: `a_sidereal_year` | | +| d | day: `d` | darcy: `darcy` | day, sidereal: `d_sidereal` | | | +| inHg | conventional inch of mercury: `inHg` | inch of mercury (32 degF): `inHg_32degF` | inch of mercury (60 degF): `inHg_60degF` | | | +| inH~2~O | conventional inch of water: `inH_2O` | inch of water (39.2 degF): `inH_2O_39degF` | inch of water (60 degF): `inH_2O_60degF` | | | +| min | minute: `min` | minim: `minim` | minute, sidereal: `min_sidereal` | | | +| pc | parsec: `pc` | pica (printer's): `pica_printer` | pica (computer): `pica_computer` | | | +| t | metric ton: `t` | long ton: `ton_long` | short ton: `ton_short` | | | +| B | bel: `bel_B` | byte: `byte_B` | | | | +| cmHg | conventional centimeter of mercury: `cmHg` | centimeter of mercury (0 degC): `cmHg_0degC` | | | | +| cmH~2~O | conventional centimeter of water: `cmH_2O` | centimeter of water (4 degC): `cmH_2O_4degC` | | | | +| cup | cup (US): `cup` | cup (FDA): `cup_label` | | | | +| D | debye: `D` | darcy: `Darcy` | | | | +| ft | foot: `ft` | foot (based on US survey foot): `ft_US_survey` | | | | +| ftH~2~O | conventional foot of water: `ftH_2O` | foot of water (39.2 degF): `ftH_2O_39degF` | | | | +| gi | gill (US): `gi` | gill [Canadian and UK (Imperial)]: `gi_imperial` | | | | +| h | hour: `h` | hour, sidereal: `h_sidereal` | | | | +| &#8242;Hg | conventional foot of mercury: `'Hg` | conventional foot of mercury: `prime_Hg` | | | | +| __&#295;__ | natural unit of action: `h-bar` | atomic unit of action: `h-bar_atomic` | | | | +| __m__~e~ | natural unit of mass: `m_e` | atomic unit of mass: `m_e_atomic` | | | | +| in | inch: `in` | inch (based on US survey foot): `in_US_survey` | | | | +| K | kelvin: `K` | kayser: `kayser` | | | | +| L | liter: `L` | lambert: `Lambert` | | | | +| lb | pound (avoirdupois): `lb` | pound (troy or apothecary): `lb_troy` | | | | +| mi | mile: `mi` | mile (based on US survey foot): `mi_US_survey` | | | | +| mil | mil (length): `mil` | angular mil (NATO): `mil_nato` | | | | +| oz | ounce (avoirdupois): `oz` | ounce (troy or apothecary): `oz_troy` | | | | +| pt | point (printer's): `pt_printer` | point (computer): `pt_computer` | | | | +| rad | radian: `rad` | rad (absorbed dose): `rad_radiation` | | | | +| s | second: `s` | second, sidereal: `s_sidereal` | | | | +| tbsp | tablespoon: `tbsp` | tablespoon (FDA): `tbsp_label` | | | | +| ton | ton of TNT (energy equivalent): `ton_TNT` | ton of refrigeration (12 000 Btu_IT/h): `ton_refrigeration` | | | | +| tsp | teaspoon: `tsp` | teaspoon (FDA): `tsp_label` | | | | +| yd | yard: `yd` | yard (based on US survey foot): `yd_US_survey` | | | | +| &#186; | degree (degree of arc): `deg` | | | | | +| &#947; | gamma: `gamma` | | | | | +| &#956; | micron: `micron` | | | | | +| &#8486; | ohm: `Ohm` | | | | | +| &#197; | angstrom: `Aring` | | | | | +| &#295; | natural unit of action in eV s: `h-bar_eV_s` | | | | | +| ab&#937; | abohm: `abohm` | | | | | +| (ab&#937;)^-1^ | abmho: `abS` | | | | | +| aW | abwatt: `aW (Cardelli)` | | | | | +| b | barn: `barn` | | | | | +| Btu~th~ | British thermal unit_th: `Btu_th` | | | | | +| &#176;C | degree Celsius: `degC` | | | | | +| cal~IT~ | I.T. calorie: `cal_IT` | | | | | +| cal~th~ | thermochemical calorie: `cal_th` | | | | | +| &#176;F | degree Fahrenheit: `degF` | | | | | +| __a__~0~ | atomic unit of length: `a_0` | | | | | +| __c__ | natural unit of velocity: `c` | | | | | +| __c__~0~ | natural unit of velocity: `c_0` | | | | | +| __e__ | atomic unit of charge: `e` | | | | | +| __E__~h~ | atomic unit of energy: `e_h` | | | | | +| &#956;in | microinch: `uin` | | | | | +| &#176;K | kelvin: `degK` | | | | | +| kcal~IT~ | kilocalorie_IT: `kcal_IT` | | | | | +| kcal~th~ | kilocalorie_th: `kcal_th` | | | | | +| mmH~2~O | conventional millimeter of water: `mmH_2O` | | | | | +| &#176;R | degree Rankine: `degR` | | | | | +| &#x19b;~C~ | natural unit of length: `lambda-bar_C` | | | | | +|===