[![Build Status](https://travis-ci.org/datacite/bolognese.svg?branch=master)](https://travis-ci.org/datacite/bolognese)
[![Code Climate](https://codeclimate.com/github/datacite/bolognese/badges/gpa.svg)](https://codeclimate.com/github/datacite/bolognese)
[![Test Coverage](https://codeclimate.com/github/datacite/bolognese/badges/coverage.svg)](https://codeclimate.com/github/datacite/bolognese/coverage)
# Bolognese
Ruby gem and command-line utility for converting DOI metadata to and from [schema.org](https://schema.org) in JSON-LD.
## Features
* convert [Crossref XML](https://support.crossref.org/hc/en-us/articles/214936283-UNIXREF-query-output-format) to schema.org/JSON-LD
* convert [DataCite XML](http://schema.datacite.org/) to schema.org/JSON-LD
* fetch schema.org/JSON-LD from a URL
* convert schema.org/JSON-LD to [DataCite XML](http://schema.datacite.org/)
* convert Crossref XML to [DataCite XML](http://schema.datacite.org/)
Conversion to Crossref XML is not yet supported.
## Installation
The usual way is with Bundler: add the following to your `Gemfile` to install the
current version of the gem:
```ruby
gem 'bolognese'
```
Then run `bundle install` to install into your environment.
You can also install the gem system-wide in the usual way:
```bash
gem install bolognese
```
## Commands
The `bolognese` commands accept URLs and DOIs as arguments. The `--as` command-line
flag sets the output format: `crossref`, `datacite`, or `schema_org` (default).
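
Because both bare DOIs and DOI URLs are accepted, it can help to see how the two forms relate. The following is a minimal sketch only, not bolognese's actual implementation: `normalize_doi` and `DOI_PATTERN` are hypothetical names introduced for illustration. It assumes that a DOI starts with `10.` followed by a numeric prefix, and it lowercases the result because the `@id` values in the examples below appear in lowercase.

```ruby
# Hypothetical helper for illustration; not part of the bolognese API.
# Accepts a bare DOI, a "doi:" prefixed DOI, or a doi.org / dx.doi.org URL.
DOI_PATTERN = %r{\A(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/.+)\z}i

def normalize_doi(input)
  match = DOI_PATTERN.match(input.to_s.strip)
  return nil unless match

  # Assumption: DOIs are treated as case-insensitive and lowercased.
  "https://doi.org/#{match[1].downcase}"
end

puts normalize_doi("10.5061/DRYAD.8515")
puts normalize_doi("https://doi.org/10.7554/elife.01567")
```

Input that does not look like a DOI returns `nil`, so a caller can fall back to treating the argument as an ordinary URL.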
## Examples
Read Crossref XML:
```
bolognese read https://doi.org/10.7554/elife.01567 --as crossref
eLife
2050-084X
02
11
2014
3
Automated quantitative histology reveals vascular morphodynamics during Arabidopsis hypocotyl secondary growth
Martial
Sankar
Kaisa
Nieminen
Laura
Ragni
Ioannis
Xenarios
Christian S
Hardtke
02
11
2014
10.7554/eLife.01567
1
eLifesciences
www.elifesciences.org
false
2013-09-20
2013-12-24
2014-02-11
SystemsX
EMBO
http://dx.doi.org/10.13039/501100003043
Swiss National Science Foundation
http://dx.doi.org/10.13039/501100001711
University of Lausanne
http://dx.doi.org/10.13039/501100006390
http://creativecommons.org/licenses/by/3.0/
http://creativecommons.org/licenses/by/3.0/
http://creativecommons.org/licenses/by/3.0/
10.7554/eLife.01567
http://elifesciences.org/lookup/doi/10.7554/eLife.01567
Nature
Bonke
426
181
2003
10.1038/nature02100
Genetics
Brenner
182
413
2009
10.1534/genetics.109.104976
Physiologia Plantarum
Chaffey
114
594
2002
10.1034/j.1399-3054.2002.1140413.x
Neural computation
Chang
13
2119
2001
10.1162/089976601750399335
Machine Learning
Cortes
20
273
1995
Development
Dolan
119
71
1993
Seminars in Cell & Developmental Biology
Elo
20
1097
2009
10.1016/j.semcdb.2009.09.009
Development
Etchells
140
2224
2013
10.1242/dev.091314
PLOS Genetics
Etchells
8
e1002997
2012
10.1371/journal.pgen.1002997
Molecular Systems Biology
Fuchs
6
370
2010
10.1038/msb.2010.25
Bio Systems
Granqvist
110
60
2012
10.1016/j.biosystems.2012.07.004
Current Opinion in Plant Biology
Groover
9
55
2006
10.1016/j.pbi.2005.11.013
Plant Cell
Hirakawa
22
2618
2010
10.1105/tpc.110.076083
Proceedings of the National Academy of Sciences of the United States of America
Hirakawa
105
15208
2008
10.1073/pnas.0808444105
Cell
Meyerowitz
56
263
1989
10.1016/0092-8674(89)90900-8
Science
Meyerowitz
295
1482
2002
10.1126/science.1066609
Plant Physiol
Nieminen
135
653
2004
10.1104/pp.104.040212
Nature Biotechnology
Noble
24
1565
2006
10.1038/nbt1206-1565
Proceedings of the National Academy of Sciences of the United States of America
Olson
77
1516
1980
10.1073/pnas.77.3.1516
Bioinformatics
Pau
26
979
2010
10.1093/bioinformatics/btq046
Plant Cell
Ragni
23
1322
2011
10.1105/tpc.111.084020
Sankar
2014
10.5061/dryad.b835k
Current Biology
Sibout
18
458
2008
10.1016/j.cub.2008.02.070
The New Phytologist
Spicer
186
577
2010
10.1111/j.1469-8137.2010.03236.x
Machine Vision and Applications
Theriault
23
659
2012
10.1007/s00138-011-0345-9
Cell
Uyttewaal
149
439
2012
10.1016/j.cell.2012.02.048
Nature Cell Biology
Yin
15
860
2013
10.1038/ncb2764
Abstract
10.7554/eLife.01567.001
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.001
eLife digest
10.7554/eLife.01567.002
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.002
Figure 1. Cellular level analysis of Arabidopsis hypocotyl secondary growth.
(
A
) Light microscopy of cross sections obtained from Arabidopsis hypocotyls (organ position illustrated for a 9-day-old seedling, lower left) at 9 dag (upper left) and 35 dag (right). Size bars are 100 μm. Blue GUS staining due to the presence of an
APL::GUS
reporter gene in this Col-0 background line marks phloem bundles. (
B
) Overview of the developmental series (time points and distinct samples per genotype) analyzed in this study. (
C
) Example of a high-resolution hypocotyl section image assembled from 11 × 11 tiles. (
D
) The same image after pre-processing and binarization, and (
E
) subsequent segmentation using a watershed algorithm. (
F
) Number of mis-segmented cells as determined by careful visual inspection in 12 sections, plotted against the total number of cells per section (log scale).
10.7554/eLife.01567.003
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.003
Figure 2. The ‘Quantitative Histology’ approach.
(
A
) Overview of the computational pipeline from image acquisition to analysis. (
B
) ‘Phenoprints’ for the different genotypes and developmental stages.
10.7554/eLife.01567.004
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.004
Figure 2—figure supplement 1. An example of classifier selection through V-fold cross validation.
The green arrow points out the selected feature combination according to the criteria of minimum number of features with the highest performance and the lowest variation (the radiusV feature was excluded due to its putative variation in tissue location).
10.7554/eLife.01567.005
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.005
Figure 3. Progression of tissue proliferation.
(
A
) Principal component analysis (PCA) of the phenoprints shown in Figure 2B, performed with normalized values (Supplementary file 4). The inlay screeplot displays the proportion of total variation explained by each principal component. (
B–E
) Comparative plots of parameter progression in the two genotypes. In (
D
), xylem represents combined vessel, parenchyma, and fiber cells, phloem represents combined phloem parenchyma and bundle cells. Error bars indicate standard error.
10.7554/eLife.01567.006
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.006
Figure 4. Bimodal distribution of incline angle according to position.
(
A
and
B
) Spatial distribution of cell incline angle illustrates the vascular organization in Ler (
B
) as compared to Col-0 (
A
) at later stages of development, for example 30 dag. The size of the disc increases with the area of the cell. Blue color indicates radial cell orientation, red orthoradial. (
C
and
D
) Violin plots of incline angle distribution, illustrating increasingly bimodal distribution coincident with refined vascular organization and different dynamics of the process in the two genotypes.
10.7554/eLife.01567.007
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.007
Figure 4—figure supplement 1. An illustration of the incline angle.
The incline is the angle between the section radius through the center of an ellipse fit to a cell and the major axis of that ellipse extended towards the x axis.
10.7554/eLife.01567.008
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.008
Figure 5. Distinct local organization of incline angle during hypocotyl secondary growth progression.
(
A
–
J
) Density plots of cell incline angle vs radial position for the two genotypes at the indicated developmental stages, representing all cells across all sections for a given time point. The red lines represent the fit of these cloud distributions with locally weighted linear regression (i.e., lowess), revealing the essential data trends. All sections were normalized from 0.0 (the manually defined center) to 1.0 (the average radius in a set of sections as determined by the average distance of the outermost cells from the center for individual sections). Box plots indicate the quartiles of the radian distribution for each cell-type class and are placed at the average position of the cell type with respect to the y axis. Outliers are shown as circles.
10.7554/eLife.01567.009
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.009
Figure 5—figure supplement 1. Analysis of cell number in defined xylem regions of different size.
Cell number in a circle of 200–500 pixels around the section centers for Col-0. Cell count in a constant area of xylem over time across all averaged across all sections.
10.7554/eLife.01567.010
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.010
Figure 6. Mapping of phloem pole patterning.
(
A
) Example of Gaussian kernel density estimate of the location of predicted phloem bundles cells in a 30 dag Col-0 section. High density represents phloem poles. (
B
) Example of an analysis of emerging phloem pole position in a 30 dag Col-0 section. The plot represents a pixel intensity map after noise reduction along a circular region of interest across the emerging phloem poles. Intensity peaks are due to GUS staining conferred to phloem bundles by an
APL::GUS
reporter construct. (
C
) Probability density function of the data shown in (
B
) obtained from an automated Bayesian model. The dominant single peak indicates a constant arc distance of ca. 62 pixel between the phloem poles.
10.7554/eLife.01567.011
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.011
Supplementary file 1. (
A
) An explanation of the extracted parameters that describe the cellular features. (
B
) Summary information of the hand-labeled training set for supervised machine learning. (
C
) Definition of the classifiers selected for analysis. (
D
) Summary of the classifier parameters for supervised machine learning. (
E
) Overview of the cell type classes recognized by the supervised machine learning approach and their assignment codes used in Data Files 3 and 4.
10.7554/eLife.01567.012
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.012
Supplementary file 2. Quality control files for the Col-0 sections.
10.7554/eLife.01567.013
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.013
Supplementary file 3. Quality control files for the Ler sections.
10.7554/eLife.01567.014
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.014
Supplementary file 4. The normalized values of the phenoprints (Figure 2B) used for PCA.
10.7554/eLife.01567.015
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.015
Decision letter
10.7554/eLife.01567.016
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.016
Author response
10.7554/eLife.01567.017
http://elifesciences.org/lookup/doi/10.7554/eLife.01567.017
```
Convert Crossref XML to schema.org/JSON-LD:
```
bolognese read https://doi.org/10.7554/elife.01567
{
"@context": "http://schema.org",
"@type": "ScholarlyArticle",
"@id": "https://doi.org/10.7554/elife.01567",
"url": "http://elifesciences.org/lookup/doi/10.7554/eLife.01567",
"additionalType": "JournalArticle",
"name": "Automated quantitative histology reveals vascular morphodynamics during Arabidopsis hypocotyl secondary growth",
"author": [{
"@type": "Person",
"givenName": "Martial",
"familyName": "Sankar"
}, {
"@type": "Person",
"givenName": "Kaisa",
"familyName": "Nieminen"
}, {
"@type": "Person",
"givenName": "Laura",
"familyName": "Ragni"
}, {
"@type": "Person",
"givenName": "Ioannis",
"familyName": "Xenarios"
}, {
"@type": "Person",
"givenName": "Christian S",
"familyName": "Hardtke"
}],
"license": "http://creativecommons.org/licenses/by/3.0/",
"datePublished": "2014-02-11",
"dateModified": "2015-08-11T05:35:02Z",
"isPartOf": {
"@type": "Periodical",
"name": "eLife",
"issn": "2050-084X"
},
"citation": [{
"@type": "CreativeWork",
"@id": "https://doi.org/10.1038/nature02100",
"position": "1",
"datePublished": "2003"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1534/genetics.109.104976",
"position": "2",
"datePublished": "2009"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1034/j.1399-3054.2002.1140413.x",
"position": "3",
"datePublished": "2002"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1162/089976601750399335",
"position": "4",
"datePublished": "2001"
}, {
"@type": "CreativeWork",
"position": "5",
"datePublished": "1995"
}, {
"@type": "CreativeWork",
"position": "6",
"datePublished": "1993"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1016/j.semcdb.2009.09.009",
"position": "7",
"datePublished": "2009"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1242/dev.091314",
"position": "8",
"datePublished": "2013"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1371/journal.pgen.1002997",
"position": "9",
"datePublished": "2012"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1038/msb.2010.25",
"position": "10",
"datePublished": "2010"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1016/j.biosystems.2012.07.004",
"position": "11",
"datePublished": "2012"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1016/j.pbi.2005.11.013",
"position": "12",
"datePublished": "2006"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1105/tpc.110.076083",
"position": "13",
"datePublished": "2010"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1073/pnas.0808444105",
"position": "14",
"datePublished": "2008"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1016/0092-8674(89)90900-8",
"position": "15",
"datePublished": "1989"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1126/science.1066609",
"position": "16",
"datePublished": "2002"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1104/pp.104.040212",
"position": "17",
"datePublished": "2004"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1038/nbt1206-1565",
"position": "18",
"datePublished": "2006"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1073/pnas.77.3.1516",
"position": "19",
"datePublished": "1980"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1093/bioinformatics/btq046",
"position": "20",
"datePublished": "2010"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1105/tpc.111.084020",
"position": "21",
"datePublished": "2011"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.5061/dryad.b835k",
"position": "22",
"datePublished": "2014"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1016/j.cub.2008.02.070",
"position": "23",
"datePublished": "2008"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1111/j.1469-8137.2010.03236.x",
"position": "24",
"datePublished": "2010"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1007/s00138-011-0345-9",
"position": "25",
"datePublished": "2012"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1016/j.cell.2012.02.048",
"position": "26",
"datePublished": "2012"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.1038/ncb2764",
"position": "27",
"datePublished": "2013"
}],
"funder": [{
"@type": "Organization",
"name": "SystemsX"
}, {
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100003043",
"name": "EMBO"
}, {
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100001711",
"name": "Swiss National Science Foundation"
}, {
"@type": "Organization",
"@id": "https://doi.org/10.13039/501100006390",
"name": "University of Lausanne"
}],
"provider": {
"@type": "Organization",
"name": "Crossref"
}
}
```
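
The JSON-LD output is plain JSON, so it can be processed with Ruby's standard `json` library. The sketch below uses an abbreviated copy of the output above (two authors and two citations) and does not call bolognese itself; the variable names are illustrative. Note that some `citation` entries have no `@id`, so missing values are dropped.

```ruby
require "json"

# Abbreviated copy of the JSON-LD shown above.
json_ld = <<~JSON
  {
    "@context": "http://schema.org",
    "@type": "ScholarlyArticle",
    "@id": "https://doi.org/10.7554/elife.01567",
    "author": [
      { "@type": "Person", "givenName": "Martial", "familyName": "Sankar" },
      { "@type": "Person", "givenName": "Kaisa", "familyName": "Nieminen" }
    ],
    "citation": [
      { "@type": "CreativeWork", "@id": "https://doi.org/10.1038/nature02100", "position": "1", "datePublished": "2003" },
      { "@type": "CreativeWork", "position": "5", "datePublished": "1995" }
    ]
  }
JSON

metadata = JSON.parse(json_ld)

# Build "Given Family" display names, skipping any missing name part.
author_names = metadata.fetch("author", []).map do |author|
  [author["givenName"], author["familyName"]].compact.join(" ")
end

# Collect the DOIs of cited works; citations without an "@id" are dropped.
cited_ids = metadata.fetch("citation", []).map { |c| c["@id"] }.compact

puts author_names
puts cited_ids
```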
Convert Crossref XML to DataCite XML:
```
bolognese read https://doi.org/10.7554/elife.01567 --as datacite
10.7554/eLife.01567
Sankar, Martial
Martial
Sankar
Nieminen, Kaisa
Kaisa
Nieminen
Ragni, Laura
Laura
Ragni
Xenarios, Ioannis
Ioannis
Xenarios
Hardtke, Christian S
Christian S
Hardtke
Automated quantitative histology reveals vascular morphodynamics during Arabidopsis hypocotyl secondary growth
eLife
2014
JournalArticle
SystemsX
EMBO
https://doi.org/10.13039/501100003043
Swiss National Science Foundation
https://doi.org/10.13039/501100001711
University of Lausanne
https://doi.org/10.13039/501100006390
2014-02-11
2015-08-11T05:35:02Z
https://doi.org/10.1038/nature02100
https://doi.org/10.1534/genetics.109.104976
https://doi.org/10.1034/j.1399-3054.2002.1140413.x
https://doi.org/10.1162/089976601750399335
https://doi.org/10.1016/j.semcdb.2009.09.009
https://doi.org/10.1242/dev.091314
https://doi.org/10.1371/journal.pgen.1002997
https://doi.org/10.1038/msb.2010.25
https://doi.org/10.1016/j.biosystems.2012.07.004
https://doi.org/10.1016/j.pbi.2005.11.013
https://doi.org/10.1105/tpc.110.076083
https://doi.org/10.1073/pnas.0808444105
https://doi.org/10.1016/0092-8674(89)90900-8
https://doi.org/10.1126/science.1066609
https://doi.org/10.1104/pp.104.040212
https://doi.org/10.1038/nbt1206-1565
https://doi.org/10.1073/pnas.77.3.1516
https://doi.org/10.1093/bioinformatics/btq046
https://doi.org/10.1105/tpc.111.084020
https://doi.org/10.5061/dryad.b835k
https://doi.org/10.1016/j.cub.2008.02.070
https://doi.org/10.1111/j.1469-8137.2010.03236.x
https://doi.org/10.1007/s00138-011-0345-9
https://doi.org/10.1016/j.cell.2012.02.048
https://doi.org/10.1038/ncb2764
Creative Commons Attribution 3.0 (CC-BY 3.0)
```
Read DataCite XML:
```
bolognese read 10.5061/DRYAD.8515 --as datacite
10.5061/DRYAD.8515
1
Ollomo, Benjamin
Durand, Patrick
Prugnolle, Franck
Douzery, Emmanuel J. P.
Arnathau, Céline
Nkoghe, Dieudonné
Leroy, Eric
Renaud, François
Data from: A new malaria agent in African hominids.
Dryad Digital Repository
2011
Phylogeny
Malaria
Parasites
Taxonomy
Mitochondrial genome
Africa
Plasmodium
DataPackage
Ollomo B, Durand P, Prugnolle F, Douzery EJP, Arnathau C, Nkoghe D, Leroy E, Renaud F (2009) A new malaria agent in African hominids. PLoS Pathogens 5(5): e1000446.
10.5061/DRYAD.8515/1
10.5061/DRYAD.8515/2
10.1371/JOURNAL.PPAT.1000446
19478877
```
Convert DataCite XML to schema.org/JSON-LD:
```
bolognese read 10.5061/DRYAD.8515
{
"@context": "http://schema.org",
"@type": "Dataset",
"@id": "https://doi.org/10.5061/dryad.8515",
"additionalType": "DataPackage",
"name": "Data from: A new malaria agent in African hominids.",
"alternateName": "Ollomo B, Durand P, Prugnolle F, Douzery EJP, Arnathau C, Nkoghe D, Leroy E, Renaud F (2009) A new malaria agent in African hominids. PLoS Pathogens 5(5): e1000446.",
"author": [{
"@type": "Person",
"givenName": "Benjamin",
"familyName": "Ollomo"
}, {
"@type": "Person",
"givenName": "Patrick",
"familyName": "Durand"
}, {
"@type": "Person",
"givenName": "Franck",
"familyName": "Prugnolle"
}, {
"@type": "Person",
"givenName": "Emmanuel J. P.",
"familyName": "Douzery"
}, {
"@type": "Person",
"givenName": "Céline",
"familyName": "Arnathau"
}, {
"@type": "Person",
"givenName": "Dieudonné",
"familyName": "Nkoghe"
}, {
"@type": "Person",
"givenName": "Eric",
"familyName": "Leroy"
}, {
"@type": "Person",
"givenName": "François",
"familyName": "Renaud"
}],
"license": "http://creativecommons.org/publicdomain/zero/1.0/",
"version": "1",
"keywords": "Phylogeny, Malaria, Parasites, Taxonomy, Mitochondrial genome, Africa, Plasmodium",
"datePublished": "2011",
"hasPart": [{
"@type": "CreativeWork",
"@id": "https://doi.org/10.5061/dryad.8515/1"
}, {
"@type": "CreativeWork",
"@id": "https://doi.org/10.5061/dryad.8515/2"
}],
"citation": [{
"@type": "CreativeWork",
"@id": "https://doi.org/10.1371/journal.ppat.1000446"
}],
"schemaVersion": "http://datacite.org/schema/kernel-3",
"publisher": {
"@type": "Organization",
"name": "Dryad Digital Repository"
},
"provider": {
"@type": "Organization",
"name": "DataCite"
}
}
```
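
In this output the individual DataCite subjects are joined into a single comma-separated `keywords` string. To recover a list, the string can be split again. This is a minimal sketch that assumes no single keyword contains a comma; the variable names are illustrative.

```ruby
# Keyword string copied from the JSON-LD output above.
keywords = "Phylogeny, Malaria, Parasites, Taxonomy, Mitochondrial genome, Africa, Plasmodium"

# Split on commas, trim surrounding whitespace, and drop empty entries.
subjects = keywords.split(",").map(&:strip).reject(&:empty?)

puts subjects
```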
Convert DataCite XML to DataCite schema version 4.0:
```
bolognese read 10.5061/DRYAD.8515 --as datacite --schema_version http://datacite.org/schema/kernel-4
10.5061/DRYAD.8515
Ollomo, Benjamin
Benjamin
Ollomo
Durand, Patrick
Patrick
Durand
Prugnolle, Franck
Franck
Prugnolle
Douzery, Emmanuel J. P.
Emmanuel J. P.
Douzery
Arnathau, Céline
Céline
Arnathau
Nkoghe, Dieudonné
Dieudonné
Nkoghe
Leroy, Eric
Eric
Leroy
Renaud, François
François
Renaud
Data from: A new malaria agent in African hominids.
Dryad Digital Repository
2011
DataPackage
Ollomo B, Durand P, Prugnolle F, Douzery EJP, Arnathau C, Nkoghe D, Leroy E, Renaud F (2009) A new malaria agent in African hominids. PLoS Pathogens 5(5): e1000446.
Phylogeny
Malaria
Parasites
Taxonomy
Mitochondrial genome
Africa
Plasmodium
2011
https://doi.org/10.5061/dryad.8515/1
https://doi.org/10.5061/dryad.8515/2
https://doi.org/10.1371/journal.ppat.1000446
1
Public Domain (CC0 1.0)
```
## Development
We use RSpec for unit testing:
```
bundle exec rspec
```
Follow along via [GitHub Issues](https://github.com/datacite/bolognese/issues).
Please open an issue if conversion fails or metadata are not properly supported.
### Note on Patches/Pull Requests
* Fork the project
* Write tests for your new feature or a test that reproduces a bug
* Implement your feature or make a bug fix
* Do not change the Rakefile, version, or history
* Commit, push, and make a pull request. Bonus points for topic branches.
## License
**bolognese** is released under the [MIT License](https://github.com/datacite/bolognese/blob/master/LICENSE.md).