[![Build Status](https://travis-ci.org/datacite/bolognese.svg?branch=master)](https://travis-ci.org/datacite/bolognese)
[![Code Climate](https://codeclimate.com/github/datacite/bolognese/badges/gpa.svg)](https://codeclimate.com/github/datacite/bolognese)
[![Test Coverage](https://codeclimate.com/github/datacite/bolognese/badges/coverage.svg)](https://codeclimate.com/github/datacite/bolognese/coverage)

# Bolognese

Ruby gem and command-line utility for conversion of DOI metadata from and to [schema.org](https://schema.org) in JSON-LD.

## Features

* convert [Crossref XML](https://support.crossref.org/hc/en-us/articles/214936283-UNIXREF-query-output-format) to schema.org/JSON-LD
* convert [DataCite XML](http://schema.datacite.org/) to schema.org/JSON-LD
* fetch schema.org/JSON-LD from a URL
* convert schema.org/JSON-LD to [DataCite XML](http://schema.datacite.org/)
* convert Crossref XML to [DataCite XML](http://schema.datacite.org/)

Conversion to Crossref XML is not yet supported.

## Installation

The usual way with Bundler: add the following to your `Gemfile` to install the current version of the gem:

```ruby
gem 'bolognese'
```

Then run `bundle install` to install into your environment.

You can also install the gem system-wide in the usual way:

```bash
gem install bolognese
```

## Commands

The `bolognese` commands understand URLs and DOIs as arguments. The `--as` command line flag sets the format, either `crossref`, `datacite`, or `schema_org` (default).
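Since both bare DOIs and DOI URLs are accepted, a script that wraps the command line may want to bring its inputs into one form first. The following is a minimal sketch of such a step and is not taken from the bolognese code base: `normalize_doi` and `DOI_PATTERN` are hypothetical names, and the lowercase `https://doi.org/` form is an assumption modelled on the `@id` values in the example output in this README.

```ruby
# Hypothetical helper (not part of bolognese): accept a bare DOI or a
# doi.org / dx.doi.org URL and return a lowercase https://doi.org/ URL.
# The pattern is a deliberately simple approximation of DOI syntax.
DOI_PATTERN = %r{\A(?:https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/.+)\z}i

def normalize_doi(input)
  match = DOI_PATTERN.match(input.to_s.strip)
  return nil unless match # not recognizable as a DOI

  "https://doi.org/#{match[1].downcase}"
end

normalize_doi("10.5061/DRYAD.8515")
# => "https://doi.org/10.5061/dryad.8515"
normalize_doi("http://dx.doi.org/10.13039/501100003043")
# => "https://doi.org/10.13039/501100003043"
```

Returning `nil` for unrecognized input leaves it to the caller to decide whether to treat the argument as an ordinary URL instead.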
## Examples

Read Crossref XML:

```
bolognese read https://doi.org/10.7554/elife.01567 --as crossref
eLife 2050-084X 02 11 2014 3 Automated quantitative histology reveals vascular morphodynamics during Arabidopsis hypocotyl secondary growth Martial Sankar Kaisa Nieminen Laura Ragni Ioannis Xenarios Christian S Hardtke 02 11 2014 10.7554/eLife.01567 1 eLifesciences www.elifesciences.org false 2013-09-20 2013-12-24 2014-02-11 SystemsX EMBO http://dx.doi.org/10.13039/501100003043 Swiss National Science Foundation http://dx.doi.org/10.13039/501100001711 University of Lausanne http://dx.doi.org/10.13039/501100006390 http://creativecommons.org/licenses/by/3.0/ http://creativecommons.org/licenses/by/3.0/ http://creativecommons.org/licenses/by/3.0/ 10.7554/eLife.01567 http://elifesciences.org/lookup/doi/10.7554/eLife.01567 Nature Bonke 426 181 2003 10.1038/nature02100 Genetics Brenner 182 413 2009 10.1534/genetics.109.104976 Physiologia Plantarum Chaffey 114 594 2002 10.1034/j.1399-3054.2002.1140413.x Neural computation Chang 13 2119 2001 10.1162/089976601750399335 Machine Learning Cortes 20 273 1995 Development Dolan 119 71 1993 Seminars in Cell & Developmental Biology Elo 20 1097 2009 10.1016/j.semcdb.2009.09.009 Development Etchells 140 2224 2013 10.1242/dev.091314 PLOS Genetics Etchells 8 e1002997 2012 10.1371/journal.pgen.1002997 Molecular Systems Biology Fuchs 6 370 2010 10.1038/msb.2010.25 Bio Systems Granqvist 110 60 2012 10.1016/j.biosystems.2012.07.004 Current Opinion in Plant Biology Groover 9 55 2006 10.1016/j.pbi.2005.11.013 Plant Cell Hirakawa 22 2618 2010 10.1105/tpc.110.076083 Proceedings of the National Academy of Sciences of the United States of America Hirakawa 105 15208 2008 10.1073/pnas.0808444105 Cell Meyerowitz 56 263 1989 10.1016/0092-8674(89)90900-8 Science Meyerowitz 295 1482 2002 10.1126/science.1066609 Plant Physiol Nieminen 135 653 2004 10.1104/pp.104.040212 Nature Biotechnology Noble 24 1565 2006 10.1038/nbt1206-1565 Proceedings of the National
Academy of Sciences of the United States of America Olson 77 1516 1980 10.1073/pnas.77.3.1516 Bioinformatics Pau 26 979 2010 10.1093/bioinformatics/btq046 Plant Cell Ragni 23 1322 2011 10.1105/tpc.111.084020 Sankar 2014 10.5061/dryad.b835k Current Biology Sibout 18 458 2008 10.1016/j.cub.2008.02.070 The New Phytologist Spicer 186 577 2010 10.1111/j.1469-8137.2010.03236.x Machine Vision and Applications Theriault 23 659 2012 10.1007/s00138-011-0345-9 Cell Uyttewaal 149 439 2012 10.1016/j.cell.2012.02.048 Nature Cell Biology Yin 15 860 2013 10.1038/ncb2764 <b>Abstract</b> 10.7554/eLife.01567.001 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.001 <b>eLife digest</b> 10.7554/eLife.01567.002 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.002 Figure 1. Cellular level analysis of Arabidopsis hypocotyl secondary growth. ( A ) Light microscopy of cross sections obtained from Arabidopsis hypocotyls (organ position illustrated for a 9-day-old seedling, lower left) at 9 dag (upper left) and 35 dag (right). Size bars are 100 μm. Blue GUS staining due to the presence of an APL::GUS reporter gene in this Col-0 background line marks phloem bundles. ( B ) Overview of the developmental series (time points and distinct samples per genotype) analyzed in this study. ( C ) Example of a high-resolution hypocotyl section image assembled from 11 × 11 tiles. ( D ) The same image after pre-processing and binarization, and ( E ) subsequent segmentation using a watershed algorithm. ( F ) Number of mis-segmented cells as determined by careful visual inspection in 12 sections, plotted against the total number of cells per section (log scale). 10.7554/eLife.01567.003 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.003 Figure 2. The ‘Quantitative Histology’ approach. ( A ) Overview of the computational pipeline from image acquisition to analysis. ( B ) ‘Phenoprints’ for the different genotypes and developmental stages. 
10.7554/eLife.01567.004 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.004 Figure 2—figure supplement 1. An example of classifier selection through V-fold cross validation. The green arrow points out the selected feature combination according to the criteria of minimum number of features with the highest performance and the lowest variation (the radiusV feature was excluded due to its putative variation in tissue location). 10.7554/eLife.01567.005 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.005 Figure 3. Progression of tissue proliferation. ( A ) Principal component analysis (PCA) of the phenoprints shown in Figure 2B, performed with normalized values (Supplementary file 4). The inlay screeplot displays the proportion of total variation explained by each principal component. ( B–E ) Comparative plots of parameter progression in the two genotypes. In ( D ), xylem represents combined vessel, parenchyma, and fiber cells, phloem represents combined phloem parenchyma and bundle cells. Error bars indicate standard error. 10.7554/eLife.01567.006 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.006 Figure 4. Bimodal distribution of incline angle according to position. ( A and B ) Spatial distribution of cell incline angle illustrates the vascular organization in Ler ( B ) as compared to Col-0 ( A ) at later stages of development, for example 30 dag. The size of the disc increases with the area of the cell. Blue color indicates radial cell orientation, red orthoradial. ( C and D ) Violin plots of incline angle distribution, illustrating increasingly bimodal distribution coincident with refined vascular organization and different dynamics of the process in the two genotypes. 10.7554/eLife.01567.007 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.007 Figure 4—figure supplement 1. An illustration of the incline angle. 
The incline is the angle between the section radius through the center of an ellipse fit to a cell and the major axis of that ellipse extended towards the x axis. 10.7554/eLife.01567.008 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.008 Figure 5. Distinct local organization of incline angle during hypocotyl secondary growth progression. ( AJ ) Density plots of cell incline angle vs radial position for the two genotypes at the indicated developmental stages, representing all cells across all sections for a given time point. The red lines represent the fit of these cloud distributions with locally weighted linear regression (i.e., lowess), revealing the essential data trends. All sections were normalized from 0.0 (the manually defined center) to 1.0 (the average radius in a set of sections as determined by the average distance of the outermost cells from the center for individual sections). Box plots indicate the quartiles of the radian distribution for each cell-type class and are placed at the average position of the cell type with respect to the y axis. Outliers are shown as circles. 10.7554/eLife.01567.009 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.009 Figure 5—figure supplement 1. Analysis of cell number in defined xylem regions of different size. Cell number in a circle of 200–500 pixels around the section centers for Col-0. Cell count in a constant area of xylem over time across all averaged across all sections. 10.7554/eLife.01567.010 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.010 Figure 6. Mapping of phloem pole patterning. ( A ) Example of Gaussian kernel density estimate of the location of predicted phloem bundles cells in a 30 dag Col-0 section. High density represents phloem poles. ( B ) Example of an analysis of emerging phloem pole position in a 30 dag Col-0 section. The plot represents a pixel intensity map after noise reduction along a circular region of interest across the emerging phloem poles. 
Intensity peaks are due to GUS staining conferred to phloem bundles by an APL::GUS reporter construct. ( C ) Probability density function of the data shown in ( B ) obtained from an automated Bayesian model. The dominant single peak indicates a constant arc distance of ca. 62 pixel between the phloem poles. 10.7554/eLife.01567.011 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.011 Supplementary file 1. ( <b>A</b> ) An explanation of the extracted parameters that describe the cellular features. ( <b>B</b> ) Summary information of the hand-labeled training set for supervised machine learning. ( <b>C</b> ) Definition of the classifiers selected for analysis. ( <b>D</b> ) Summary of the classifier parameters for supervised machine learning. ( <b>E</b> ) Overview of the cell type classes recognized by the supervised machine learning approach and their assignment codes used in Data Files 3 and 4. 10.7554/eLife.01567.012 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.012 Supplementary file 2. Quality control files for the Col-0 sections. 10.7554/eLife.01567.013 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.013 Supplementary file 3. Quality control files for the Ler sections. 10.7554/eLife.01567.014 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.014 Supplementary file 4. The normalized values of the phenoprints (Figure 2B) used for PCA. 
10.7554/eLife.01567.015 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.015 <b>Decision letter</b> 10.7554/eLife.01567.016 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.016 <b>Author response</b> 10.7554/eLife.01567.017 http://elifesciences.org/lookup/doi/10.7554/eLife.01567.017
```

Convert Crossref XML to schema.org/JSON-LD:

```
bolognese read https://doi.org/10.7554/elife.01567
{ "@context": "http://schema.org", "@type": "ScholarlyArticle", "@id": "https://doi.org/10.7554/elife.01567", "url": "http://elifesciences.org/lookup/doi/10.7554/eLife.01567", "additionalType": "JournalArticle", "name": "Automated quantitative histology reveals vascular morphodynamics during Arabidopsis hypocotyl secondary growth", "author": [{ "@type": "Person", "givenName": "Martial", "familyName": "Sankar" }, { "@type": "Person", "givenName": "Kaisa", "familyName": "Nieminen" }, { "@type": "Person", "givenName": "Laura", "familyName": "Ragni" }, { "@type": "Person", "givenName": "Ioannis", "familyName": "Xenarios" }, { "@type": "Person", "givenName": "Christian S", "familyName": "Hardtke" }], "license": "http://creativecommons.org/licenses/by/3.0/", "datePublished": "2014-02-11", "dateModified": "2015-08-11T05:35:02Z", "isPartOf": { "@type": "Periodical", "name": "eLife", "issn": "2050-084X" }, "citation": [{ "@type": "CreativeWork", "@id": "https://doi.org/10.1038/nature02100", "position": "1", "datePublished": "2003" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1534/genetics.109.104976", "position": "2", "datePublished": "2009" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1034/j.1399-3054.2002.1140413.x", "position": "3", "datePublished": "2002" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1162/089976601750399335", "position": "4", "datePublished": "2001" }, { "@type": "CreativeWork", "position": "5", "datePublished": "1995" }, { "@type": "CreativeWork", "position": "6", "datePublished": "1993" }, { "@type": "CreativeWork", "@id":
"https://doi.org/10.1016/j.semcdb.2009.09.009", "position": "7", "datePublished": "2009" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1242/dev.091314", "position": "8", "datePublished": "2013" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1371/journal.pgen.1002997", "position": "9", "datePublished": "2012" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1038/msb.2010.25", "position": "10", "datePublished": "2010" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1016/j.biosystems.2012.07.004", "position": "11", "datePublished": "2012" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1016/j.pbi.2005.11.013", "position": "12", "datePublished": "2006" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1105/tpc.110.076083", "position": "13", "datePublished": "2010" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1073/pnas.0808444105", "position": "14", "datePublished": "2008" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1016/0092-8674(89)90900-8", "position": "15", "datePublished": "1989" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1126/science.1066609", "position": "16", "datePublished": "2002" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1104/pp.104.040212", "position": "17", "datePublished": "2004" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1038/nbt1206-1565", "position": "18", "datePublished": "2006" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1073/pnas.77.3.1516", "position": "19", "datePublished": "1980" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1093/bioinformatics/btq046", "position": "20", "datePublished": "2010" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1105/tpc.111.084020", "position": "21", "datePublished": "2011" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.5061/dryad.b835k", "position": "22", "datePublished": "2014" }, { "@type": "CreativeWork", "@id": 
"https://doi.org/10.1016/j.cub.2008.02.070", "position": "23", "datePublished": "2008" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1111/j.1469-8137.2010.03236.x", "position": "24", "datePublished": "2010" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1007/s00138-011-0345-9", "position": "25", "datePublished": "2012" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1016/j.cell.2012.02.048", "position": "26", "datePublished": "2012" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.1038/ncb2764", "position": "27", "datePublished": "2013" }], "funder": [{ "@type": "Organization", "name": "SystemsX" }, { "@type": "Organization", "@id": "https://doi.org/10.13039/501100003043", "name": "EMBO" }, { "@type": "Organization", "@id": "https://doi.org/10.13039/501100001711", "name": "Swiss National Science Foundation" }, { "@type": "Organization", "@id": "https://doi.org/10.13039/501100006390", "name": "University of Lausanne" }], "provider": { "@type": "Organization", "name": "Crossref" } } ``` Convert Crossref XML to DataCite XML: ``` bolognese read https://doi.org/10.7554/elife.01567 --as datacite 10.7554/eLife.01567 Sankar, Martial Martial Sankar Nieminen, Kaisa Kaisa Nieminen Ragni, Laura Laura Ragni Xenarios, Ioannis Ioannis Xenarios Hardtke, Christian S Christian S Hardtke Automated quantitative histology reveals vascular morphodynamics during Arabidopsis hypocotyl secondary growth eLife 2014 JournalArticle SystemsX EMBO https://doi.org/10.13039/501100003043 Swiss National Science Foundation https://doi.org/10.13039/501100001711 University of Lausanne https://doi.org/10.13039/501100006390 2014-02-11 2015-08-11T05:35:02Z https://doi.org/10.1038/nature02100 https://doi.org/10.1534/genetics.109.104976 https://doi.org/10.1034/j.1399-3054.2002.1140413.x https://doi.org/10.1162/089976601750399335 https://doi.org/10.1016/j.semcdb.2009.09.009 https://doi.org/10.1242/dev.091314 https://doi.org/10.1371/journal.pgen.1002997 
https://doi.org/10.1038/msb.2010.25 https://doi.org/10.1016/j.biosystems.2012.07.004 https://doi.org/10.1016/j.pbi.2005.11.013 https://doi.org/10.1105/tpc.110.076083 https://doi.org/10.1073/pnas.0808444105 https://doi.org/10.1016/0092-8674(89)90900-8 https://doi.org/10.1126/science.1066609 https://doi.org/10.1104/pp.104.040212 https://doi.org/10.1038/nbt1206-1565 https://doi.org/10.1073/pnas.77.3.1516 https://doi.org/10.1093/bioinformatics/btq046 https://doi.org/10.1105/tpc.111.084020 https://doi.org/10.5061/dryad.b835k https://doi.org/10.1016/j.cub.2008.02.070 https://doi.org/10.1111/j.1469-8137.2010.03236.x https://doi.org/10.1007/s00138-011-0345-9 https://doi.org/10.1016/j.cell.2012.02.048 https://doi.org/10.1038/ncb2764 Creative Commons Attribution 3.0 (CC-BY 3.0)
```

Read DataCite XML:

```
bolognese read 10.5061/DRYAD.8515 --as datacite
10.5061/DRYAD.8515 1 Ollomo, Benjamin Durand, Patrick Prugnolle, Franck Douzery, Emmanuel J. P. Arnathau, Céline Nkoghe, Dieudonné Leroy, Eric Renaud, François Data from: A new malaria agent in African hominids. Dryad Digital Repository 2011 Phylogeny Malaria Parasites Taxonomy Mitochondrial genome Africa Plasmodium DataPackage Ollomo B, Durand P, Prugnolle F, Douzery EJP, Arnathau C, Nkoghe D, Leroy E, Renaud F (2009) A new malaria agent in African hominids. PLoS Pathogens 5(5): e1000446. 10.5061/DRYAD.8515/1 10.5061/DRYAD.8515/2 10.1371/JOURNAL.PPAT.1000446 19478877
```

Convert DataCite XML to schema.org/JSON-LD:

```sh
bolognese read 10.5061/DRYAD.8515
{ "@context": "http://schema.org", "@type": "Dataset", "@id": "https://doi.org/10.5061/dryad.8515", "additionalType": "DataPackage", "name": "Data from: A new malaria agent in African hominids.", "alternateName": "Ollomo B, Durand P, Prugnolle F, Douzery EJP, Arnathau C, Nkoghe D, Leroy E, Renaud F (2009) A new malaria agent in African hominids.
PLoS Pathogens 5(5): e1000446.", "author": [{ "@type": "Person", "givenName": "Benjamin", "familyName": "Ollomo" }, { "@type": "Person", "givenName": "Patrick", "familyName": "Durand" }, { "@type": "Person", "givenName": "Franck", "familyName": "Prugnolle" }, { "@type": "Person", "givenName": "Emmanuel J. P.", "familyName": "Douzery" }, { "@type": "Person", "givenName": "Céline", "familyName": "Arnathau" }, { "@type": "Person", "givenName": "Dieudonné", "familyName": "Nkoghe" }, { "@type": "Person", "givenName": "Eric", "familyName": "Leroy" }, { "@type": "Person", "givenName": "François", "familyName": "Renaud" }], "license": "http://creativecommons.org/publicdomain/zero/1.0/", "version": "1", "keywords": "Phylogeny, Malaria, Parasites, Taxonomy, Mitochondrial genome, Africa, Plasmodium", "datePublished": "2011", "hasPart": [{ "@type": "CreativeWork", "@id": "https://doi.org/10.5061/dryad.8515/1" }, { "@type": "CreativeWork", "@id": "https://doi.org/10.5061/dryad.8515/2" }], "citation": [{ "@type": "CreativeWork", "@id": "https://doi.org/10.1371/journal.ppat.1000446" }], "schemaVersion": "http://datacite.org/schema/kernel-3", "publisher": { "@type": "Organization", "name": "Dryad Digital Repository" }, "provider": { "@type": "Organization", "name": "DataCite" } }
```

Convert DataCite XML to schema version 4.0:

```
bolognese read 10.5061/DRYAD.8515 --as datacite --schema_version http://datacite.org/schema/kernel-4
10.5061/DRYAD.8515 Ollomo, Benjamin Benjamin Ollomo Durand, Patrick Patrick Durand Prugnolle, Franck Franck Prugnolle Douzery, Emmanuel J. P. Emmanuel J. P. Douzery Arnathau, Céline Céline Arnathau Nkoghe, Dieudonné Dieudonné Nkoghe Leroy, Eric Eric Leroy Renaud, François François Renaud Data from: A new malaria agent in African hominids. Dryad Digital Repository 2011 DataPackage Ollomo B, Durand P, Prugnolle F, Douzery EJP, Arnathau C, Nkoghe D, Leroy E, Renaud F (2009) A new malaria agent in African hominids. PLoS Pathogens 5(5): e1000446.
Phylogeny Malaria Parasites Taxonomy Mitochondrial genome Africa Plasmodium 2011 https://doi.org/10.5061/dryad.8515/1 https://doi.org/10.5061/dryad.8515/2 https://doi.org/10.1371/journal.ppat.1000446 1 Public Domain (CC0 1.0)
```

## Development

We use rspec for unit testing:

```
bundle exec rspec
```

Follow along via [GitHub Issues](https://github.com/datacite/bolognese/issues). Please open an issue if conversion fails or metadata are not properly supported.

### Note on Patches/Pull Requests

* Fork the project
* Write tests for your new feature or a test that reproduces a bug
* Implement your feature or make a bug fix
* Do not mess with Rakefile, version or history
* Commit, push and make a pull request. Bonus points for topic branches.

## License

**bolognese** is released under the [MIT License](https://github.com/datacite/bolognese/blob/master/LICENSE.md).