= EPUB Parser = {doctitle} EPUB Parser gem parses EPUB 3 book loosely. image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/pipeline.svg[link="https://gitlab.com/KitaitiMakoto/epub-parser/commits/master",title="pipeline status"] image:https://badge.fury.io/rb/epub-parser.svg[link="https://gemnasium.com/KitaitiMakoto/epub-parser",title="Gem Version"] image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/coverage.svg[link="https://kitaitimakoto.gitlab.io/epub-parser/coverage/",title="coverage report"] * https://kitaitimakoto.gitlab.io/epub-parser/file.Home.html[Homepage] * https://kitaitimakoto.gitlab.io/epub-parser/[Documentation] * https://gitlab.com/KitaitiMakoto/epub-parser[Source Code] * https://kitaitimakoto.gitlab.io/epub-parser/coverage/[Test Coverage] == Installation gem install epub-parser == Usage === As command-line tools ==== epubinfo `epubinfo` tool extracts and shows the metadata of specified EPUB book. See {file:docs/Epubinfo.markdown}. ==== epub-open `epub-open` tool provides interactive shell(IRB) which helps you research about EPUB book. See {file:docs/EpubOpen.markdown}. ==== epub-cover `epub-cover` tool extract cover image from EPUB book. See {file:docs/EpubCover.adoc}. === As a library Use `EPUB::Parser.parse` at first: ---- require 'epub/parser' book = EPUB::Parser.parse('/path/to/book.epub') ---- This book object can yield page by spine's order(spine defines the order to read that the author determines): ---- book.each_page_on_spine do |page| # do something... end ---- `page` above is an {EPUB::Publication::Package::Manifest::Item} object and you can call {EPUB::Publication::Package::Manifest::Item#href #href} to see where is the page file: ---- book.each_page_on_spine do |page| file = page.href # => path/to/page/in/zip/archive html = Zip::Archive.open('/path/to/book.epub') {|zip| zip.fopen(file.to_s) {|file| file.read} } end ---- And {EPUB::Publication::Package::Manifest::Item Item} provides syntax suger {EPUB::Publication::Package::Manifest::Item#read #read} for above: ---- html = page.read doc = Nokogiri.HTML(html) # do something with Nokogiri as always ---- For several utilities of Item, see {file:docs/Item.markdown} page. By the way, although `book` above is a {EPUB::Book} object, all features are provided by {EPUB::Book::Features} module. Therefore YourBook class can include the features of {EPUB::Book::Features}: ---- require 'epub' class YourBook < ActiveRecord::Base include EPUB::Book::Features end book = EPUB::Parser.parse( 'uploaded-book.epub', class: YourBook # *************** pass YourBook class ) book.instance_of? YourBook # => true book.required = 'value for required field' book.save! book.each_page_on_spine do |epage| page = YouBookPage.create( :some_attr => 'some attr', :content => epage.read, :another_attr => 'another attr' ) book.pages << page end ---- You are also able to find YourBook object for the first: ---- book = YourBook.find params[:id] ret = EPUB::Parser.parse( 'uploaded-book.epub', book: book # ******************* pass your book instance ) # => book ret == book # => true; this API is not good I feel... Welcome suggestion! # do something with your book ---- ==== Switching XML Library EPUB Parser tries to load https://gitlab.com/yorickpeterse/oga[Oga] a fast XML parser at first. If Oga is not available, then it tries https://www.nokogiri.org/[Nokogiri], a Ruby bindings for http://xmlsoft.org/[Libxml2] and http://xmlsoft.org/XSLT/[Libxslt] and more. If both are not available, it fallbacks to https://ruby-doc.org/stdlib-2.5.3/libdoc/rexml/rdoc/index.html[REXML], a standard-bundled library. You can also specify REXML explicitly: ---- EPUB::Parser::XMLDocument.backend = :REXML ---- ==== Switching ZIP library EPUB Parser uses https://github.com/javanthropus/archive-zip[Archive::Zip], a pure Ruby ZIP library, by default. You can use https://bitbucket.org/winebarrel/zip-ruby/wiki/Home[Zip/Ruby], a Ruby bindings for https://libzip.org/[libzip] if you have already installed Zip/Ruby gem by RubyGems or Bundler. Globally: ---- EPUB::OCF::PhysicalContainer.adapter = :Zipruby book = EPUB::Parser.parse("path/to/book.epub") ---- For each EPUB book: ---- book = EPUB::Parser.parse("path/to/book.epub", container_adapter: :Zipruby) ---- == Documentation === APIs More documentations are avaiable in: * {file:docs/Publication.markdown} includes document's meta data, file list and so on. * {file:docs/Item.markdown} represents a file in EPUB package. * {file:docs/FixedLayout.markdown} provides APIs to declare how EPUB reader renders in such as reflowable or fixed layout. * {file:docs/Navigation.markdown} describes how to use Navigation Document. * {file:docs/Searcher.markdown} introduces APIs to search words and elements, and search by EPUB CFIs(a position pointer for EPUB) from EPUB documents. * {file:docs/UnpackedArchive.markdown} describes how to handle directories which was generated by unzip EPUB files instead of EPUB files themselves. * {file:docs/MultipleRenditions.markdown} describes about EPUB Multiple-Rendistions Publication and APIs for that. === Examples Example usages are listed in {file:Examples} page. * {file:docs/AggregateContentsFromWeb.markdown Aggregate Contents From the Web} * {file:examples/exctract-content-using-cfi.rb Extract contents from EPUB files using EPUB CFI(identifier for EPUB)} * {file:examples/find-elements-and-cfis.rb Find elements and CFIs} === Building documentation If you installed EPUB Parser via gem command, you can also generate documentaiton by your own(https://gitlab.com/KitaitiMakoto/rubygems-yardoc[rubygems-yardoc] gem is needed): ---- $ gem install epub-parser $ gem yardoc epub-parser ... Files: 33 Modules: 20 ( 20 undocumented) Classes: 45 ( 44 undocumented) Constants: 31 ( 31 undocumented) Methods: 292 ( 88 undocumented) 52.84% documented YARD documentation is generated to: /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc ---- It will show you path to generated documentation(`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) at the end. Or, generating yardoc command is possible, too: ---- $ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git $ cd epub-parser $ bundle install --path=deps $ bundle exec rake doc:yard ... Files: 33 Modules: 20 ( 20 undocumented) Classes: 45 ( 44 undocumented) Constants: 31 ( 31 undocumented) Methods: 292 ( 88 undocumented) 52.84% documented ---- Then documentation will be available in `doc` directory. == Requirements * Ruby 2.2.0 or later == History See {file:CHANGELOG.adoc}. == Note This library is still in work. Only a few features are implemented and APIs might be changed in the future. Note that. Currently implemented: * container.xml of http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml[EPUB Open Container Format (OCF) 3.0] * http://idpf.org/epub/30/spec/epub30-publications.html[EPUB Publications 3.0] * EPUB Navigation Documents of http://www.idpf.org/epub/30/spec/epub30-contentdocs.html[EPUB Content Documents 3.0] * http://www.idpf.org/epub/fxl/[EPUB 3 Fixed-Layout Documents] * metadata.xml of http://www.idpf.org/epub/renditions/multiple/[EPUB Multiple-Rendition Publications] == License This library is distributed under the term of the MIT Licence. See {file:MIT-LICENSE} file for more info.