= EPUB Parser

= {doctitle}

The EPUB Parser gem parses EPUB 3 books loosely.

image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/pipeline.svg[link="https://gitlab.com/KitaitiMakoto/epub-parser/commits/master",title="pipeline status"]
image:https://badge.fury.io/rb/epub-parser.svg[link="https://gemnasium.com/KitaitiMakoto/epub-parser",title="Gem Version"]
image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/coverage.svg[link="https://kitaitimakoto.gitlab.io/epub-parser/coverage/",title="coverage report"]

* https://kitaitimakoto.gitlab.io/epub-parser/file.Home.html[Homepage]
* https://kitaitimakoto.gitlab.io/epub-parser/[Documentation]
* https://gitlab.com/KitaitiMakoto/epub-parser[Source Code]
* https://kitaitimakoto.gitlab.io/epub-parser/coverage/[Test Coverage]

== Installation

    gem install epub-parser

== Usage

=== As command-line tools

==== epubinfo

The `epubinfo` tool extracts and shows the metadata of the specified EPUB book.

See {file:docs/Epubinfo.markdown}.

==== epub-open

The `epub-open` tool provides an interactive shell (IRB) that helps you explore an EPUB book.

See {file:docs/EpubOpen.markdown}.

==== epub-cover

The `epub-cover` tool extracts the cover image from an EPUB book.

See {file:docs/EpubCover.adoc}.

=== As a library

First, call `EPUB::Parser.parse`:

----
require 'epub/parser'

book = EPUB::Parser.parse('/path/to/book.epub')
----

This book object can yield pages in spine order (the spine defines the reading order determined by the author):

----
book.each_page_on_spine do |page|
  # do something...
end
----
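
To make "spine order" concrete, here is a minimal sketch in plain Ruby. It does not use EPUB Parser; `ManifestItem`, `MANIFEST`, `SPINE_IDREFS` and `pages_in_spine_order` are hypothetical names made up for illustration. In an EPUB package, the manifest lists the files and the spine lists references to them in reading order, so iterating pages by spine amounts to resolving those references in sequence:

```ruby
# Simplified stand-in for illustration; this is not the gem's Item class.
ManifestItem = Struct.new(:id, :href)

# The manifest lists the files in the package, in no particular reading order.
MANIFEST = [
  ManifestItem.new('ch2', 'chapter2.xhtml'),
  ManifestItem.new('cover', 'cover.xhtml'),
  ManifestItem.new('ch1', 'chapter1.xhtml'),
  ManifestItem.new('style', 'style.css')
].freeze

# The spine is an ordered list of references (idrefs) to manifest items.
SPINE_IDREFS = %w[cover ch1 ch2].freeze

# Resolve each spine reference to its manifest item, preserving spine order.
def pages_in_spine_order(manifest, spine_idrefs)
  items_by_id = manifest.each_with_object({}) { |item, index| index[item.id] = item }
  spine_idrefs.map { |idref| items_by_id.fetch(idref) }
end

# Prints cover.xhtml, chapter1.xhtml and chapter2.xhtml, in that order.
pages_in_spine_order(MANIFEST, SPINE_IDREFS).each do |page|
  puts page.href
end
```

Note that `style.css` is in the manifest but not in the spine, so it is never yielded as a page.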

`page` above is an {EPUB::Publication::Package::Manifest::Item} object, and you can call {EPUB::Publication::Package::Manifest::Item#href #href} to see where the page file is located:

----
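require 'zipruby' # Zip::Archive is provided by the Zip/Ruby gem

book.each_page_on_spine do |page|
  file = page.href # => path/to/page/in/zip/archive
  html = Zip::Archive.open('/path/to/book.epub') {|zip|
    # use a distinct block variable so that `file` above is not shadowed
    zip.fopen(file.to_s) {|entry| entry.read}
  }
end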
----

{EPUB::Publication::Package::Manifest::Item Item} also provides {EPUB::Publication::Package::Manifest::Item#read #read} as syntactic sugar for the above:

----
require 'nokogiri'

html = page.read
doc = Nokogiri.HTML(html)
# do something with Nokogiri as always
----

For other utilities of Item, see the {file:docs/Item.markdown} page.

By the way, although `book` above is an {EPUB::Book} object, all of its features are provided by the {EPUB::Book::Features} module. Therefore, your own YourBook class can gain the same features by including {EPUB::Book::Features}:

----
require 'epub'

class YourBook < ActiveRecord::Base
  include EPUB::Book::Features
end

book = EPUB::Parser.parse(
  'uploaded-book.epub',
  class: YourBook # *************** pass YourBook class
)
book.instance_of? YourBook # => true
book.required = 'value for required field'
book.save!
book.each_page_on_spine do |epage|
  page = YourBookPage.create(
    :some_attr    => 'some attr',
    :content      => epage.read,
    :another_attr => 'another attr'
  )
  book.pages << page
end
----

You can also find a YourBook object first and pass it to the parser:

----
book = YourBook.find params[:id]
ret = EPUB::Parser.parse(
  'uploaded-book.epub',
  book: book # ******************* pass your book instance
) # => book
ret == book # => true; I feel this API is not good... Suggestions are welcome!
# do something with your book
----
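
The design behind the `class:` and `book:` options can be sketched in plain Ruby without the gem. Everything named below (`BookFeatures`, `DefaultBook`, `MyBook`, `build_book`) is a hypothetical stand-in, and this is not how EPUB Parser is actually implemented. The point is only that when behaviour lives in a module, any class that includes it can serve as the book class, and a builder can either instantiate that class or fill in an instance it is given:

```ruby
# Stand-in for a features module: the behaviour lives here, not in a class.
module BookFeatures
  attr_accessor :title

  def titled?
    !title.nil?
  end
end

# Class used when the caller passes neither option.
class DefaultBook
  include BookFeatures
end

# An application class that gains the same behaviour by including the module.
class MyBook
  include BookFeatures
end

# Hypothetical builder: fills in the instance given as `book:` if there is one,
# otherwise instantiates the class given as `class:` (DefaultBook by default).
def build_book(title, options = {})
  book = options[:book] || options.fetch(:class, DefaultBook).new
  book.title = title
  book
end

p build_book('Sample').instance_of?(DefaultBook)            # true
p build_book('Sample', class: MyBook).instance_of?(MyBook)  # true

existing = MyBook.new
p build_book('Sample', book: existing).equal?(existing)     # true
```

Returning the instance that was passed in mirrors the `ret == book` behaviour shown above.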

==== Switching XML Library

EPUB Parser first tries to load https://gitlab.com/yorickpeterse/oga[Oga], a fast XML parser. If Oga is not available, it tries https://www.nokogiri.org/[Nokogiri], a Ruby binding for http://xmlsoft.org/[Libxml2], http://xmlsoft.org/XSLT/[Libxslt] and more. If neither is available, it falls back to https://ruby-doc.org/stdlib-2.5.3/libdoc/rexml/rdoc/index.html[REXML], a library bundled with Ruby. You can also specify REXML explicitly:

----
EPUB::Parser::XMLDocument.backend = :REXML
----
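
The fallback order described above can be pictured with the following sketch. It is an illustration only, not the gem's actual loading code: `first_available_backend` and `BACKEND_CANDIDATES` are hypothetical names, and the sketch assumes that each backend is detected by whether its library can be `require`d.

```ruby
# Hypothetical helper, not part of EPUB Parser: returns the name of the first
# candidate whose library can be loaded, or nil if none of them can.
def first_available_backend(candidates)
  candidates.each do |name, library|
    begin
      require library
      return name
    rescue LoadError
      next # this library is not installed; try the next candidate
    end
  end
  nil
end

# Candidates in the order given in the text above.
BACKEND_CANDIDATES = [
  [:Oga, 'oga'],
  [:Nokogiri, 'nokogiri'],
  [:REXML, 'rexml/document']
].freeze

backend = first_available_backend(BACKEND_CANDIDATES)
puts "Selected XML backend: #{backend.inspect}"
```

Which backend is printed depends on the gems installed in your environment.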

==== Switching ZIP library

EPUB Parser uses https://github.com/javanthropus/archive-zip[Archive::Zip], a pure Ruby ZIP library, by default. You can use https://bitbucket.org/winebarrel/zip-ruby/wiki/Home[Zip/Ruby], a Ruby binding for https://libzip.org/[libzip], instead if you have already installed the Zip/Ruby gem with RubyGems or Bundler.

Globally:

----
EPUB::OCF::PhysicalContainer.adapter = :Zipruby
book = EPUB::Parser.parse("path/to/book.epub")
----

For each EPUB book:

----
book = EPUB::Parser.parse("path/to/book.epub", container_adapter: :Zipruby)
----
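
How a global setting and a per-book option might interact can be sketched as follows. This is not the gem's implementation: `ContainerSettings`, `adapter_for` and the `:ArchiveZip` default name are assumptions made for illustration, as is the rule that a per-call option takes precedence over the global setting.

```ruby
# Hypothetical stand-in for illustration; not EPUB Parser's own code.
module ContainerSettings
  DEFAULT_ADAPTER = :ArchiveZip # assumed name for the default adapter

  class << self
    # Global setting, changed with `ContainerSettings.adapter = ...`
    attr_writer :adapter

    def adapter
      @adapter || DEFAULT_ADAPTER
    end

    # A per-call :container_adapter option wins over the global setting.
    def adapter_for(options = {})
      options.fetch(:container_adapter) { adapter }
    end
  end
end

puts ContainerSettings.adapter_for                              # ArchiveZip
puts ContainerSettings.adapter_for(container_adapter: :Zipruby) # Zipruby

ContainerSettings.adapter = :Zipruby
puts ContainerSettings.adapter_for                              # Zipruby
```

Under these assumptions, the global form suits applications that always use one ZIP library, while the per-book form is handy when only some files need a different adapter.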

== Documentation

=== APIs

More documentation is available in:

* {file:docs/Publication.markdown} covers the document's metadata, file list and so on.
* {file:docs/Item.markdown} covers items, each of which represents a file in an EPUB package.
* {file:docs/FixedLayout.markdown} describes the APIs that declare how an EPUB reader should render a book, for example in reflowable or fixed layout.
* {file:docs/Navigation.markdown} describes how to use the Navigation Document.
* {file:docs/Searcher.markdown} introduces APIs to search EPUB documents for words and elements, and to search by EPUB CFI (a position pointer for EPUB).
* {file:docs/UnpackedArchive.markdown} describes how to handle directories generated by unzipping EPUB files instead of the EPUB files themselves.
* {file:docs/MultipleRenditions.markdown} describes EPUB Multiple-Rendition Publications and the APIs for them.

=== Examples

Example usages are listed on the {file:Examples} page.

* {file:docs/AggregateContentsFromWeb.markdown Aggregate Contents From the Web}
* {file:examples/exctract-content-using-cfi.rb Extract contents from EPUB files using EPUB CFI (identifier for EPUB)}
* {file:examples/find-elements-and-cfis.rb Find elements and CFIs}

=== Building documentation

If you installed EPUB Parser via the gem command, you can also generate the documentation on your own (the https://gitlab.com/KitaitiMakoto/rubygems-yardoc[rubygems-yardoc] gem is needed):

----
$ gem install epub-parser
$ gem yardoc epub-parser
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
----

It shows the path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) at the end.

Alternatively, you can generate the documentation from a clone of the repository with the `doc:yard` Rake task:

----
$ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
----

The documentation will then be available in the `doc` directory.

== Requirements

* Ruby 2.2.0 or later

== History

See {file:CHANGELOG.adoc}.

== Note

This library is still a work in progress.
Only a few features are implemented, and the APIs might change in the future.

Currently implemented:

* container.xml of http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml[EPUB Open Container Format (OCF) 3.0]
* http://idpf.org/epub/30/spec/epub30-publications.html[EPUB Publications 3.0]
* EPUB Navigation Documents of http://www.idpf.org/epub/30/spec/epub30-contentdocs.html[EPUB Content Documents 3.0]
* http://www.idpf.org/epub/fxl/[EPUB 3 Fixed-Layout Documents]
* metadata.xml of http://www.idpf.org/epub/renditions/multiple/[EPUB Multiple-Rendition Publications]

== License

This library is distributed under the terms of the MIT License.
See the {file:MIT-LICENSE} file for more information.