README.md in biblicit-1.0 vs README.md in biblicit-2.0.3

- old
+ new

@@ -1,37 +1,134 @@
 biblicit
 =============
 
 Extract citations from PDFs.
 
-## Usage
+Note: The version is 2.x, but really should be 0.2.x.
 
+
+# Usage
+
 ```ruby
-  # Extract metadata from a file using the code from CiteSeerX
-  Biblicit.extract(file: "myfile.pdf", tool: :citeseer)
+  # Extract metadata from the contents of a PDF using default tools and settings
+  result = Biblicit::Extractor.extract(content: "a string containing the content of a PDF file")
 
-  # Extract metadata from the contents of a PDF using cb2bib
-  Biblicit.extract(contents: IO.read("myfile.pdf"), tool: :cb2bib, remote: true)
+  # Extract metadata from a file using all available tools
+  result = Biblicit::Extractor.extract(file: "myfile.pdf", tools: [:citeseer, :parshed, :cb2bib], remote: true, token: false)
+
+  # See reference information for "myfile.pdf"
+  result[:citeseer][:title]
+  result[:parshed][:title]
+  result[:citeseer][:authors]
+  # etc
 ```
 
-## Algorithms
 
+# Algorithms
+
 ### CiteSeer (default)
 
 Wrapper around Perl code extracted from [CiteSeerX](http://citeseer.ist.psu.edu/). 
 
-Uses [Apache PDFBox](http://pdfbox.apache.org/) to extract text from the PDF, uses a model trained with the [svm-light](http://svmlight.joachims.org/) Support Vector Machine library to extract citation data for the PDF itself, and then uses [ParsCit](http://aye.comp.nus.edu.sg/parsCit/)'s model trained with the [CRF++](http://code.google.com/p/crfpp/) Conditional Random Fields library to parse citations from the PDF's bibliography, if any.
+Uses a model trained with the [svm-light](http://svmlight.joachims.org/) Support Vector Machine library.
 
+### ParsCit (default) 
+
+Wrapper around Perl & Ruby code from [ParsCit](http://aye.comp.nus.edu.sg/parsCit/), which is included as a Git submodule.
+
+Uses a model trained with the [CRF++](http://code.google.com/p/crfpp/) Conditional Random Fields library.
+
 ### cb2Bib
 
 Wrapper around [cb2Bib](http://www.molspaces.com/cb2bib/) in command-line mode.
 
-Uses pdf2text from [Xpdf](http://www.foolabs.com/xpdf/download.html) to extract text from the PDF, uses an apparently less-sophisticated parsing algorithm than the CiteSeerX code to parse metadata, but then, if :remote=true, scrapes one of a large number of journal or public repository websites for a structured version of the citation data.
+Uses an apparently less sophisticated parsing algorithm than the others to extract metadata. If `remote: true` is set, it then scrapes one of a large number of journal or public repository websites for a structured version of the citation data. Warning: it sometimes finds the wrong work!
 
-## Requirements
 
-### CRF++
+# Requirements
+
+There are a lot, but you may not need all of them, depending on your use case.
+
+
+## Required to support various input file formats
+
+Different tools are used for different input file formats.
+
+#### PDF - [Poppler](http://poppler.freedesktop.org/)
+
+This provides `pdftotext`. You could install `xpdf` instead.
+
+##### From source
+
+Requires fontconfig.
+
+    wget http://poppler.freedesktop.org/poppler-0.22.1.tar.gz
+    tar -xzf poppler-0.22.1.tar.gz
+    cd poppler-0.22.1
+    ./configure
+    make
+    sudo make install
+
+##### On Debian/Ubuntu
+
+    sudo apt-get install poppler-utils
+
+##### On OS X with Homebrew
+
+    brew install poppler
+
+#### PostScript - [Ghostscript](http://www.ghostscript.com/)
+
+This provides `ps2ascii`.
+
+##### From source
+
+    wget http://downloads.ghostscript.com/public/ghostscript-9.06.tar.gz
+    tar -xzf ghostscript-9.06.tar.gz
+    cd ghostscript-9.06
+    make
+    sudo make install
+
+##### On Debian/Ubuntu
+
+    sudo apt-get install ghostscript
+
+##### On OS X with Homebrew
+
+    brew install ghostscript
+
+#### Other (e.g. docx) - [AbiWord](http://www.abisource.com/)
+
+This provides `abiword`.
+
+##### On Debian/Ubuntu
+
+    sudo apt-get install abiword
+
+##### On OS X
+
+As of this writing, you're out of luck, because AbiWord doesn't compile on recent versions of OS X. According to the AbiWord website, however, this is being actively worked on.
+
+
+## Required to use either the ParsCit or CiteSeer algorithms
+
+#### Perl modules
+
+More than these might be required; this is what I had to add to my default installation.
+
+##### From CPAN
+
+    sudo cpan Digest::SHA1
+    sudo cpan String::Approx
+    sudo cpan XML::Writer::String
+    sudo cpan XML::Twig
+
+## Required to use the ParsCit algorithm
+
+#### CRF++
+
+You can specify where you have installed CRF++ by setting the `CRFPP_HOME` environment variable.
  
 ##### From source
 
     wget http://crfpp.googlecode.com/files/CRF%2B%2B-0.57.tar.gz
     tar xvzf CRF++-0.57.tar.gz
@@ -42,44 +139,38 @@
 
 ##### On Debian/Ubuntu
 
     sudo apt-add-repository 'deb http://cl.naist.jp/~eric-n/ubuntu-nlp oneiric all'
     sudo apt-get update
-    sudo apt-get install libcrf++
+    sudo apt-get install libcrf++ crf++
 
 ##### On OS X with Homebrew
 
     brew install crf++
 
-### svm-light
+## Required to use the CiteSeer algorithm
 
-The included model requires version 5, not the current version.
+#### svm-light
 
+Required for header extraction (reference information for the input work itself).
+
+The included model requires version 5, not the current version. You can specify where you have installed svm-light by setting the `SVM_LIGHT_HOME` environment variable.
+
 ##### From source
 
     mkdir svm_light5
     cd svm_light5
     wget http://download.joachims.org/svm_light/v5.00/svm_light.tar.gz
     tar -xzf svm_light.tar.gz
     make
-    sudo ln -s $(readlink -f "$(dirname svm_classify)/$(basename svm_classify)") /usr/bin/svm_classify5
-    sudo ln -s $(readlink -f "$(dirname svm_learn)/$(basename svm_learn)") /usr/bin/svm_learn5
+    echo "export SVM_LIGHT_HOME=`pwd`" >> ~/.profile # or .bashrc or whatever
+    source ~/.profile
 
-Note: On OS X you'll need to use greadlink instead of readlink if you have coreutils installed, or another workaround for the absence of `-f`.
+## Required to use the cb2Bib algorithm
 
-### Perl modules
+#### cb2Bib
 
-##### From CPAN
-
-    sudo cpan install DBI
-    sudo cpan install Digest::SHA1
-    sudo cpan install Log::Log4perl
-    sudo cpan install Log::Dispatch
-    sudo cpan install String::Approx
-
-### cb2bib
-
 ##### From source (Linux)
 
     wget http://www.molspaces.com/dl/progs/cb2bib-1.4.9.tar.gz
     tar -xzvf cb2bib-1.4.9.tar.gz
     cd cb2bib-1.4.9
@@ -103,18 +194,22 @@
 
 ##### On Debian/Ubuntu
 
     sudo apt-get install cb2bib
 
-### Other
 
+## Other
+
+(I'm not currently sure what this was required for; TODO figure it out!)
+
 ##### On Debian/Ubuntu
 
     sudo apt-get install libicu-dev
 
-## Copying
 
-Copyright Academia.edu or the original author(s).
+# Copying
+
+Copyright Academia.edu or the original author(s); see the documentation in the included `parscit` and `svm-header-parse` directories.
 
 Apache licensed (see LICENSE.TXT).
 
 Please note svm-light is in general free only for non-commercial use, but can be used in this gem by permission of the author. For conditions on additional uses see [the website](http://svmlight.joachims.org/).