= Harvestdor::Indexer
{}[https://travis-ci.org/sul-dlss/harvestdor-indexer]
{}[https://coveralls.io/r/sul-dlss/harvestdor-indexer]
{}[https://gemnasium.com/sul-dlss/harvestdor-indexer]
{}[http://badge.fury.io/rb/harvestdor-indexer]
A Gem to harvest meta/data from DOR and the skeleton code to index it and write to Solr.
== Installation
Add this line to your application's Gemfile:
gem 'harvestdor-indexer'
And then execute:
$ bundle
Or install it yourself as:
$ gem install harvestdor-indexer
== Usage
You must override the index method and provide configuration options. It is recommended to write a script to run it, too - example below.
=== Configuration / Set up
Create a yml config file for your collection going to a Solr index.
See spec/config/ap.yml for an example.
You will want to copy that file and change the following settings:
* whitelist
* dor fetcher service_url
* solr url
* harvestdor log_dir, log_nam
==== Whitelist
The whitelist is how you specify which objects to index. The whitelist can be
* an Array of druids inline in the config yml file
* a filename containing a list of druids (one per line)
If a druid, per the object's identityMetadata at purl page, is for a
* collection record: then we process all the item druids in that collection (as if they were included individually in the whitelist)
* non-collection record: then we process the druid as an individual item
=== Override the Harvestdor::Indexer.index method
In your code, override this method from the Harvestdor::Indexer class
# create Solr doc for the druid and add it to Solr
# NOTE: don't forget to send commit to Solr, either once at end (already in harvest_and_index), or for each add, or ...
def index resource
benchmark "Indexing #{resource.druid}" do
logger.debug "About to index #{resource.druid}"
doc_hash = {}
doc_hash[:id] = resource.druid
# you might add things from Indexer level class here
# (e.g. things that are the same across all documents in the harvest)
solr.add doc_hash
# TODO: provide call to code to update DOR object's workflow datastream??
end
end
=== Run it
(bundle install)
You may want to write a script to run the code. Your script might look like this:
#!/usr/bin/env ruby
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..'))
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
require 'rubygems'
begin
require 'your_indexer'
rescue LoadError
require 'bundler/setup'
require 'your_indexer'
end
config_yml_path = ARGV.pop
if config_yml_path.nil?
puts "** You must provide the full path to a collection config yml file **"
exit
end
indexer = Harvestdor::Indexer.new(config_yml_path, opts)
indexer.harvest_and_index
Then you run the script like so:
$ ./bin/indexer config/(your coll).yml
Run from deployed instance, as that box is already set up to be able to talk to DOR Fetcher service and to SUL Solr indexes.
== Contributing
* Fork it (https://help.github.com/articles/fork-a-repo/)
* Create your feature branch (`git checkout -b my-new-feature`)
* Write code and tests.
* Commit your changes (`git commit -am 'Added some feature'`)
* Push to the branch (`git push origin my-new-feature`)
* Create new Pull Request (https://help.github.com/articles/creating-a-pull-request/)