README.md in traject-3.0.0 vs README.md in traject-3.1.0.rc1
- old
+ new
@@ -17,11 +17,11 @@
Initially by Jonathan Rochkind (Johns Hopkins Libraries) and Bill Dueber (University of Michigan Libraries).
* Basic configuration files can be easily written even by non-rubyists, with a few simple directives traject provides. But config files are 'ruby all the way down', so we can provide a gradual slope to more complex needs, with the full power of ruby.
* Easy to program, easy to read, easy to modify.
* Fast. Traject by default indexes using multiple threads, on multiple cpu cores, when the underlying ruby implementation (i.e., JRuby) allows it, and can use a separate thread for communication with solr even under MRI. Traject is intended to be usable to process millions of records.
-* Composed of decoupled components, for flexibility and extensibility.
+* Composed of decoupled components, for flexibility and extensibility.f?
* Designed to support local code and configuration that's maintainable and testable, and can be shared between projects as ruby gems.
* Easy to split configuration between multiple files, for simple "pick-and-choose" command line options that can combine to deal with any of your local needs.
## Installation
@@ -133,11 +133,11 @@
For the syntax and complete possibilities of the specification string argument to extract_marc, see docs at the [MarcExtractor class](./lib/traject/marc_extractor.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/MarcExtractor)).
To see all options for `extract_marc`, see the [extract_marc](http://rdoc.info/gems/traject/Traject/Macros/Marc21:extract_marc) method documentation.
-### XML mode, extract_xml
+### XML mode, extract_xpath
See our [xml guide](./doc/xml.md) for more XML examples, but you will usually use extract_xpath.
to_field "title", extract_xpath("//title")
@@ -309,16 +309,19 @@
In addition to `to_field`, an `each_record` method is available, which,
like `to_field`, is executed for every record, but without being tied
to a specific output field.
`each_record` can be used for logging or notifying, computing intermediate
-results, or writing to more than one field at once.
+results, or more complex ruby logic.
~~~ruby
each_record do |record|
some_custom_logging(record)
end
+ each_record do |record, context|
+ context.add_output(:some_value, extract_some_value_from_record(record))
+ end
~~~
For more on `each_record`, see [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md).
There is also an `after_processing` method that can be used to register logic that will be called after the entire input has been processed. You can use it for whatever custom ruby code you might want for your app (send an email? Clean up a log file? Trigger a Solr replication?)
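
To make the `each_record` pattern above more concrete, here is a minimal, self-contained sketch of how one-argument and two-argument blocks might both be run against a record. It does not load traject: `ToyContext`, `ToyIndexer`, and `map_record` here are hypothetical stand-ins written only for illustration, and they merely assume that `add_output` appends values to a named output field. The real classes do considerably more.

```ruby
# Toy stand-ins, NOT traject's real classes. They only mimic the shape of
# each_record blocks that take (record) or (record, context).
class ToyContext
  attr_reader :output_hash

  def initialize
    @output_hash = {}
  end

  # Assumed behavior: append one or more values to a named output field.
  def add_output(field_name, *values)
    (@output_hash[field_name.to_s] ||= []).concat(values)
  end
end

class ToyIndexer
  def initialize
    @steps = []
  end

  # Register a block to run for every record.
  def each_record(&block)
    @steps << block
  end

  # Run every registered block against one record and return its output.
  def map_record(record)
    context = ToyContext.new
    @steps.each do |step|
      # Non-lambda procs ignore surplus arguments, so a block declared
      # with only |record| can be called the same way.
      step.call(record, context)
    end
    context.output_hash
  end
end

indexer = ToyIndexer.new
seen = []

# One-argument form: side effects only (logging, counting, and so on).
indexer.each_record { |record| seen << record[:id] }

# Two-argument form: write to the output through the context.
indexer.each_record do |record, context|
  context.add_output(:title_length, record[:title].length)
end

result = indexer.map_record({ id: "rec1", title: "Moby Dick" })
```

In this sketch the first block only records the id it saw, while the second contributes a `title_length` field to the output hash. That is the distinction the README draws between plain custom logic and blocks that add output through the context.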
@@ -403,10 +406,10 @@
array (`[]`). Writers that need to special-case empty fields should do so in the
writer class in question.
## The traject command Line
-(If you are interested in running traject in an embedded/programmatic context instead of as a standalone command-line batch process, please see docs on [Programmatic Use](./docs/programmatic_use.md) )
+(If you are interested in running traject in an embedded/programmatic context instead of as a standalone command-line batch process, please see docs on [Programmatic Use](./doc/programmatic_use.md) )
The simplest invocation is:
traject -c conf_file.rb marc_file.mrc