README.md in open-nlp-0.1.0 vs README.md in open-nlp-0.1.1
- old
+ new
@@ -1,109 +1,169 @@
[![Build Status](https://secure.travis-ci.org/louismullie/open-nlp.png)](http://travis-ci.org/louismullie/open-nlp)
-**About**
+###About
-This library provides high-level Ruby bindings to the Open NLP package, a Java machine learning toolkit for natural language processing (NLP).
+This library provides high-level Ruby bindings to the Open NLP package, a Java machine learning toolkit for natural language processing (NLP).
-This gem only provides a thin wrapper over the OpenNLP API. If you are looking for a Ruby natural language processing framework, have a look at [Treat](https://github.com/louismullie/treat).
+###Installing
-**Installing**
+__Note: If you are running on MRI, this gem will use the Ruby-Java Bridge (Rjb), which currently does not support Java 7. Therefore, if you have installed Java 7, you should set your JAVA_HOME to point to your old Java 6 install before installing Rjb; for example, `export "JAVA_HOME=/usr/lib/jvm/java-6-openjdk/"`.__
-_Note: If you are running on MRI, this gem will use the Ruby-Java Bridge (Rjb), which currently does not support Java 7. Therefore, if you have installed Java 7, you should set your JAVA_HOME to point to your old Java 6 install before installing Rjb; for example, `export "JAVA_HOME=/usr/lib/jvm/java-6-openjdk/"`.
+First, install the gem: `gem install open-nlp`. Then, individually download the appropriate models from the [open-nlp website](http://opennlp.sourceforge.net/models-1.5/) or just get [all English language models](http://www.louismullie.com/treat/open-nlp-english.zip) in one package (80 MB).
-First, install the gem: `gem install open-nlp`. Then, individually download the appropriate models from the [open-nlp website](http://opennlp.sourceforge.net/models-1.5/) or just get [all english language models](louismullie.com/treat/open-nlp-english.zip) in one package (80 MB).
-
Place the contents of the extracted archive inside the /bin/ folder of the open-nlp gem (e.g. [...]/gems/open-nlp-0.x.x/bin/).
-**Configuration**
+Alternatively, from a terminal window, `cd` to the gem's folder and run:
-After installing and requiring the gem (`require 'open-nlp'`), you may want to set some optional configuration options. Here are some examples:
+```
+wget http://www.louismullie.com/treat/open-nlp-english.zip
+unzip -o open-nlp-english.zip -d bin/
+```
+###Configuring
+
+After installing and requiring the gem (`require 'open-nlp'`), you may want to set some of the following configuration options.
+
```ruby
-# Set an alternative path to look for the JAR files
+# Set an alternative path to look for the JAR files.
# Default is gem's bin folder.
OpenNLP.jar_path = '/path_to_jars/'
-# Set an alternative path to look for the model files
+# Set an alternative path to look for the model files.
# Default is gem's bin folder.
OpenNLP.model_path = '/path_to_models/'
# Pass some alternative arguments to the Java VM.
# Default is ['-Xms512M', '-Xmx1024M'].
OpenNLP.jvm_args = ['-option1', '-option2']
# Redirect VM output to log.txt
OpenNLP.log_file = 'log.txt'
-# WARNING: Not implemented yet.
+```
-# Use the model files for a different language than English.
-# OpenNLP.use(:french) # or :german
-#
-# Change a specific model file.
-# OpenNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
+###Examples
+
+
+**Simple tokenizer**
+
+```ruby
+OpenNLP.load
+
+sent = "The death of the poet was kept from his poems."
+tokenizer = OpenNLP::SimpleTokenizer.new
+
+tokens = tokenizer.tokenize(sent).to_a
+# => %w[The death of the poet was kept from his poems .]
```
-**Using the gem**
+**Maximum entropy tokenizer, chunker and POS tagger**
```ruby
-text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
- 'Berlin to discuss a new $25 billion austerity package.' +
- 'Sarkozy looked pleased, but Merkel was dismayed.'
+OpenNLP.load
+
+chunker = OpenNLP::ChunkerME.new
+tokenizer = OpenNLP::TokenizerME.new
+tagger = OpenNLP::POSTaggerME.new
+
+sent = "The death of the poet was kept from his poems."
+
+tokens = tokenizer.tokenize(sent).to_a
+# => %w[The death of the poet was kept from his poems .]
+
+tags = tagger.tag(tokens).to_a
+# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
+
+chunks = chunker.chunk(tokens, tags).to_a
+# => %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]
+```
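The chunker output above uses B-/I-/O tags: `B-XX` opens a phrase of type `XX`, `I-XX` continues it, and `O` marks a token outside any phrase. As a rough sketch of how those tags could be folded back into phrases, the pure-Ruby helper below groups the tokens from the example. `group_chunks` is a hypothetical name used for illustration and is not part of the gem.

```ruby
# Hypothetical helper (not part of the open-nlp gem): fold B-/I-/O chunk tags
# into [type, phrase] pairs.
def group_chunks(tokens, chunk_tags)
  phrases = []
  current = nil
  tokens.zip(chunk_tags).each do |token, tag|
    prefix, type = tag.split('-', 2)
    if prefix == 'I' && current && current[:type] == type
      # Continue the phrase that is currently open.
      current[:tokens] << token
    elsif prefix == 'O'
      # Token outside any phrase: close whatever was open.
      current = nil
    else
      # 'B' (or a stray 'I' with nothing to continue) opens a new phrase.
      current = { type: type, tokens: [token] }
      phrases << current
    end
  end
  phrases.map { |phrase| [phrase[:type], phrase[:tokens].join(' ')] }
end

# Token and chunk arrays copied from the example above.
tokens = %w[The death of the poet was kept from his poems .]
chunks = %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]

phrases = group_chunks(tokens, chunks)
phrases.each { |type, text| puts "#{type}: #{text}" }
```

With real chunker output, call `.to_a` on the Java arrays first, as the example above does, so that `zip` and `split` operate on plain Ruby strings.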
+
+**Abstract Bottom-Up Parser**
+
+```ruby
+OpenNLP.load
+
+sent = "The death of the poet was kept from his poems."
+parser = OpenNLP::Parser.new
+parse = parser.parse(sent)
+
+parse.get_text # => "The death of the poet was kept from his poems."
+
+parse.get_span.get_start # => 0
+parse.get_span.get_end # => 46
+parse.get_child_count # => 1
+
+child = parse.get_children[0]
+
+child.text # => "The death of the poet was kept from his poems."
+child.get_child_count # => 3
+child.get_head_index # => 5
+child.get_type # => "S"
+```
+
+**Maximum Entropy Name Finder**
+
+```ruby
+OpenNLP.load
+
+text = File.read('./spec/sample.txt').gsub("\n", "")
+
tokenizer = OpenNLP::TokenizerME.new
segmenter = OpenNLP::SentenceDetectorME.new
-tagger = OpenNLP::POSTaggerME.new
ner_models = ['person', 'time', 'money']
ner_finders = ner_models.map do |model|
- OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
+ OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
end
sentences = segmenter.sent_detect(text)
-all_entities = []
+named_entities = []
sentences.each do |sentence|
- tokens = tokenizer.tokenize(sentence)
- tags = tagger.tag(tokens)
-
- # Get a list of all tokens.
- puts tokens.to_a.inspect
- # Get the sentence's text.
- puts sentence.to_s.inspect
- # Get the sentence's tags.
- puts tags.to_a.inspect
+ tokens = tokenizer.tokenize(sentence)
+
+ ner_models.each_with_index do |model,i|
+ finder = ner_finders[i]
+ name_spans = finder.find(tokens)
+ name_spans.each do |name_span|
+ start = name_span.get_start
+ stop = name_span.get_end-1
+ slice = tokens[start..stop].to_a
+ named_entities << [slice, model]
+ end
+ end
- # Run three NER models and find entities.
- ner_models.each_with_index do |model,i|
- finder = ner_finders[i]
- name_spans = finder.find(tokens)
- name_spans.each do |name_span|
- start = name_span.get_start
- stop = name_span.get_end-1
- slice = tokens[start..stop].to_a
- all_entities << [slice, model]
- end
- end
-
end
+```
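The example above subtracts one from `get_end` before slicing, which suggests that a span's end index is exclusive. The sketch below illustrates that arithmetic without loading Java: `FakeSpan` is a stand-in struct and `slice_span` a hypothetical helper, and the span offsets are made up for illustration rather than taken from a real name finder run.

```ruby
# Stand-in for the span objects returned by the name finder; it models only the
# two accessors used in the example above. Not part of the gem.
FakeSpan = Struct.new(:get_start, :get_end)

# Hypothetical helper: an exclusive range (three dots) avoids the manual "- 1".
def slice_span(tokens, span)
  tokens[span.get_start...span.get_end]
end

tokens = %w[Angela Merkel met Nicolas Sarkozy on January 25th]
spans  = [FakeSpan.new(0, 2), FakeSpan.new(3, 5)] # illustrative offsets

names = spans.map { |span| slice_span(tokens, span).join(' ') }
puts names.inspect

# The inclusive form used in the example selects the same tokens.
same = spans.map { |span| tokens[span.get_start..(span.get_end - 1)].join(' ') }
puts(names == same)
```

Either form works; the exclusive range simply keeps the off-by-one adjustment out of the loop body.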
-# Show all named entities.
-puts all_entities.inspect
+**Loading specific models**
+
+Just pass the name of the model file to the constructor. The gem will search for the file in the `OpenNLP.model_path` folder.
+
+```ruby
+OpenNLP.load
+
+tokenizer = OpenNLP::TokenizerME.new('en-token.bin')
+tagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')
+name_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')
+# etc.
```
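Since the model files are looked up by name in a folder, it can help to check that they are actually present before constructing anything. The following is a minimal sketch using only the Ruby standard library; `missing_models` is a hypothetical helper, and the temporary directory stands in for whatever folder `OpenNLP.model_path` points to.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of the gem): return the model file names that
# are absent from the given directory.
def missing_models(model_dir, file_names)
  file_names.reject { |name| File.exist?(File.join(model_dir, name)) }
end

wanted = %w[en-token.bin en-pos-perceptron.bin en-ner-person.bin]

# A temporary directory stands in for the real model folder.
missing = Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'en-token.bin'))
  missing_models(dir, wanted)
end

puts "Missing models: #{missing.join(', ')}"
```

In real use, the first argument would be the folder that holds the downloaded models.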
**Loading specific classes**
-You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:
+You may want to load specific classes from the OpenNLP library that are not loaded by default. The gem provides an API to do this:
```ruby
# Default base class is opennlp.tools.
OpenNLP.load_class('SomeClassName')
+# => OpenNLP::SomeClassName
# Here, we specify another base class.
-OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
+OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
+# => OpenNLP::SomeOtherClass
```
**Contributing**
-Feel free to fork the project and send me a pull request!
+Fork the project and send me a pull request! Config updates for other languages are welcome.
\ No newline at end of file