= EngTagger

English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

=== Description

EngTagger is a Ruby port of the Perl module Lingua::EN::Tagger, a
probability-based, corpus-trained tagger that assigns part-of-speech (POS) tags
to English text using a lookup dictionary and a set of probability values. Tags
are chosen by conditional probability: the tagger examines the preceding tag to
determine the most likely tag for the current word. Unknown words are
classified according to their morphology, or can be configured to be treated as
nouns or other parts of speech. The tagger also extracts as many nouns and noun
phrases as it can, using a set of regular expressions.
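
To make the tagging idea above concrete, here is a minimal sketch of choosing a tag from a lexicon score combined with the probability of that tag following the previous one. It is not EngTagger's actual code: the lexicon, the transition table, every number in them, and the names +toy_tag+ and +guess_unknown+ are invented for illustration.

```ruby
# Toy lexicon: candidate tags for each known word with a made-up score.
LEXICON = {
  "the" => { "det" => 1.0 },
  "dog" => { "nn" => 0.9, "vb" => 0.1 },
  "can" => { "md" => 0.7, "nn" => 0.2, "vb" => 0.1 },
  "run" => { "vb" => 0.7, "nn" => 0.3 }
}.freeze

# Toy transition table: made-up probability of a tag given the previous tag.
TRANSITIONS = {
  "start" => { "det" => 0.6, "nn" => 0.3, "md" => 0.05, "vb" => 0.05 },
  "det"   => { "nn" => 0.9, "vb" => 0.05, "md" => 0.05 },
  "nn"    => { "md" => 0.4, "vb" => 0.4, "nn" => 0.2 },
  "md"    => { "vb" => 0.9, "nn" => 0.1 },
  "vb"    => { "det" => 0.5, "nn" => 0.4, "vb" => 0.1 }
}.freeze

# Hypothetical fallback for unknown words: guess from the word ending,
# otherwise treat the word as a noun.
def guess_unknown(word)
  return { "vbg" => 1.0 } if word.end_with?("ing")
  return { "rb" => 1.0 } if word.end_with?("ly")
  { "nn" => 1.0 }
end

# Tag words left to right, picking for each word the candidate tag that
# maximizes (lexicon score) * (probability of the tag after the previous tag).
def toy_tag(words)
  prev = "start"
  words.map do |word|
    candidates = LEXICON[word.downcase] || guess_unknown(word.downcase)
    row = TRANSITIONS.fetch(prev, {})
    # Unseen transitions get a small floor value instead of zero.
    tag, _score = candidates.max_by { |t, p| p * row.fetch(t, 0.01) }
    prev = tag
    [word, tag]
  end
end

toy_tag(%w[The dog can run]).each { |word, tag| puts "#{word}/#{tag}" }
```

In this sketch "can" is tagged as a modal rather than a noun only because the preceding tag is a noun, which is the kind of context dependence the description refers to.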

=== Features

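* Assigns POS tags to English text
* Extracts nouns, proper nouns, and noun phrases from tagged text
* Lists nouns and noun phrases with their occurrence counts
* Produces a readable version of the tagged text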

=== Synopsis

  require 'rubygems'
  require 'engtagger'

  # Sample input text
  text = "Alice chased the big fat cat."

  # Create a tagger object
  tgr = Tagger.new

  # Add part-of-speech tags to text
  tagged = tgr.add_tags(text)

  # Get a list of all nouns and noun phrases with occurrence counts
  word_list = tgr.get_words(text)

  # Get a readable version of the tagged text
  readable_text = tgr.get_readable(text)

  # Get all nouns from the tagged output
  ns = tgr.get_nouns(tagged)

  # Get all proper nouns from the tagged output
  pns = tgr.get_proper_nouns(tagged)

  # Get all noun phrases of any syntactic level
  nps = tgr.get_noun_phrases(tagged)
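
As a rough illustration of how words can be pulled out of tagged text with regular expressions and counted, the sketch below scans a tagged string for a given set of tags. It does not call EngTagger: the <tt><tag>word</tag></tt> layout of the sample string, the tag names, and the helper +count_tagged+ are assumptions made for this example.

```ruby
# Count the words wrapped in any of the given tags.
# tag_pattern is a regex alternation such as "nn|nns".
def count_tagged(tagged, tag_pattern)
  counts = Hash.new(0)
  # \1 requires the closing tag to match the opening tag.
  tagged.scan(%r{<(#{tag_pattern})>([^<]+)</\1>}) do |_tag, word|
    counts[word] += 1
  end
  counts
end

# Hypothetical tagged string, written by hand for this example.
sample = "<det>The</det> <nn>dog</nn> <vbd>chased</vbd> <det>the</det> " \
         "<nn>dog</nn> <in>of</in> <nnp>Alice</nnp>"

p count_tagged(sample, "nn|nns")
p count_tagged(sample, "nnp")
```

The hash of counts mirrors the "occurrence counts" idea in the synopsis; the real return values of the library's methods may be shaped differently.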

=== Requirements

* Ruby 1.8.6
* Hpricot[http://code.whytheluckystiff.net/hpricot/] (optional)

=== Install

  (sudo) gem install engtagger

=== Authors

Ruby library:
* Yoichiro Hasebe (yohasebe [at] gmail.com)

Original Perl module:
* Aaron Coburn (acoburn [at] middlebury.edu)

=== Acknowledgement

This Ruby library is a direct port of Lingua::EN::Tagger, available on CPAN.
Credit for the core of its algorithm and design therefore goes to
Aaron Coburn, the author of the original Perl version.

=== License

This library is distributed under the GPL.  Please see the LICENSE file.
