= EngTagger

English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

=== Description

A Ruby port of the Perl module Lingua::EN::Tagger, a probability-based, corpus-trained tagger that assigns POS tags to English text using a lookup dictionary and a set of probability values. Tags are chosen by conditional probability: the tagger examines the preceding tag to determine the appropriate tag for the current word. Unknown words are classified according to word morphology, or can be set to be treated as nouns or other parts of speech. The tagger also extracts as many nouns and noun phrases as it can, using a set of regular expressions.

=== Features

* Assigns POS tags to English text
* Extracts nouns, proper nouns, and noun phrases from tagged text
* Classifies unknown words by morphology, or treats them as nouns or another chosen part of speech

=== Synopsis

  require 'rubygems'
  require 'engtagger'

  # Create a parser object
  tgr = EngTagger.new

  # Sample text
  text = "Alice chased the big fat cat."

  # Add part-of-speech tags to text
  tagged = tgr.add_tags(text)

  # Get a list of all nouns and noun phrases with occurrence counts
  word_list = tgr.get_words(text)

  # Get a readable version of the tagged text
  readable_text = tgr.get_readable(text)

  # Get all nouns from the tagged output
  ns = tgr.get_nouns(tagged)

  # Get all proper nouns
  pns = tgr.get_proper_nouns(tagged)

  # Get all noun phrases of any syntactic level
  nps = tgr.get_noun_phrases(tagged)

=== Requirements

* Ruby 1.8.6
* Hpricot[http://code.whytheluckystiff.net/hpricot/] (optional)

=== Install

  (sudo) gem install engtagger

=== Author of this Ruby library

* Yoichiro Hasebe (yohasebe [at] gmail.com)

=== Author of the original Perl module

* Aaron Coburn (acoburn [at] middlebury.edu)

=== Acknowledgement

This Ruby library is a direct port of Lingua::EN::Tagger, available at CPAN. Credit for the crucial parts of its algorithm and design therefore goes to Aaron Coburn, the author of the original Perl version.

=== License

This library is distributed under the GPL. Please see the LICENSE file.
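
=== Illustrative sketch: choosing a tag from the preceding tag

The Description says the tagger picks each tag by looking at the preceding tag, and that unknown words can be treated as nouns. The following is a minimal toy sketch of that idea, not EngTagger's implementation: the lexicon, the probability values, the smoothing constant, and the method name +toy_tag+ are all invented for illustration, and the real library uses a corpus-trained dictionary and morphology rules instead.

```ruby
# Toy illustration only. None of these tables or names come from EngTagger;
# they are hypothetical values chosen to make the example small.

# P(tag | word): candidate tags for each known word.
LEXICON = {
  "the" => { "det" => 1.0 },
  "dog" => { "nn" => 0.9, "vb" => 0.1 },
  "can" => { "md" => 0.7, "nn" => 0.2, "vb" => 0.1 },
  "run" => { "vb" => 0.7, "nn" => 0.3 }
}.freeze

# P(tag | previous tag): how likely each tag is to follow another.
TRANSITIONS = {
  "start" => { "det" => 0.6, "nn" => 0.2, "md" => 0.1, "vb" => 0.1 },
  "det"   => { "nn" => 0.9, "vb" => 0.05, "md" => 0.05 },
  "nn"    => { "md" => 0.4, "vb" => 0.3, "nn" => 0.3 },
  "md"    => { "vb" => 0.9, "nn" => 0.1 },
  "vb"    => { "det" => 0.6, "nn" => 0.4 }
}.freeze

UNKNOWN_TAG = "nn"   # assumption: unknown words default to noun
SMOOTHING   = 0.001  # small score for tag pairs missing from TRANSITIONS

# Greedy left-to-right tagging: for each word, pick the candidate tag that
# maximizes P(tag | word) * P(tag | previous tag).
def toy_tag(words)
  prev = "start"
  words.map do |word|
    candidates = LEXICON.fetch(word.downcase, { UNKNOWN_TAG => 1.0 })
    best, _score = candidates.max_by do |tag, p_word|
      p_word * TRANSITIONS.fetch(prev, {}).fetch(tag, SMOOTHING)
    end
    prev = best
    [word, best]
  end
end

toy_tag(%w[The dog can run]).each { |word, tag| puts "#{word}/#{tag}" }
```

In this toy model, "can" is tagged as a modal after the noun "dog", but as a noun when it directly follows the determiner "the". That is the kind of context sensitivity the preceding-tag lookup provides.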