Class String
In: lib/classifier/extensions/word_hash.rb, lib/classifier/lsi/summary.rb
Parent: Object
Author: Lucas Carlson (lucas@rufy.com)
Copyright: Copyright © 2005 Lucas Carlson
License: LGPL

Methods

Constants

CORPUS_SKIP_WORDS = [ "a", "again", "all", "along", "are", "also", "an", "and", "as", "at", "but", "by", "came", "can", "cant", "couldnt", "did", "didn", "didnt", "do", "doesnt", "dont", "ever", "first", "from", "have", "her", "here", "him", "how", "i", "if", "in", "into", "is", "isnt", "it", "itll", "just", "last", "least", "like", "most", "my", "new", "no", "not", "now", "of", "on", "or", "should", "sinc", "so", "some", "th", "than", "this", "that", "the", "their", "then", "those", "to", "told", "too", "true", "try", "until", "url", "us", "were", "when", "whether", "while", "with", "within", "yes", "you", "youll", ]

Public Instance methods

Returns a word hash without extra punctuation or short symbols, containing just stemmed words.
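The difference between a clean word hash and an unfiltered one comes down to which tokens get counted. The following is a minimal sketch, assuming (as the description suggests) that the unfiltered variant also counts runs of punctuation. `word_tokens` and `symbol_tokens` are hypothetical names, not part of the library:

```ruby
# Hypothetical helpers contrasting two ways of tokenizing text.
# word_tokens keeps only runs of word characters (letters, digits, "_");
# symbol_tokens keeps only the runs of punctuation between them.
def word_tokens(text)
  # Delete every character that is neither a word character nor whitespace.
  text.gsub(/[^\w\s]/, "").split
end

def symbol_tokens(text)
  # Blank out word characters so only punctuation runs survive the split.
  text.gsub(/\w/, " ").split
end

sample = "Hello, world!! :-)"
p word_tokens(sample)
p symbol_tokens(sample)
```

Under this assumption, a clean word hash would count only the word tokens (after stemming). Adding the symbol tokens as well would introduce keys for punctuation runs such as `"!!"`.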

Removes common punctuation symbols, returning a new string. E.g.,

  "Hello (greeting's), with {braces} < >...?".without_punctuation
  => "Hello  greetings   with  braces         "
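The example above is consistent with a two-step pass. First, most punctuation is replaced by a space, so word boundaries survive. Second, apostrophes are removed outright, so `greeting's` becomes `greetings`. The following is a minimal sketch under that assumption, using `String#tr` and `String#delete`. The method name `strip_punctuation` and the exact character set are illustrative, not the library's source:

```ruby
# Illustrative set of characters mapped to a space (assumption; the
# library's own list may differ). No leading "^" and no "-", so String#tr
# treats every character literally.
PUNCTUATION_TO_SPACE = ',?.!;:"@#$%^&*()_=+[]{}|<>/`~'

# Hypothetical helper: punctuation becomes spaces, apostrophes are dropped.
def strip_punctuation(text)
  text.tr(PUNCTUATION_TO_SPACE, " ").delete("'")
end

p strip_punctuation("Hello (greeting's), with {braces} < >...?")
```

Replacing most symbols with spaces rather than deleting them keeps input such as `with{braces}` from collapsing into a single token. Each removed character leaves exactly one space, which is why the documented output carries runs of several spaces.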

Returns a Hash mapping words to integer counts. Each word in the string is stemmed and interned (converted to a Symbol), and that key maps to the word's frequency in the document.
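The frequency hash described above can be sketched as a `Hash.new(0)` counter keyed by Symbols. This is not the library's code, and it rests on several assumptions:

* Tokens are lowercased runs of `\w`.
* A small subset of `CORPUS_SKIP_WORDS` stands in for the full list.
* Words of two characters or fewer are dropped.
* `stem` is a hypothetical identity placeholder, because real stemming needs a separate stemming library.

```ruby
require "set"

# Illustrative subset of CORPUS_SKIP_WORDS; the real list is longer.
SKIP_WORDS = Set.new(%w[a and are in is of the to with]).freeze

# Hypothetical placeholder: a real stemmer would map "cats" and "cat"
# to the same stem. Here every word is returned unchanged.
def stem(word)
  word
end

# Count each kept word under an interned (Symbol) key.
def frequency_hash(text)
  counts = Hash.new(0)                    # missing keys start at 0
  text.downcase.scan(/\w+/) do |word|
    next if SKIP_WORDS.include?(word)     # drop stop words
    next if word.length <= 2              # drop very short tokens (assumption)
    counts[stem(word).to_sym] += 1        # interned key => frequency
  end
  counts
end

p frequency_hash("The cat saw the other cat in the garden")
```

Lowercasing before the skip-word check means "The" is filtered as well as "the". Because the hash has a default of 0, looking up an unseen word yields 0 rather than `nil`.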
