Sha256: 5ed5f4b6d0430a7b8b01cb6ffcb4f19eebf8b09d45e583ec16858501993c94d7

Contents?: true

Size: 570 Bytes

Versions: 5

Compression:

Stored size: 570 Bytes

Contents

require 'iconv'

module Lunar
  # @private Internally used to determine the words given some str.
  # i.e. Words.new("the quick brown") == %w(the quick brown)
  class Words < Array
    SEPARATOR = /\s+/

    def initialize(str)
      words = str.split(SEPARATOR).
        reject { |w| w.to_s.strip.empty? }.
        map    { |w| sanitize(w) }.
        reject { |w| Stopwords.include?(w) }

      super(words)
    end

  private
    def sanitize(str)
      Iconv.iconv('UTF-8//IGNORE', 'UTF-8', str)[0].to_s.
        gsub(/[^a-zA-Z0-9\-_]/, '').downcase
    end
  end
end

Version data entries

5 entries across 5 versions & 1 rubygem

Version Path
lunar-0.5.4 lib/lunar/words.rb
lunar-0.5.3 lib/lunar/words.rb
lunar-0.5.2 lib/lunar/words.rb
lunar-0.5.1 lib/lunar/words.rb
lunar-0.5.0 lib/lunar/words.rb