Sha256: b741a7e3e3ef19c791167dfb7f372aa53b31c84bc4be1f693dc0d0d9a83647eb

Contents?: true

Size: 531 Bytes

Versions: 3

Compression:

Stored size: 531 Bytes

Contents

require_relative "document"

# Accumulates token occurrence counts across every Document added to it.
class Corpus

  def initialize
    @tokens = {}
  end

  # Total number of token occurrences recorded, summed over all distinct tokens.
  def entry_count
    @tokens.values.inject(0, :+)
  end

  # Increments the stored count once for each token the document yields.
  def add document
    document.each_token do |token|
      @tokens[token] = token_count(token) + 1
    end
  end

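  # Reads every .txt file in the directory as UTF-8 and adds each line
  # to the corpus as its own Document.
  def load_from_directory directory
    Dir.glob("#{directory}/*.txt") do |entry|
      IO.foreach(entry, encoding: Encoding::UTF_8) do |line|
        add Document.new(line)
      end
    end
  end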

  # Number of times the token has been seen; 0 if it was never added.
  def token_count token
    @tokens[token] || 0
  end
end
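
The counting behaviour of the class above can be sketched in isolation. The listing does not show the gem's `Document` class, so the `Document` below is a hypothetical stand-in that simply yields lowercased, whitespace-separated words; the real tokenizer may well differ. The `Corpus` here repeats the counting methods from the listing, leaving out file loading, so the sketch runs on its own.

```ruby
# Hypothetical stand-in for the gem's Document class (not shown in this listing).
# Assumption: each_token yields lowercased, whitespace-separated words.
class Document
  def initialize(text)
    @text = text
  end

  def each_token
    @text.downcase.split.each { |token| yield token }
  end
end

# Counting logic copied from the listing above, minus load_from_directory.
class Corpus
  def initialize
    @tokens = {}
  end

  def entry_count
    @tokens.values.inject(0, :+)
  end

  def add(document)
    document.each_token do |token|
      @tokens[token] = token_count(token) + 1
    end
  end

  def token_count(token)
    @tokens[token] || 0
  end
end

corpus = Corpus.new
corpus.add Document.new("good movie good plot")
corpus.add Document.new("bad plot")

puts corpus.token_count("good")    # 2
puts corpus.token_count("plot")    # 2
puts corpus.token_count("missing") # 0
puts corpus.entry_count            # 6
```

Because `token_count` falls back to 0 for unseen tokens, callers can query any token without first checking whether it exists in the corpus.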

Version data entries

3 entries across 3 versions & 1 rubygem

Version Path
sentimentalizer-0.3.2 lib/engine/corpus.rb
sentimentalizer-0.3.1 lib/engine/corpus.rb
sentimentalizer-0.3.0 lib/engine/corpus.rb