Sha256: d228fc010a2b8399c1d627d640be4254a256887c2bbca3550ac67626c59ac5f3

Contents?: true

Size: 598 Bytes

Versions: 4

Compression:

Stored size: 598 Bytes

Contents

module Company
  module Mapping
    # Raw term frequency: the number of times each token appears in a given string (the document).
    class TermFrequency

      def initialize(tokenizer)
        @tokenizer = tokenizer
      end

      # Calculates the raw term frequency for the contents of the document.
      # Returns a Hash that maps each token to its number of occurrences.
      def calculate(text)
        rawFrequency(text)
      end

      protected
      def rawFrequency(contents)
        @tokenizer.tokenize(contents).each_with_object({}) do |token, tf|
          tf[token] ||= 0
          tf[token] += 1
        end
      end
    end
  end
end
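
Usage sketch

The class above only counts tokens. Splitting the text is delegated to whatever tokenizer object is passed to the constructor, and that object needs nothing more than a `tokenize` method that returns a collection of tokens. The following is a minimal sketch of that contract, not the gem's own tokenizer: `SimpleTokenizer` and the sample text are hypothetical, and the class body is repeated from the listing only so that the snippet runs on its own.

```ruby
# Hypothetical tokenizer (an assumption, not part of the gem):
# lowercases the text and extracts runs of word characters.
class SimpleTokenizer
  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

module Company
  module Mapping
    # Repeated from the listing above so that this snippet is self-contained.
    class TermFrequency
      def initialize(tokenizer)
        @tokenizer = tokenizer
      end

      def calculate(text)
        rawFrequency(text)
      end

      protected

      def rawFrequency(contents)
        # Build a Hash of token => count, starting each new token at 0.
        @tokenizer.tokenize(contents).each_with_object({}) do |token, tf|
          tf[token] ||= 0
          tf[token] += 1
        end
      end
    end
  end
end

tf = Company::Mapping::TermFrequency.new(SimpleTokenizer.new)
counts = tf.calculate("Acme Holdings and Acme Trading")
# counts == { "acme" => 2, "holdings" => 1, "and" => 1, "trading" => 1 }
puts counts.inspect
```

Because `rawFrequency` is protected, outside callers go through `calculate`. Calling `rawFrequency` directly on an instance raises `NoMethodError`.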

Version data entries

4 entries across 4 versions & 1 rubygem

Version Path
company-mapping-0.2.3 lib/company/mapping/tfidf/tf/term_frequency.rb
company-mapping-0.2.2 lib/company/mapping/tfidf/tf/term_frequency.rb
company-mapping-0.2.1 lib/company/mapping/tfidf/tf/term_frequency.rb
company-mapping-0.2.0 lib/company/mapping/tfidf/tf/term_frequency.rb