Sha256: 944451be5b2da0f2fb765d27b34fccbae97901645e03b0194efd4aeaefe754f5

Contents?: true

Size: 801 Bytes

Versions: 1

Compression:

Stored size: 801 Bytes

Contents

module Company
  module Mapping

    class TermFrequency

      def initialize(tokenizer)
        @tokenizer = tokenizer
      end

      # Calculates the raw term frequency of each token in the given document text.
      # Returns a Hash mapping each token to the number of times it occurs.
      def calculate(text)
        rawFrequency(text)
      end

      def info
        return "Raw term frequency (number of times a token appears in a given string - document)"
      end

      protected

      # Tokenizes the contents with the injected tokenizer and counts
      # how many times each resulting token occurs.
      def rawFrequency(contents)
        _tokens = @tokenizer.tokenize(contents)
        _tf = Hash.new

        _tokens.each do |_token|
          if _tf.has_key?(_token)
            _tf[_token] = _tf[_token] + 1
          else
            _tf[_token] = 1
          end
        end
        _tf
      end
    end

  end
end
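
The class above depends only on a tokenizer object that responds to `tokenize` and returns an enumerable of tokens. The following is a minimal usage sketch under that assumption: `WhitespaceTokenizer` is a hypothetical stand-in, not the tokenizer shipped with the gem, and the class is repeated in compact form only so that the snippet runs on its own.

```ruby
# Hypothetical tokenizer for illustration only; the gem's real tokenizer is not shown here.
class WhitespaceTokenizer
  # Lowercases the text and splits it on runs of whitespace.
  def tokenize(text)
    text.downcase.split
  end
end

# Compact copy of the listed class so this snippet is self-contained.
module Company
  module Mapping
    class TermFrequency
      def initialize(tokenizer)
        @tokenizer = tokenizer
      end

      def calculate(text)
        rawFrequency(text)
      end

      protected

      # Counts occurrences of each token; fetch supplies 0 for unseen tokens.
      def rawFrequency(contents)
        tf = Hash.new
        @tokenizer.tokenize(contents).each do |token|
          tf[token] = tf.fetch(token, 0) + 1
        end
        tf
      end
    end
  end
end

tf = Company::Mapping::TermFrequency.new(WhitespaceTokenizer.new)
counts = tf.calculate("the cat and the hat")
# counts maps each token to its number of occurrences:
# "the" occurs twice, every other token once.
```

Because the tokenizer is injected through the constructor, the same counting logic can be reused with any tokenization strategy (stemming, stop-word removal, and so on) without changing `TermFrequency` itself.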

Version data entries

1 entry across 1 version & 1 rubygem

Version Path
company-mapping-0.1.0 lib/company/mapping/tfidf/tf/term_frequency.rb