Sha256: 3cc6f52b71123a04d040a9aa130e419e7c3aa14468c4011128b281d8442ac6cc

Size: 689 Bytes

Versions: 3

Contents

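require 'raingrams/tokens/unknown'

module Raingrams
  module OpenVocabulary
    #
    # Mixin for open-vocabulary models: every gram that is not part of
    # a fixed lexicon is replaced by Tokens::Unknown before training.
    #
    module OpenModel

      # The fixed lexicon of this model
      attr_reader :lexicon

      #
      # Creates a new model. The optional :lexicon option lists the known
      # words, each converted with to_gram. The options and block are then
      # passed on to the including model's own initialize.
      #
      def initialize(options={},&block)
        # Build a new array instead of calling map!, so the array passed
        # in by the caller is not modified in place.
        @lexicon = (options[:lexicon] || []).map do |word|
          word.to_gram
        end

        super(options,&block)
      end

      # Returns true if the given gram is part of the lexicon.
      def within_lexicon?(gram)
        @lexicon.include?(gram.to_gram)
      end

      #
      # Replaces each out-of-lexicon gram in the ngram with
      # Tokens::Unknown, then trains the underlying model on the result.
      #
      def train_ngram(ngram)
        ngram = ngram.map do |gram|
          if within_lexicon?(gram)
            gram
          else
            Tokens::Unknown
          end
        end

        return super(ngram)
      end

    end
  end
end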
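The mixin's behaviour can be tried outside the gem with a small stand-alone sketch. It is not the raingrams implementation: `Sketch::BaseModel`, `Sketch::UNKNOWN` and `Sketch::OpenBigramModel` are hypothetical stand-ins for the gem's model class and `Tokens::Unknown`, and `to_s` takes the place of `to_gram`. It assumes, as the `super` calls above suggest, that the module is included into a model class whose own `initialize` and `train_ngram` it wraps.

```ruby
# Hypothetical stand-ins for the raingrams pieces OpenModel relies on;
# the real gem's model classes, Tokens::Unknown and #to_gram may differ.
module Sketch
  UNKNOWN = :'<unknown>' # stands in for Tokens::Unknown

  # Minimal base model: just counts every n-gram it is trained on.
  class BaseModel
    attr_reader :counts

    def initialize(options = {})
      @counts = Hash.new(0)
    end

    def train_ngram(ngram)
      @counts[ngram] += 1
    end
  end

  # Mirrors OpenModel: grams outside the fixed lexicon become UNKNOWN.
  module OpenModel
    attr_reader :lexicon

    def initialize(options = {})
      # map (not map!) returns a new array; the caller's array is untouched
      @lexicon = (options[:lexicon] || []).map(&:to_s)
      super(options)
    end

    def within_lexicon?(gram)
      @lexicon.include?(gram.to_s)
    end

    def train_ngram(ngram)
      # Replace unknown grams, then defer to BaseModel#train_ngram via super
      super(ngram.map { |gram| within_lexicon?(gram) ? gram.to_s : UNKNOWN })
    end
  end

  # Including the module puts it before BaseModel in the lookup chain,
  # so its initialize and train_ngram run first and call super.
  class OpenBigramModel < BaseModel
    include OpenModel
  end
end

words = ['the', 'cat']
model = Sketch::OpenBigramModel.new(lexicon: words)
model.train_ngram(%w[the cat])
model.train_ngram(%w[the dog]) # "dog" is not in the lexicon
# model.counts now holds ["the", "cat"] once and ["the", Sketch::UNKNOWN] once
```

Because the lexicon is a plain array, `include?` scans it linearly on every lookup; storing it in a `Set` would make each membership test constant-time on average, at the cost of changing what `lexicon` returns.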

Version data entries

3 entries across 3 versions and 1 rubygem

Version          Path
raingrams-0.1.0  lib/raingrams/open_vocabulary/open_model.rb
raingrams-0.1.1  lib/raingrams/open_vocabulary/open_model.rb
raingrams-0.1.2  lib/raingrams/open_vocabulary/open_model.rb