Sha256: 5a4271751ff03d0d2b43ee7c219eba60188eeef4b2a0d83f1945d93065ac955e
Contents?: true
Size: 784 Bytes
Versions: 2
Compression:
Stored size: 784 Bytes
Contents
require 'yaml'

module Lda
  class Document
    attr_reader :corpus, :words, :counts, :length, :total, :tokens

    def initialize(corpus)
      @corpus = corpus

      @words  = Array.new
      @counts = Array.new
      @tokens = Array.new
      @length = 0
      @total  = 0
    end

    #
    # Recompute the total and length values.
    #
    def recompute
      @total = @counts.inject(0) { |sum, i| sum + i }
      @length = @words.size
    end

    def has_text?
      false
    end

    def handle(tokens)
      tokens
    end

    def tokenize(text)
      clean_text = text.gsub(/[^A-Za-z'\s]+/, ' ').gsub(/\s+/, ' ').downcase # remove everything but letters and ' and leave only single spaces
      @tokens = handle(clean_text.split(' '))
      nil
    end
  end
end
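The `tokenize` and `handle` pair above can be exercised in isolation. The following is a minimal sketch, not the gem's own code: `SketchDocument` and `StopwordDocument` are hypothetical stand-ins that copy only the cleaning regex and the `handle` hook from the listing, so the example runs without the gem or a corpus object. It illustrates what the regex chain keeps and how a subclass might use `handle` to filter tokens.

```ruby
# Hypothetical stand-in mirroring only tokenize/handle from the listing;
# this is NOT the gem's Lda::Document and takes no corpus argument.
class SketchDocument
  attr_reader :tokens

  def initialize
    @tokens = []
  end

  # Hook: the base version returns the tokens unchanged.
  def handle(tokens)
    tokens
  end

  def tokenize(text)
    # Replace runs of anything other than letters, apostrophes, and
    # whitespace with a space, collapse whitespace runs, then lowercase.
    clean_text = text.gsub(/[^A-Za-z'\s]+/, ' ').gsub(/\s+/, ' ').downcase
    @tokens = handle(clean_text.split(' '))
    nil
  end
end

# Hypothetical subclass: drops a small, illustrative stop-word list
# by overriding the handle hook.
class StopwordDocument < SketchDocument
  STOPWORDS = %w[the a of].freeze

  def handle(tokens)
    tokens.reject { |t| STOPWORDS.include?(t) }
  end
end

plain = SketchDocument.new
plain.tokenize("Hello, World! It's 2024.")
p plain.tokens     # => ["hello", "world", "it's"]

filtered = StopwordDocument.new
filtered.tokenize("The Cat of the House")
p filtered.tokens  # => ["cat", "house"]
```

Digits and punctuation are discarded while apostrophes survive, so contractions such as "it's" stay as single tokens. Because `tokenize` returns `nil`, the result has to be read back through the `tokens` reader.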
Version data entries
2 entries across 2 versions & 1 rubygems
Version | Path |
---|---|
lda-ruby-0.3.7 | lib/lda-ruby/document/document.rb |
lda-ruby-0.3.6 | lib/lda-ruby/document/document.rb |