Sha256: 69216cd40ebb45b14a02bfae7965eb798017fa5612b3b0c7cffb2da7d16482c4

Size: 431 Bytes

Versions: 3

Stored size: 431 Bytes

Contents

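# Train an English-tweet detector on the tweets in TWEETS_FILENAME and save it to disk.
require File.expand_path('../language-detector', __FILE__)

# Relative path, resolved against the current working directory.
TWEETS_FILENAME = "datasets/tweets_5000.txt"

# One tweet per line. String#normalize is not core Ruby; it is presumably defined by this gem.
training_sentences = File.readlines(TWEETS_FILENAME).map { |tweet| tweet.normalize }
detector = LanguageDetector.new(:ngram_size => 3)
detector.train(30, training_sentences)
detector.yamlize("detector.yaml") # write the trained detector to a YAML file

# Print the prior probabilities of categories 0 and 1.
puts detector.classifier.get_prior_category_probability(0)
puts detector.classifier.get_prior_category_probability(1)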
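
The `:ngram_size => 3` option suggests the detector builds its features from 3-grams. The listing does not show how the gem extracts them, so the following is only a minimal sketch, assuming character-level n-grams. `char_ngrams` is a hypothetical helper for illustration and is not part of the gem's API.

```ruby
# Hypothetical helper (not from the gem): split a string into
# overlapping character n-grams of length n.
def char_ngrams(text, n = 3)
  # each_cons yields every run of n consecutive characters;
  # a string shorter than n yields no n-grams at all.
  text.downcase.chars.each_cons(n).map(&:join)
end

p char_ngrams("hello")
```

Under that assumption, a tweet of k characters produces k - 2 overlapping trigrams, and a tweet shorter than three characters produces none.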

Version data entries

3 entries across 3 versions and 1 rubygem

Version Path
unsupervised-language-detection-0.0.4 lib/unsupervised-language-detection/train-english-tweet-detector.rb
unsupervised-language-detection-0.0.3 lib/unsupervised-language-detection/train-english-tweet-detector.rb
unsupervised-language-detection-0.0.2 lib/unsupervised-language-detection/train-english-tweet-detector.rb