Sha256: 8c977e6c059effe374885265665f1a48709ecbcc132baa54caa874b5a8957699
Contents?: true
Size: 1.5 KB
Versions: 4
Compression:
Stored size: 1.5 KB
Contents
Latent Dirichlet Allocation – Ruby Wrapper

This wrapper is based on C code by David M. Blei. In a nutshell, it can be used to automatically cluster documents into topics. The number of topics is chosen beforehand, and the topics found are usually fairly intuitive. Details of the implementation can be found in the paper by Blei, Ng, and Jordan.

The original C code relied on files for input and output. We felt it was necessary to depart from that model and use Ruby objects for these steps instead. The only file required is the data file (in a format similar to that used by SVMlight). Optionally, you may need a vocabulary file to be able to extract the words belonging to topics.

Example usage:

    require 'lda'
    lda = Lda::Lda.new                              # create an Lda object for training
    corpus = Lda::Corpus.new("data/data_file.dat")
    lda.corpus = corpus
    lda.em("random")                                # run EM algorithm using random starting points
    lda.load_vocabulary("data/vocab.txt")
    lda.print_topics(20)                            # print the top 20 words per topic

See the rdocs for further information. You can also check out the mailing list for this project if you have any questions, or mail lda-ruby@groups.google.com. If you have general questions about Latent Dirichlet Allocation, I urge you to use the topic models mailing list, since the people who monitor it are very knowledgeable.

References

Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent dirichlet allocation. Journal of Machine Learning Research. 3 (Mar. 2003), 993-1022.
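
A note on the data file mentioned above: the README only says the format is "similar to that used by SVMlight". The sketch below assumes the layout used by Blei's original C code, where each line describes one document as the number of unique terms followed by term_index:count pairs. Whether this wrapper expects exactly that layout is an assumption here, and build_lda_data is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the lda-ruby gem): turn tokenized
# documents into data-file lines of the assumed form
#   <number of unique terms> <term_index>:<count> <term_index>:<count> ...
# Returns the lines plus the vocabulary, ordered by term index.
def build_lda_data(documents)
  vocabulary = {} # word => integer index, assigned on first appearance
  lines = documents.map do |tokens|
    counts = Hash.new(0) # term index => occurrences in this document
    tokens.each do |word|
      index = (vocabulary[word] ||= vocabulary.size)
      counts[index] += 1
    end
    pairs = counts.sort.map { |index, count| "#{index}:#{count}" }
    ([counts.size] + pairs).join(" ")
  end
  [lines, vocabulary.keys]
end

docs = [%w[apple banana apple], %w[banana cherry]]
data_lines, vocab = build_lda_data(docs)
puts data_lines
```

Writing data_lines to a file, one line per document, would give a candidate data file. Writing vocab one word per line would give a candidate vocabulary file, assuming the vocabulary loader maps line position to term index.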
Version data entries
4 entries across 4 versions & 1 rubygems
Version | Path |
---|---|
ealdent-lda-ruby-0.2.0 | README |
ealdent-lda-ruby-0.2.1 | README |
ealdent-lda-ruby-0.2.2 | README |
ealdent-lda-ruby-0.2.3 | README |