Sha256: b58f9ce5537aa805b819bd5f83462458998254eff04f222f80fdbb6b428eae0b
Contents?: true
Size: 1.61 KB
Versions: 7
Compression:
Stored size: 1.61 KB
Contents
Introduction ======== Rseg is a Chinese Word Segmentation(中文分词) routine in pure Ruby. The algorithm is based on this article: http://xiecc.blog.163.com/blog/static/14032200671110224190/ Usage ======== Rseg now support two modes: inline and C/S mode. 1. Inline mode > require 'rubygems' > require 'rseg' > Rseg.segment("需要分词的文章") ['需要', '分词', '的', '文章'] The first call to Rseg#segment will need about 30 seconds to load the dictionary, the second call will be very fast, you can also call Rseg#load to load dictionaries manually. 2. C/S mode $ rseg_server == Sinatra/0.9.4 has taken the stage on 4100 This will start rseg server on http://localhost:4100 You can visit it via your browser or the rseg command. $ rseg '需要分词的文章' 需要 分词 的 文章 You can also access server with the Rseg#remote_segment $ irb > require 'rubygems' > require 'rseg' > Rseg.remote_segment("需要分词的文章") # This will be very fast ['需要', '分词', '的', '文章'] Performance ======== About 5M character/s on my Macbook (Intel Core 2 Duo 2GHz/4G mem). License ======== Rseg includes two built-in dictionaries: * CC-CEDICT (http://cc-cedict.org/wiki/) with Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/) * Wikipedia Chinese article title list (http://download.wikimedia.org/zhwiki/) with Creative Commons Attribution-Share Alike 3.0 License(http://creativecommons.org/licenses/by-sa/3.0/) The codes and others in Rseg are licensed under MIT license. Feedback ======== All feedback are welcome, Yuanyi Zhang(zhangyuanyi#gmail.com)
Version data entries
7 entries across 7 versions & 5 rubygems
Version | Path |
---|---|
rseg_harry-0.0.4 | README |
rseg_harry-0.0.3 | README |
rseg_ggharry-0.0.2 | README |
rseg_ggharry-0.0.1 | README |
rseg-ggharry-0.0.1 | README |
rseg1.9-0.1.5 | README |
rseg-0.1.7 | README |