Introduction
========

Rseg is a Chinese word segmentation (中文分词) routine written in pure Ruby.

The algorithm is based on this article: http://xiecc.blog.163.com/blog/static/14032200671110224190/

Usage
========

It's very easy to use:

    > require 'rubygems'
    > require 'rseg'
    > Rseg.segment("需要分词的文章")
    ['需要', '分词', '的', '文章']

The first call to Rseg.segment takes about 30 seconds because it loads the dictionary; subsequent calls are very fast.

Performance
========

About 5M characters/s on my MacBook (Intel Core 2 Duo 2 GHz, 4 GB memory).

License
========

Rseg includes two built-in dictionaries:

* CC-CEDICT (http://cc-cedict.org/wiki/), under the Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
* Wikipedia Chinese article title list (http://download.wikimedia.org/zhwiki/), under the Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)

The code and all other material in Rseg are licensed under the MIT license.

Feedback
========

All feedback is welcome: Yuanyi Zhang (zhangyuanyi#gmail.com)
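
Example: load-once dictionary sketch
========

The Usage section notes that the first call is slow because the dictionary is loaded, and that later calls are fast. The sketch below illustrates that load-once pattern with a toy segmenter. It is not Rseg's actual implementation: the class ToySegmenter, its four-word dictionary, and the load_count counter are all hypothetical, and the matching strategy (simple forward maximum matching) is an assumption chosen for brevity rather than the algorithm from the article linked above.

```ruby
# Toy illustration only: ToySegmenter and its word list are hypothetical
# and are not part of Rseg.
class ToySegmenter
  WORDS = %w[需要 分词 的 文章].freeze
  MAX_WORD_LENGTH = WORDS.map(&:length).max

  class << self
    # Number of times the dictionary has been built (for demonstration).
    attr_reader :load_count

    # Build the dictionary on first use and cache it for later calls.
    def dictionary
      @dictionary ||= begin
        @load_count = (@load_count || 0) + 1
        WORDS.each_with_object({}) { |word, dict| dict[word] = true }
      end
    end

    # Forward maximum matching: at each position take the longest
    # dictionary word; fall back to a single character if none matches.
    def segment(text)
      chars = text.chars
      result = []
      i = 0
      while i < chars.length
        length = [MAX_WORD_LENGTH, chars.length - i].min
        length -= 1 while length > 1 && !dictionary.key?(chars[i, length].join)
        result << chars[i, length].join
        i += length
      end
      result
    end
  end
end

p ToySegmenter.segment("需要分词的文章")  # first call builds the dictionary
p ToySegmenter.segment("分词的文章")      # second call reuses the cached one
p ToySegmenter.load_count                 # → 1
```

With a real dictionary read from disk, the work inside the `||=` block is where the one-time startup cost would be paid; every later call only performs hash lookups.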