Sha256: 026f84fa24e76e42b01ba7a9b38a8b32e1ce379e3a1d18f5fd687ee13ca7ee51
Contents?: true
Size: 1.3 KB
Versions: 1
Compression:
Stored size: 1.3 KB
Contents
= Groupie Groupie is a simple way to group texts and classify new texts as being a likely member of one of the defined groups. Think of bayesian spam filters. The eventual goal is to have Groupie work as a sort of bayesian spam filter, where you feed it spam and ham (non-spam) and ask it to classify new texts as spam or ham. Applications for this are e-mail spam filtering and blog spam filtering. Other sorts of categorizing might be interesting as well, such as finding suitable tags for a blog post or bookmark. Started and forgotten in 2009 as a short-lived experiment, in 2010 Groupie got new features when I started using it on a RSS reader project that classified news items into "Interesting" and "Not interesting" categories. == Current functionality Current funcionality includes: * Tokenize an input text to prepare it for grouping. * Strip XML and HTML tag. * Keep certain infix characters, such as period and comma. * Add texts (as an Array of Strings) to any number of groups. * Classify a single word to check the likelihood it belongs to each group. * Do classification for complete (tokenized) texts. * Pick classification strategy to weigh repeat words differently (weigh by sum, square root or log10 of words in group) == License As always, the code is licensed under the MIT license. Wes Oldenbeuving
Version data entries
1 entries across 1 versions & 1 rubygems
Version | Path |
---|---|
groupie-0.3.0 | readme.rdoc |