readme.rdoc in groupie-0.2.2 vs readme.rdoc in groupie-0.3.0
- old
+ new
@@ -2,24 +2,21 @@
Groupie is a simple way to group texts and classify new texts as being a likely member of one of the defined groups. Think of bayesian spam filters.
The eventual goal is to have Groupie work as a sort of bayesian spam filter, where you feed it spam and ham (non-spam) and ask it to classify new texts as spam or ham. Applications for this are e-mail spam filtering and blog spam filtering. Other sorts of categorizing might be interesting as well, such as finding suitable tags for a blog post or bookmark.
-== Goals
+Started and forgotten in 2009 as a short-lived experiment, in 2010 Groupie got new features when I started using it on a RSS reader project that classified news items into "Interesting" and "Not interesting" categories.
-Groupie is a 'fun' project that has the following goals, in descending order of importance:
-* Have fun playing with code
-* Play with Bayesian-like (spam) filtering
-
== Current functionality
Current funcionality includes:
* Tokenize an input text to prepare it for grouping.
* Strip XML and HTML tag.
* Keep certain infix characters, such as period and comma.
* Add texts (as an Array of Strings) to any number of groups.
* Classify a single word to check the likelihood it belongs to each group.
* Do classification for complete (tokenized) texts.
+* Pick classification strategy to weigh repeat words differently (weigh by sum, square root or log10 of words in group)
== License
As always, the code is licensed under the MIT license.
\ No newline at end of file