= Groupie

Groupie is a simple way to group texts and classify new texts as likely members of one of the defined groups. Think of Bayesian spam filters.

The eventual goal is for Groupie to work as a sort of Bayesian spam filter: you feed it spam and ham (non-spam) and ask it to classify new texts as spam or ham. Applications include e-mail and blog spam filtering. Other kinds of categorization might be interesting as well, such as finding suitable tags for a blog post or bookmark.

== Goals

Groupie is a 'fun' project that has the following goals, in descending order of importance:
* Have fun playing with code
* Play with Bayesian-like (spam) filtering
* Check out the Testy BDD framework. It's pretty good for 60 lines of code!

== Current functionality

Current functionality includes:
* Tokenize an input text to prepare it for grouping.
  * Strip XML and HTML tags.
  * Keep certain infix characters, such as period and comma.
* Add texts (as an Array of Strings) to any number of groups.
* Classify a single word to check the likelihood it belongs to each group.
* Do classification for complete (tokenized) texts.
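
The features listed above can be illustrated with a small, self-contained sketch. This is not Groupie's actual API: the +tokenize+ method, the +WordGroups+ class and its method names are hypothetical, and the scoring (a word's share of occurrences per group, averaged over the words of a text) is an assumed simplification of Bayesian-like filtering. It assumes Ruby 2.4 or newer.

```ruby
# Hypothetical sketch; these names are illustrative and are not Groupie's API.

# Strip XML/HTML tags, lowercase the text, and split it into word tokens.
# A period or comma is kept only when it sits between word characters
# (for example "example.com"); trailing punctuation is dropped.
def tokenize(text)
  text
    .gsub(/<[^>]*>/, ' ')
    .downcase
    .scan(/[a-z0-9]+(?:[.,][a-z0-9]+)*/)
end

class WordGroups
  def initialize
    # group name => (word => count); unseen words count as 0
    @counts = Hash.new { |hash, group| hash[group] = Hash.new(0) }
  end

  # Add a tokenized text (an Array of Strings) to a group.
  def add(group, words)
    words.each { |word| @counts[group][word] += 1 }
  end

  # For one word, return the share of its occurrences found in each group.
  # Returns an empty Hash when no group has seen the word.
  def classify_word(word)
    totals = @counts.transform_values { |words| words[word] }
    sum = totals.values.sum
    return {} if sum.zero?
    totals.transform_values { |count| count.to_f / sum }
  end

  # For a tokenized text, average the per-word scores, ignoring words
  # that no group has seen.
  def classify_text(words)
    scores = words.map { |word| classify_word(word) }.reject(&:empty?)
    return {} if scores.empty?
    @counts.keys.map do |group|
      [group, scores.sum { |score| score[group] } / scores.size]
    end.to_h
  end
end

groups = WordGroups.new
groups.add(:spam, tokenize("<b>Buy cheap pills</b> now"))
groups.add(:ham, tokenize("Lunch now or later?"))

word_scores = groups.classify_word("now")
text_scores = groups.classify_text(tokenize("cheap pills now"))
```

In this sketch "now" appears once in each group, so it scores evenly for both, while "cheap pills now" leans toward +:spam+ because two of its three words were only seen there. A real filter would normally also smooth the counts and weigh how common each word is overall.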

== License

As always, the code is licensed under the MIT license.

Wes Oldenbeuving

Version: groupie-0.1.0 (readme.rdoc)