README.md in groupie-0.4.1 vs README.md in groupie-0.5.0
- old
+ new
@@ -1,7 +1,9 @@
# Groupie
+[![Depfu](https://badges.depfu.com/badges/367956233b3b31a6fc19db4515263b9e/overview.svg)](https://depfu.com/github/Narnach/groupie?project_id=34004)
+
Groupie is a simple way to group texts and classify new texts as being a likely member of one of the defined groups. Think of Bayesian spam filters.
The eventual goal is to have Groupie work as a sort of bayesian spam filter, where you feed it spam and ham (non-spam) and ask it to classify new texts as spam or ham. Applications for this are e-mail spam filtering and blog spam filtering. Other sorts of categorizing might be interesting as well, such as finding suitable tags for a blog post or bookmark.
Started and forgotten in 2009 as a short-lived experiment, Groupie got new features in 2010 when I started using it on an RSS reader project that classified news items into "Interesting" and "Not interesting" categories.
@@ -88,10 +90,25 @@
# This looks even worse for our poor password reset email.
# In case you're curious, the ignored words in this case are:
test_tokens - (test_tokens & groupie.unique_words)
# => ["please", "to", "reset", "awesome"]
# If you'd be classifying email, you can assume that common email headers will get ignored this way.
+
+# If you're just starting out, your incomplete data could lead to dramatically skewed classifications.
+# To balance against this, you can enable smart weight:
+groupie.smart_weight = true
+# You could also set it during initialization via Groupie.new(smart_weight: true)
+# What's so useful about it? It adds a default weight to _all_ words, even the ones you haven't
+# seen yet, which counteracts the data you have. This shines in low-data situations,
+# reducing the impact of the few words you have seen before.
+groupie.default_weight
+# => 1.2285714285714286
+# Classifying the same text as before now considers all words and adds this default weight to each of them.
+# This gives every group some likelihood of "claiming" a word,
+# unless there is strong data to suggest otherwise.
+groupie.classify_text(test_tokens)
+# => {:spam=>0.5241046831955923, :ham=>0.4758953168044077}
```
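To build some intuition for why a default weight helps when data is scarce, here is a minimal sketch of additive smoothing over per-group word counts. It is not Groupie's implementation: the `smoothed_shares` helper, its `default_weight:` keyword, and the sample counts are all hypothetical, and the way Groupie derives its own `default_weight` is not shown here.

```ruby
# Hypothetical helper, not part of Groupie's API.
# Returns each group's share of a word after adding the same default
# weight to every group's count for that word (additive smoothing).
def smoothed_shares(counts_by_group, word, default_weight: 0.0)
  weights = counts_by_group.transform_values do |counts|
    counts.fetch(word, 0) + default_weight
  end
  total = weights.values.sum
  # Nobody has seen the word and no default weight applies: no group claims it.
  return weights.transform_values { 0.0 } if total.zero?

  weights.transform_values { |weight| weight / total.to_f }
end

# Made-up training data: "buy" was seen exactly once, in spam.
counts = { spam: { "buy" => 1 }, ham: {} }

# Without smoothing, a single sighting makes "buy" 100% spam.
smoothed_shares(counts, "buy")
# With a default weight of 1.0, spam still leads, but only by two thirds to one third.
smoothed_shares(counts, "buy", default_weight: 1.0)
# A word nobody has seen is split evenly instead of being ignored.
smoothed_shares(counts, "hello", default_weight: 1.0)
```

The larger the real counts get relative to the default weight, the less the smoothing matters, which is why this kind of correction mostly affects low-data situations.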
Persistence can be naively done by using YAML:
```ruby
@@ -108,10 +125,12 @@
## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. RuboCop is available via `bin/rubocop` with some friendly default settings.
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+To install this gem onto your local machine, run `bundle exec rake install`.
+
+To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org). For obvious reasons, only the project maintainer can do this.
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/Narnach/groupie.