README.md in groupie-0.4.1 vs README.md in groupie-0.5.0

- old
+ new

@@ -1,7 +1,9 @@
 # Groupie

+[![Depfu](https://badges.depfu.com/badges/367956233b3b31a6fc19db4515263b9e/overview.svg)](https://depfu.com/github/Narnach/groupie?project_id=34004)
+
 Groupie is a simple way to group texts and classify new texts as being a likely member of one of the defined groups. Think of bayesian spam filters.

 The eventual goal is to have Groupie work as a sort of bayesian spam filter, where you feed it spam and ham (non-spam) and ask it to classify new texts as spam or ham. Applications for this are e-mail spam filtering and blog spam filtering. Other sorts of categorizing might be interesting as well, such as finding suitable tags for a blog post or bookmark.

 Started and forgotten in 2009 as a short-lived experiment, in 2010 Groupie got new features when I started using it on a RSS reader project that classified news items into "Interesting" and "Not interesting" categories.
@@ -88,10 +90,25 @@
 # This looks even worse for our poor password reset email.
 # In case you're curious, the ignored words in this case are:
 test_tokens - (test_tokens & groupie.unique_words)
 # => ["please", "to", "reset", "awesome"]
 # If you'd be classifying email, you can assume that common email headers will get ignored this way.
+
+# If you're just starting out, your incomplete data could lead to dramatic misrepresentations of the data.
+# To balance against this, you can enable smart weight:
+groupie.smart_weight = true
+# You could also set it during initialization via Groupie.new(smart_weight: true)
+# What's so useful about it? It adds a default weight to _all_ words, even the ones you haven't
+# seen yet, which counter-acts the data you have. This shines in low data situations,
+# reducing the impact of the few words you have seen before.
+groupie.default_weight
+# => 1.2285714285714286
+# Classifying the same text as before should consider all words, and add this default weight to all words
+# It basically gives all groups the likelihood of "claiming" a word,
+# unless there is strong data to suggest otherwise.
+groupie.classify_text(test_tokens)
+# => {:spam=>0.5241046831955923, :ham=>0.4758953168044077}
 ```

 Persistence can be naively done by using YAML:

 ```ruby
@@ -108,10 +125,12 @@

 ## Development

 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. Rubocop is available via `bin/rubocop` with some friendly default settings.

-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+To install this gem onto your local machine, run `bundle exec rake install`.
+
+To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org). For obvious reasons, only the project maintainer can do this.

 ## Contributing

 Bug reports and pull requests are welcome on GitHub at https://github.com/Narnach/groupie.
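
The `smart_weight` comments added in the second hunk describe a default weight that is applied to every word, seen or unseen, so that sparse training data is less decisive. The sketch below illustrates that general idea with a toy classifier. It is not Groupie's implementation: `ToyClassifier`, its methods, and the way `default_weight` is computed (average occurrences per distinct word) are all assumptions made for illustration.

```ruby
# Toy illustration only: NOT Groupie's implementation.
# ToyClassifier and every method on it are hypothetical names.
class ToyClassifier
  def initialize(smart_weight: false)
    @smart_weight = smart_weight
    # group name => { word => occurrence count }
    @groups = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # Record one occurrence of each word for the given group.
  def add(group, words)
    words.each { |word| @groups[group][word] += 1 }
  end

  # Assumed smoothing constant: average occurrences per distinct word.
  def default_weight
    total = @groups.values.sum { |counts| counts.values.sum }
    unique = @groups.values.flat_map(&:keys).uniq.size
    unique.zero? ? 0.0 : total.to_f / unique
  end

  # Returns group => share of the total score. Shares sum to 1.0
  # whenever at least one word contributed a score.
  def classify(words)
    weight = @smart_weight ? default_weight : 0.0
    totals = Hash.new(0.0)
    words.each do |word|
      scores = @groups.transform_values { |counts| counts[word] + weight }
      sum = scores.values.sum
      next if sum.zero? # unseen word without a default weight: ignored

      scores.each { |group, score| totals[group] += score / sum }
    end
    grand_total = totals.values.sum
    return {} if grand_total.zero?

    totals.transform_values { |score| score / grand_total }
  end
end

plain = ToyClassifier.new
smart = ToyClassifier.new(smart_weight: true)
[plain, smart].each do |classifier|
  classifier.add(:spam, %w[buy viagra now])
  classifier.add(:ham, %w[meeting notes now])
end

# Without smoothing, "buy" counts fully for spam: the spam share is 0.75.
plain_result = plain.classify(%w[buy now])
# With smoothing, every group gets some claim on "buy",
# so the result moves closer to an even split.
smart_result = smart.classify(%w[buy now])
puts plain_result.inspect
puts smart_result.inspect
```

With only six training words, the unsmoothed classifier treats a single sighting of "buy" as conclusive evidence for spam. The smoothed one still favours spam, but less strongly, which matches the low-data behaviour that the README comments describe.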