README.md in yanbi-ml-0.1.2 vs README.md in yanbi-ml-0.2.0

- old
+ new

@@ -1,8 +1,8 @@
# YANBI-ML

-Yet Another Naive Bayes Implementation
+Yet Another Naive Bayes Implementation - Bayes and Fisher document classifiers

## Installation

Add this line to your application's Gemfile:
@@ -32,13 +32,31 @@
classifier.train_raw(:odd, "one three five seven")

classifier.classify_raw("one two three") => :odd
```

+## What is a Fisher Classifier?
+
+An alternative to the standard Bayesian classifier that can also give very accurate results. A Bayesian classifier works by computing a single, document-wide probability for each class that a document might belong to. A Fisher classifier, by contrast, computes a probability for each individual feature in a document. If the document does not belong to a given class, you would expect a random distribution of probabilities across its features - in fact, the eponymous Fisher showed that you would generally get a *chi squared distribution* of probabilities. If the document does belong to the class, you would expect the probabilities to be skewed towards higher values instead of being randomly distributed. A Fisher classifier uses Fisher's statistical method to compute a p-value measuring how far the feature probabilities diverge from the random distribution you would otherwise expect.
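
To make that concrete, here is a minimal sketch of the combination step Fisher's method performs. It is illustrative only - not code from either version of the gem - and the `fisher_score` and `inv_chi2` names are invented:

```ruby
# Combine per-feature probabilities into a single p-value.
# Under the null hypothesis (the document does NOT belong to the
# class), -2 * sum(ln p) follows a chi squared distribution with
# 2n degrees of freedom, so we ask how extreme our statistic is.

def inv_chi2(chi, df)
  # survival function of the chi squared distribution,
  # valid for an even number of degrees of freedom
  m = chi / 2.0
  sum = term = Math.exp(-m)
  (1...(df / 2)).each do |i|
    term *= m / i
    sum += term
  end
  [sum, 1.0].min
end

def fisher_score(probs)
  chi = -2.0 * probs.sum { |p| Math.log(p) }
  inv_chi2(chi, 2 * probs.size)
end

fisher_score([0.9, 0.8, 0.95])  # => ~0.99, skewed high: strong match
fisher_score([0.3, 0.6, 0.2])   # => ~0.35, looks random: weak match
```

A score near 1 means the feature probabilities are skewed the way class membership would skew them; probabilities that look random land closer to the middle of the range.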
+
+## I don't care, I just want to use it!
+
+Fortunately the interface is pretty consistent:
+
+```ruby
+classifier = Yanbi::Fisher.default(:even, :odd)
+classifier.train_raw(:even, "two four six eight")
+classifier.train_raw(:odd, "one three five seven")
+
+classifier.classify_raw("one two three") => :odd
+```
+
+See? Easy.
+
## Bags (of words)

-A bag of words is just a Hash of word counts (a multi-set of word frequencies, to ML folk). This makes a useful abstraction because you can use it with more than one kind of classifier, and because the bag provides a natural location for various kinds of pre-processing you might want to do to the words (features) of the text before training with or classifying them.
+A bag of words is just a Hash of word counts (a multi-set of word frequencies, to ML folk). This makes a useful abstraction because you can use it with more than one kind of classifier, and because the bag provides a natural location for various kinds of pre-processing you might want to do to the words (features) of the text before training with or classifying them. Although a single bag can contain as many documents as you want, in practice it's a good idea to treat word bags as corresponding to a single document.

A handful of classes are provided:

<ul>
<li>WordBag - basic, default bag of words</li>
@@ -161,10 +179,45 @@
docs.each_doc do |d|
  d.remove(STOP_WORDS)
end
```

+## Feature thresholds
+
+A method on the classifier is provided to prune infrequently seen features. This is often one of the first things recommended for improving the accuracy of a classifier in real world applications. Note that once you prune features there's no un-pruning them afterwards - so be sure you actually want to do it!
+
+```ruby
+classifier = Yanbi.default(:even, :odd)
+
+#...tons of training happens here...
+
+#we now have thousands of documents. Ignore any words we haven't
+#seen at least a dozen times
+
+classifier.set_significance(12)
+
+#actually, the 'odd' category is especially noisy, so let's make
+#that two dozen for odd items
+
+classifier.set_significance(24, :odd)
+```
+
+## Persisting
+
+After going to all the trouble of training a classifier on a large corpus, it would be very useful to save it to disk for later use. You can do just that with the appropriately named save and load functions:
+
+```ruby
+classifier.save('testclassifier')
+
+#...some time later
+
+newclassifier = Yanbi::Bayes.load('testclassifier')
+```
+
+Note that an .obj extension is added to saved classifiers by default - no need to explicitly include it.
+
## Putting it all together

```ruby
classifier = Yanbi.default(:stuff, :otherstuff)
@@ -174,14 +227,46 @@
other = Yanbi::Corpus.new
other.add_file('biglistofotherstuff.txt', '@@@@')

stuff.each_doc {|d| classifier.train(:stuff, d)}
other.each_doc {|d| classifier.train(:otherstuff, d)}
+
+#...classify all the things....
```

+A slightly fancier example:
+
+```ruby
+STOP_WORDS = %w(in the a and at of)
+
+#classify using stemmed words
+classifier = Yanbi::Bayes.new(Yanbi::StemmedWordBag, :stuff, :otherstuff)
+
+#create our corpora
+stuff = Yanbi::Corpus.new(Yanbi::StemmedWordBag)
+stuff.add_file('biglistofstuff.txt', '****')
+
+other = Yanbi::Corpus.new(Yanbi::StemmedWordBag)
+other.add_file('biglistofotherstuff.txt', '@@@@')
+
+#get rid of those nasty stop words
+stuff.each_doc {|d| d.remove(STOP_WORDS)}
+other.each_doc {|d| d.remove(STOP_WORDS)}
+
+#train away!
+stuff.each_doc {|d| classifier.train(:stuff, d)}
+other.each_doc {|d| classifier.train(:otherstuff, d)}
+
+#get rid of the long tail
+classifier.set_significance(50)
+
+#...classify all the things....
+```
+
## Contributing

-Bug reports and pull requests are welcome on GitHub at https://github.com/rdormer/yanbi-ml.
+Bug reports, corrections of any tragic mathematical misunderstandings, and pull requests are welcome on GitHub at https://github.com/rdormer/yanbi-ml.

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
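
To recap how the pieces added in 0.2.0 fit together, here is a round-trip sketch using only the calls demonstrated in the diff above - the categories, training strings, and file name are invented for illustration:

```ruby
classifier = Yanbi.default(:spam, :ham)

#train from raw strings, as in the quick-start examples
classifier.train_raw(:spam, "buy cheap pills cheap")
classifier.train_raw(:spam, "cheap pills going fast")
classifier.train_raw(:ham, "meeting notes attached")
classifier.train_raw(:ham, "notes from the meeting")

#prune words we haven't seen at least twice
classifier.set_significance(2)

#save to disk - the .obj extension is added for us
classifier.save('spamfilter')

#...in another process, some time later...

restored = Yanbi::Bayes.load('spamfilter')
restored.classify_raw("cheap cheap pills")  # => :spam
```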