README.md in doc_sim-0.1.0 vs README.md in doc_sim-0.1.1
- old
+ new
@@ -1,19 +1,17 @@
-# Document Similarity - Efficient probablistic algorithm for calculating document similarity
+# Doc Sim - Efficient algorithm for calculating approximate document similarity
A Ruby implementation of [Mining of Massive Datasets](http://www.mmds.org/)'s document similarity algorithm. It uses Minhash and Locality Sensitive Hashing to efficiently find documents with a high probability of being similar.
## Installation
-TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
-
Install the gem and add to the application's Gemfile by executing:
- $ bundle add UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG
+ $ bundle add doc_sim
If bundler is not being used to manage dependencies, install the gem by executing:
- $ gem install UPDATE_WITH_YOUR_GEM_NAME_PRIOR_TO_RELEASE_TO_RUBYGEMS_ORG
+ $ gem install doc_sim
## Usage
1. Shingle your documents using `Shingling.shingle` (k-shingling). The optimal k value differs based on the type of argument, but 5 is a good first guess.
2. Initialize a `Minhash::Minhash`.