README.md in anomaly-0.0.2 vs README.md in anomaly-0.0.3

- old
+ new

@@ -14,55 +14,62 @@
 ```sh
 bundle install
 ```

-For max performance (about 3x faster), also install the NArray gem:
+For max performance (~ 2x faster), also install the NArray gem:

 ```ruby
 gem "narray"
 ```

 Anomaly will automatically detect it and use it.

 ## How to Use

-Train the detector with **only non-anomalies**. Each row is a sample.
+Say we have weather data for sunny days and we're trying to detect days that aren't sunny. The data looks like:

 ```ruby
-train_data = [
-  [0.1, 100, 1.4],
-  [0.2, 101, 2.1],
-  [0.5, 102, 1.6]
+# Each row is a different day.
+# [temperature (°F), humidity (%), pressure (in)]
+weather_data = [
+  [85, 68, 10.4],
+  [88, 62, 12.1],
+  [86, 64, 13.6],
+  ...
 ]
-ad = Anomaly::Detector.new(train_data)
 ```

+Train the detector with **only non-anomalies** (sunny days in our case).
+
+```ruby
+ad = Anomaly::Detector.new(weather_data)
+```
+
 That's it! Let's test for anomalies.

 ```ruby
-test_sample = [1.0, 100, 1.4]
+# 79°F, 66% humidity, 12.3 in. pressure
+test_sample = [79, 66, 12.3]
 ad.probability(test_sample)
-# => 0.0007328491480297603
+# => 7.537174740907633e-08
 ```

 **Super-important:** You must select a threshold for anomalies (which we denote with ε - "epsilon")

 Probabilities less than ε are considered anomalies. If ε is higher, more things are considered anomalies.

 ``` ruby
 ad.anomaly?(test_sample, 1e-10)
 # => false
-ad.anomaly?(test_sample, 0.5)
+ad.anomaly?(test_sample, 1e-5)
 # => true
 ```

-Here's sample to code to help you find the best ε for your application.
+The wiki has [sample code](https://github.com/ankane/anomaly/wiki/Home) to help you find the best ε for your application.

-```ruby
-# TODO
-```
+### Persistence

 You can easily persist the detector to a file or database - it's very tiny.

 ```ruby
 serialized_ad = Marshal.dump(ad)
@@ -73,9 +80,14 @@
 # ...
 # Read it later
 ad2 = Marshal.load(File.open("anomaly_detector.dump", "r").read)
 ```
+
+## TODO
+
+- Train in chunks (for very large datasets)
+- Multivariate normal distribution (possibly)

 ## Contributing

 1. Fork it
 2. Create your feature branch (`git checkout -b my-new-feature`)