README.md in anomaly-0.0.2 vs README.md in anomaly-0.0.3
- old
+ new
@@ -14,55 +14,62 @@
```sh
bundle install
```
-For max performance (about 3x faster), also install the NArray gem:
+For max performance (~ 2x faster), also install the NArray gem:
```ruby
gem "narray"
```
Anomaly will automatically detect it and use it.
## How to Use
-Train the detector with **only non-anomalies**. Each row is a sample.
+Say we have weather data for sunny days and we're trying to detect days that aren't sunny. The data looks like:
```ruby
-train_data = [
- [0.1, 100, 1.4],
- [0.2, 101, 2.1],
- [0.5, 102, 1.6]
+# Each row is a different day.
+# [temperature (°F), humidity (%), pressure (in)]
+weather_data = [
+ [85, 68, 10.4],
+ [88, 62, 12.1],
+ [86, 64, 13.6],
+ ...
]
-ad = Anomaly::Detector.new(train_data)
```
+Train the detector with **only non-anomalies** (sunny days in our case).
+
+```ruby
+ad = Anomaly::Detector.new(weather_data)
+```
+
That's it! Let's test for anomalies.
```ruby
-test_sample = [1.0, 100, 1.4]
+# 79°F, 66% humidity, 12.3 in. pressure
+test_sample = [79, 66, 12.3]
ad.probability(test_sample)
-# => 0.0007328491480297603
+# => 7.537174740907633e-08
```
**Super-important:** You must select a threshold for anomalies, which we denote with ε ("epsilon").
Probabilities less than ε are considered anomalies, so a higher ε flags more samples as anomalies.
```ruby
ad.anomaly?(test_sample, 1e-10)
# => false
-ad.anomaly?(test_sample, 0.5)
+ad.anomaly?(test_sample, 1e-5)
# => true
```
-Here's sample to code to help you find the best ε for your application.
+The wiki has [sample code](https://github.com/ankane/anomaly/wiki/Home) to help you find the best ε for your application.
-```ruby
-# TODO
-```
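As a rough illustration of what choosing ε can look like, here is a minimal sketch that scans a few candidate thresholds and keeps the one with the best F1 score on labeled validation data. `best_epsilon`, the candidate list, and the sample probabilities are all hypothetical and not part of the gem; in practice the probabilities would come from `ad.probability` on samples you have labeled by hand.

```ruby
# Hypothetical helper (not part of the Anomaly gem).
# probabilities: Array of Floats, e.g. collected from ad.probability(sample)
# labels:        Array of booleans, true when the sample is a known anomaly
# candidates:    Array of epsilon values to try
def best_epsilon(probabilities, labels, candidates)
  best = nil
  best_f1 = -1.0

  candidates.each do |eps|
    tp = fp = fn = 0
    probabilities.zip(labels).each do |p, anomaly|
      predicted = p < eps # same rule as above: below epsilon means anomaly
      if predicted && anomaly
        tp += 1
      elsif predicted
        fp += 1
      elsif anomaly
        fn += 1
      end
    end
    next if tp.zero? # F1 is undefined (or zero) without true positives

    precision = tp.to_f / (tp + fp)
    recall = tp.to_f / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Strict comparison keeps the smallest epsilon among ties
    if f1 > best_f1
      best_f1 = f1
      best = eps
    end
  end

  [best, best_f1]
end

# Made-up validation data for illustration only
probabilities = [0.02, 0.015, 3.0e-7, 0.03, 7.5e-8, 1.0e-3]
labels        = [false, false, true, false, true, false]
candidates    = [1e-10, 1e-8, 1e-6, 1e-4, 1e-2]

epsilon, f1 = best_epsilon(probabilities, labels, candidates)
```

F1 is only one possible criterion. If missed anomalies cost more than false alarms (or the reverse), weighting recall or precision differently may suit your application better.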
+### Persistence
You can easily persist the detector to a file or database - it's very tiny.
```ruby
serialized_ad = Marshal.dump(ad)
@@ -73,9 +80,14 @@
# ...
# Read it later
ad2 = Marshal.load(File.binread("anomaly_detector.dump"))
```
+
+## TODO
+
+- Train in chunks (for very large datasets)
+- Multivariate normal distribution (possibly)
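
One possible direction for the chunked-training item, sketched under an assumption this README does not confirm: that training only needs a per-feature mean and variance. If so, both can be updated one chunk at a time with Welford's online algorithm, so the full dataset never has to sit in memory. `RunningStats` is a hypothetical class and not part of the gem.

```ruby
# Hypothetical accumulator (not part of the Anomaly gem).
# Keeps a running per-feature mean and variance using Welford's algorithm.
class RunningStats
  attr_reader :count, :mean

  def initialize(dimensions)
    @count = 0
    @mean = Array.new(dimensions, 0.0)
    @m2 = Array.new(dimensions, 0.0) # running sum of squared deviations
  end

  # rows: Array of samples, each an Array with one number per feature
  def add_chunk(rows)
    rows.each do |row|
      @count += 1
      row.each_with_index do |x, i|
        delta = x - @mean[i]
        @mean[i] += delta / @count
        @m2[i] += delta * (x - @mean[i])
      end
    end
    self
  end

  # Population variance for each feature
  def variance
    @m2.map { |m2| m2 / @count }
  end
end

stats = RunningStats.new(3)
stats.add_chunk([[85, 68, 10.4], [88, 62, 12.1]]) # first chunk
stats.add_chunk([[86, 64, 13.6]])                 # second chunk
```

Feeding the rows in one chunk or many gives the same statistics, up to floating-point rounding.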
## Contributing
1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)