README.md in eps-0.1.0 vs README.md in eps-0.1.1
- old
+ new
@@ -4,11 +4,14 @@
- Build models quickly and easily
- Serve models built in Ruby, Python, R, and more
- Automatically handles categorical variables
- No external dependencies
+- Works great with the SciRuby ecosystem (Daru & IRuby)
+[![Build Status](https://travis-ci.org/ankane/eps.svg?branch=master)](https://travis-ci.org/ankane/eps)
+
## Installation
Add this line to your application’s Gemfile:
```ruby
@@ -56,10 +59,18 @@
```ruby
split_date = Date.parse("2018-06-01")
train_set, test_set = houses.partition { |h| h.sold_at < split_date }
```
+### Outliers and Missing Data
+
+Next, decide what to do with outliers and missing data. There are a number of methods for handling them, but the easiest is to remove them.
+
+```ruby
+train_set.reject! { |h| h.bedrooms.nil? || h.price < 10000 }
+```
+
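Removing rows is only one of the methods alluded to above. As a rough sketch of an alternative, the snippet below fills in a missing `bedrooms` value with the median of the known values. It uses plain hashes as a hypothetical stand-in for the `houses` records, and the `median` helper is illustrative and not part of Eps.

```ruby
# Hypothetical helper (not part of Eps): median of a list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Made-up sample rows standing in for the training set.
sample_set = [
  { bedrooms: 2,   price: 150_000 },
  { bedrooms: nil, price: 200_000 },
  { bedrooms: 4,   price: 320_000 },
  { bedrooms: 3,   price: 250_000 }
]

# Compute the fill value from the rows where bedrooms is known.
fill = median(sample_set.map { |h| h[:bedrooms] }.compact)

# Replace missing values, leaving complete rows untouched.
imputed = sample_set.map do |h|
  h[:bedrooms].nil? ? h.merge(bedrooms: fill) : h
end
```

Whether imputing is preferable to removing rows depends on how much data is missing and why, so treat this as one option rather than a recommendation.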
### Feature Engineering
Selecting features for a model is extremely important for performance. Features can be numeric or categorical. For categorical features, there’s no need to create dummy variables - just pass the data as strings.
```ruby
@@ -85,46 +96,70 @@
```ruby
def features(house)
{
bedrooms: house.bedrooms,
city_id: house.city_id.to_s,
- month: house.sold_at.strftime("%b"),
- price: house.price
+ month: house.sold_at.strftime("%b")
}
end
-train_data = train_set.map { |h| features(h) }
+train_features = train_set.map { |h| features(h) }
```
+> We use a method for features so it can be used across training, evaluation, and prediction
+
+We also need to prepare the target variable.
+
+```ruby
+def target(house)
+ house.price
+end
+
+train_target = train_set.map { |h| target(h) }
+```
+
### Training
-Once we have some features, let’s train the model.
+Now, let’s train the model.
```ruby
-model = Eps::Regressor.new(train_data, target: :price)
+model = Eps::Regressor.new(train_features, train_target)
puts model.summary
```
The summary includes the coefficients and their significance. The lower the p-value, the more significant the feature is; p-values below 0.05 are typically considered significant. It also shows the adjusted r-squared, which is a measure of how well the model fits the data. The higher the number, the better the fit. Here’s a good explanation of why it’s [better than r-squared](https://www.quora.com/What-is-the-difference-between-R-squared-and-Adjusted-R-squared).
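For reference, adjusted r-squared can be derived from ordinary r-squared using the standard textbook formula. The helper below is a minimal sketch of that formula; the method name and arguments are illustrative, and it is not how Eps exposes or necessarily computes the value.

```ruby
# Illustrative helper (not part of Eps).
# r_squared: ordinary r-squared of the fitted model
# n:         number of observations
# p:         number of predictors, excluding the intercept
def adjusted_r_squared(r_squared, n, p)
  1 - (1 - r_squared) * (n - 1).to_f / (n - p - 1)
end

# Example: r-squared of 0.9 with 101 observations and 4 predictors.
example = adjusted_r_squared(0.9, 101, 4)
```

Because the penalty grows with `p`, adding a feature only raises the adjusted value when it improves the fit by more than chance alone would.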
### Evaluation
When you’re happy with the model, see how well it performs on the test set. This gives us an idea of how well it’ll perform on unseen data.
```ruby
-test_data = test_set.map { |h| features(h) }
-model.evaluate(test_data)
+test_features = test_set.map { |h| features(h) }
+test_target = test_set.map { |h| target(h) }
+model.evaluate(test_features, test_target)
```
This returns:
-- RSME - Root mean square error
+- RMSE - Root mean square error
- MAE - Mean absolute error
- ME - Mean error
We want to minimize the RMSE and MAE and keep the ME around 0.
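To make the three metrics concrete, here is a small sketch that computes them by hand from actual and predicted values. The `error_metrics` helper is hypothetical and not part of Eps, and the sign convention for ME (actual minus predicted) is an assumption; the library may use the opposite sign.

```ruby
# Hypothetical helper (not part of Eps): computes RMSE, MAE, and ME by hand.
def error_metrics(actual, predicted)
  # Assumed convention: error = actual - predicted.
  errors = actual.zip(predicted).map { |a, p| a - p }
  n = errors.size.to_f
  {
    rmse: Math.sqrt(errors.sum { |e| e**2 } / n), # penalizes large errors more
    mae: errors.sum { |e| e.abs } / n,            # average size of the error
    me: errors.sum / n                            # average signed error (bias)
  }
end

# Made-up values for illustration.
metrics = error_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```

An ME far from 0 suggests the model is systematically over- or under-predicting, even when the RMSE and MAE look acceptable.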
+### Finalize
+
+Now that we have an idea of how the model will perform, we want to retrain the model with all of our data.
+
+```ruby
+all_features = houses.map { |h| features(h) }
+all_target = houses.map { |h| target(h) }
+model = Eps::Regressor.new(all_features, all_target)
+```
+
+We now have a model that’s ready to serve.
+
## Serving Models
Once the model is trained, all we need are the coefficients to make predictions. You can dump them as a Ruby object or JSON. For Ruby, use:
```ruby
@@ -176,10 +211,12 @@
```ruby
data = File.read("model.pmml")
model = Eps::Regressor.load_pmml(data)
```
+> Loading PMML requires Nokogiri to be installed
+
[PFA](http://dmg.org/pfa/) - Portable Format for Analytics
```ruby
data = File.read("model.pfa")
model = Eps::Regressor.load_pfa(data)
@@ -323,9 +360,13 @@
When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
```ruby
CSV.table("data.csv").map { |row| row.to_h }
```
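As a rough illustration of what the conversion does, the sketch below parses an inline CSV string with the same options that `CSV.table` applies to a file (headers, numeric converters, and symbol header names). The column names and values are made up for the example.

```ruby
require "csv"

# Made-up sample data; a real file would be read with CSV.table("data.csv").
raw = <<~DATA
  bedrooms,city_id,price
  3,12,250000.5
  2,7,180000
DATA

# Same options CSV.table uses, applied to a string instead of a file.
rows = CSV.parse(
  raw,
  headers: true,
  converters: :numeric,
  header_converters: :symbol
).map { |row| row.to_h }

first = rows.first
```

Here `first[:bedrooms]` is an Integer and `first[:price]` is a Float rather than strings. Note that an ID column such as `city_id` is also converted to a number, so convert it back with `to_s` if it should be treated as categorical.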
+
+## Jupyter & IRuby
+
+You can use [IRuby](https://github.com/SciRuby/iruby) to run Eps in [Jupyter](https://jupyter.org/) notebooks. Here’s how to get [IRuby working with Rails](https://github.com/ankane/shorts/blob/master/Jupyter-Rails.md).
## Reference
Get coefficients