README.md in svmkit-0.4.0 vs README.md in svmkit-0.4.1

- old
+ new

@@ -6,12 +6,12 @@
[![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/SVMKit/blob/master/LICENSE.txt)

SVMKit is a machine learning library in Ruby.
SVMKit provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
SVMKit currently supports Linear / Kernel Support Vector Machine,
-Logistic Regression, Ridge, Lasso, Factorization Machine, Naive Bayes, Decision Tree, Random Forest,
-K-nearest neighbor classifier, and cross-validation.
+Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
+Naive Bayes, Decision Tree, Random Forest, K-nearest neighbor classifier, and cross-validation.

## Installation

Add this line to your application's Gemfile:

@@ -27,64 +27,100 @@
$ gem install svmkit

## Usage

-Training phase:
+### Example 1. Pendigits dataset classification
+SVMKit provides a function for loading a dataset file in the LIBSVM format.
+We start by downloading the pendigits dataset from the LIBSVM Data web site.
+
+```bash
+$ wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/pendigits
+$ wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/pendigits.t
+```
+
+The following code trains a classifier with a linear SVM and an RBF kernel feature map.
+
```ruby
require 'svmkit'

+# Load the training dataset.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits')

-normalizer = SVMKit::Preprocessing::MinMaxScaler.new
-normalized = normalizer.fit_transform(samples)
+# If the features consist only of integers, the load_libsvm_file method reads them in Numo::Int32 format.
+# As necessary, convert the sample array to Numo::DFloat format.
+samples = Numo::DFloat.cast(samples)

-transformer = SVMKit::KernelApproximation::RBF.new(gamma: 2.0, n_components: 1024, random_seed: 1)
-transformed = transformer.fit_transform(normalized)
+# Map the training data to the RBF kernel feature space.
+transformer = SVMKit::KernelApproximation::RBF.new(gamma: 0.0001, n_components: 1024, random_seed: 1)
+transformed = transformer.fit_transform(samples)

-classifier = SVMKit::LinearModel::SVC.new(reg_param: 1.0, max_iter: 1000, batch_size: 20, random_seed: 1)
+# Train the linear SVM classifier.
+classifier = SVMKit::LinearModel::SVC.new(reg_param: 0.0001, max_iter: 1000, batch_size: 50, random_seed: 1)
classifier.fit(transformed, labels)

-File.open('trained_normalizer.dat', 'wb') { |f| f.write(Marshal.dump(normalizer)) }
-File.open('trained_transformer.dat', 'wb') { |f| f.write(Marshal.dump(transformer)) }
-File.open('trained_classifier.dat', 'wb') { |f| f.write(Marshal.dump(classifier)) }
+# Save the model.
+File.open('transformer.dat', 'wb') { |f| f.write(Marshal.dump(transformer)) }
+File.open('classifier.dat', 'wb') { |f| f.write(Marshal.dump(classifier)) }
```

-Testing phase:
+The following code classifies the testing data with the trained classifier.

```ruby
require 'svmkit'

+# Load the testing dataset.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits.t')
+samples = Numo::DFloat.cast(samples)

-normalizer = Marshal.load(File.binread('trained_normalizer.dat'))
-transformer = Marshal.load(File.binread('trained_transformer.dat'))
-classifier = Marshal.load(File.binread('trained_classifier.dat'))
+# Load the model.
+transformer = Marshal.load(File.binread('transformer.dat'))
+classifier = Marshal.load(File.binread('classifier.dat'))

-normalized = normalizer.transform(samples)
-transformed = transformer.transform(normalized)
+# Map the testing data to the RBF kernel feature space.
+transformed = transformer.transform(samples)

-puts(sprintf("Accuracy: %.1f%%", 100.0 * classifier.score(transformed, labels)))
+# Classify the testing data and evaluate the prediction results.
+puts("Accuracy: %.1f%%" % (100.0 * classifier.score(transformed, labels)))
+
+# An alternative way to evaluate:
+# results = classifier.predict(transformed)
+# evaluator = SVMKit::EvaluationMeasure::Accuracy.new
+# puts("Accuracy: %.1f%%" % (100.0 * evaluator.score(results, labels)))
```

-5-fold cross-validation:
+Executing the above scripts produces the following output.
+```bash
+$ ruby train.rb
+$ ruby test.rb
+Accuracy: 98.4%
+```
+
+### Example 2. Cross-validation
+
```ruby
require 'svmkit'

+# Load the dataset.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits')
+samples = Numo::DFloat.cast(samples)

-kernel_svc = SVMKit::KernelMachine::KernelSVC.new(reg_param: 1.0, max_iter: 1000, random_seed: 1)
+# Define the estimator to be evaluated.
+lr = SVMKit::LinearModel::LogisticRegression.new(reg_param: 0.0001, random_seed: 1)
+# Define the evaluation measure, splitting strategy, and cross-validation.
+ev = SVMKit::EvaluationMeasure::LogLoss.new
kf = SVMKit::ModelSelection::StratifiedKFold.new(n_splits: 5, shuffle: true, random_seed: 1)
-cv = SVMKit::ModelSelection::CrossValidation.new(estimator: kernel_svc, splitter: kf)
+cv = SVMKit::ModelSelection::CrossValidation.new(estimator: lr, splitter: kf, evaluator: ev)

-kernel_mat = SVMKit::PairwiseMetric::rbf_kernel(samples, nil, 0.005)
-report = cv.perform(kernel_mat, labels)
+# Perform 5-fold cross-validation.
+report = cv.perform(samples, labels)

-mean_accuracy = report[:test_score].inject(:+) / kf.n_splits
-puts(sprintf("Mean Accuracy: %.1f%%", 100.0 * mean_accuracy))
+# Output the result.
+mean_logloss = report[:test_score].inject(:+) / kf.n_splits
+puts("5-CV mean log-loss: %.3f" % mean_logloss)
```

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
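
For reference, the alternative evaluation approach that appears commented out in the new Example 1 can also be run as its own script. The sketch below is assembled only from lines shown in the diff (predict plus the `SVMKit::EvaluationMeasure::Accuracy` evaluator); it assumes the training script from Example 1 has already produced `transformer.dat` and `classifier.dat`, and that `pendigits.t` is in the working directory.

```ruby
require 'svmkit'

# Load the testing dataset and convert the features to Numo::DFloat, as in Example 1.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits.t')
samples = Numo::DFloat.cast(samples)

# Load the transformer and classifier saved by the training script.
transformer = Marshal.load(File.binread('transformer.dat'))
classifier = Marshal.load(File.binread('classifier.dat'))

# Predict labels explicitly, then score them with the Accuracy evaluator
# instead of calling classifier.score directly.
transformed = transformer.transform(samples)
results = classifier.predict(transformed)
evaluator = SVMKit::EvaluationMeasure::Accuracy.new
puts("Accuracy: %.1f%%" % (100.0 * evaluator.score(results, labels)))
```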