# LIBMF

[LIBMF](https://github.com/cjlin1/libmf) - large-scale sparse matrix factorization - for Ruby

Check out [Disco](https://github.com/ankane/disco) for higher-level collaborative filtering

[![Build Status](https://travis-ci.org/ankane/libmf.svg?branch=master)](https://travis-ci.org/ankane/libmf) [![Build status](https://ci.appveyor.com/api/projects/status/92fbip1bd8sjd2tj/branch/master?svg=true)](https://ci.appveyor.com/project/ankane/libmf/branch/master)

## Installation

Add this line to your application’s Gemfile:

```ruby
gem 'libmf'
```

## Getting Started

Prep your data in the format `[row_index, column_index, value]`

```ruby
data = [
  [0, 0, 5.0],
  [0, 2, 3.5],
  [1, 1, 4.0]
]
```

Create a model

```ruby
model = Libmf::Model.new
model.fit(data)
```

Make predictions

```ruby
model.predict(row_index, column_index)
```

Get the bias and latent factors

```ruby
model.bias
model.p_factors
model.q_factors
```

Save the model to a file

```ruby
model.save_model("model.txt")
```

Load the model from a file

```ruby
model.load_model("model.txt")
```

Pass a validation set (in the same format as the training data)

```ruby
model.fit(data, eval_set: eval_set)
```

## Cross-Validation

Perform cross-validation

```ruby
model.cv(data)
```

Specify the number of folds

```ruby
model.cv(data, folds: 5)
```

## Parameters

Pass parameters (default values shown)

```ruby
Libmf::Model.new(
  loss: 0,              # loss function
  factors: 8,           # number of latent factors
  threads: 12,          # number of threads used
  bins: 25,             # number of bins
  iterations: 20,       # number of iterations
  lambda_p1: 0,         # coefficient of L1-norm regularization on P
  lambda_p2: 0.1,       # coefficient of L2-norm regularization on P
  lambda_q1: 0,         # coefficient of L1-norm regularization on Q
  lambda_q2: 0.1,       # coefficient of L2-norm regularization on Q
  learning_rate: 0.1,   # learning rate
  alpha: 0.1,           # importance of negative entries
  c: 0.0001,            # desired value of negative entries
  nmf: false,           # perform non-negative MF (NMF)
  quiet: false          # suppress output to stdout
)
```

### Loss Functions

For real-valued matrix factorization

- 0 - squared error (L2-norm)
- 1 - absolute error (L1-norm)
- 2 - generalized KL-divergence

For binary matrix factorization

- 5 - logarithmic error
- 6 - squared hinge loss
- 7 - hinge loss

For one-class matrix factorization

- 10 - row-oriented pair-wise logarithmic loss
- 11 - column-oriented pair-wise logarithmic loss
- 12 - squared error (L2-norm)

## Performance

For better performance, read data directly from files

```ruby
model.fit("train.txt", eval_set: "validate.txt")
model.cv("train.txt")
```

Data should be in the format `row_index column_index value`, one entry per line:

```txt
0 0 5.0
0 2 3.5
1 1 4.0
```

## Numo

Get latent factors as Numo arrays

```ruby
model.p_factors(format: :numo)
model.q_factors(format: :numo)
```

## Resources

- [LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf)

## History

View the [changelog](https://github.com/ankane/libmf/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/libmf/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/libmf/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone --recursive https://github.com/ankane/libmf.git
cd libmf
bundle install
bundle exec rake vendor:all
bundle exec rake test
```
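## Writing Training Files

The file-based input described under Performance expects one `row_index column_index value` entry per line. If your data starts as in-memory `[row_index, column_index, value]` triples, a short helper can write it out in that layout. This is a minimal sketch using only the Ruby standard library; `write_libmf_file` is a hypothetical helper, not part of the libmf gem.

```ruby
require "tempfile"

# Hypothetical helper (not part of the libmf gem): writes
# [row_index, column_index, value] triples as space-separated lines,
# matching the `row_index column_index value` file format.
def write_libmf_file(path, data)
  File.open(path, "w") do |f|
    data.each do |row_index, column_index, value|
      f.puts "#{row_index} #{column_index} #{value}"
    end
  end
end

data = [
  [0, 0, 5.0],
  [0, 2, 3.5],
  [1, 1, 4.0]
]

# Write to a temporary file so the example leaves nothing behind
file = Tempfile.new(["train", ".txt"])
write_libmf_file(file.path, data)
puts File.read(file.path)
```

The resulting path (`file.path`) can then be passed to `model.fit` or `model.cv` in place of the array.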