= Sphinx Client API 0.9.10
This document gives an overview of what Sphinx is and how to use it
from your Ruby on Rails application. For more information and documentation,
please go to http://www.sphinxsearch.com
== Sphinx
Sphinx is a standalone full-text search engine, meant to provide fast,
size-efficient and relevant full-text search functions to other applications.
Sphinx was specially designed to integrate well with SQL databases and
scripting languages. Currently built-in data sources support fetching data
either via direct connection to MySQL, or from an XML pipe.
The simplest way to communicate with Sphinx is to use searchd,
a daemon that lets external software search through full-text indexes.
== Installation
There are two ways to install the Sphinx plugin:
* as a gem (recommended)
* as a Rails plugin
To install as a gem, add this to your environment.rb:
config.gem 'sphinx', :source => 'http://gemcutter.org'
And then run the command:
sudo rake gems:install
To install Sphinx as a Rails plugin, run:
script/plugin install git://github.com/kpumuk/sphinx.git
== Documentation
Complete Sphinx plugin documentation can be found here:
http://kpumuk.github.com/sphinx
You can also find documentation on rdoc.info:
http://rdoc.info/projects/kpumuk/sphinx
You can build the documentation locally by running:
rake yard
Please note that the yard gem must be installed on your system:
sudo gem install yard --source http://gemcutter.org
Complete Sphinx API documentation can be found on the Sphinx Search Engine
site: http://www.sphinxsearch.com/docs/current.html
This plugin is fully compatible with the original PHP API implementation.
== Ruby naming conventions
Sphinx Client API supports Ruby naming conventions, so every API
method name is in underscored, lowercase form:
SetServer -> set_server
RunQueries -> run_queries
SetMatchMode -> set_match_mode
Every method is aliased to the corresponding one from the standard Sphinx
API, so you can use SetServer and set_server interchangeably.
There are three exceptions to this naming rule:
GetLastError -> last_error
GetLastWarning -> last_warning
IsConnectError -> connect_error?
Of course, all of them are aliased to the original method names.
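As an illustration of the naming rule, here is a minimal sketch of how an original API name maps to its Ruby form. The ruby_method_name helper and the NAME_EXCEPTIONS table are hypothetical and are not part of the plugin; the sketch does not handle names that contain runs of capital letters.

```ruby
# Hypothetical helper (not part of the plugin): maps an original Sphinx API
# method name to the underscored, lowercase Ruby form described above.
NAME_EXCEPTIONS = {
  'GetLastError'   => 'last_error',
  'GetLastWarning' => 'last_warning',
  'IsConnectError' => 'connect_error?'
}.freeze

def ruby_method_name(api_name)
  NAME_EXCEPTIONS.fetch(api_name) do
    # Put an underscore between a lowercase letter (or digit) and the
    # capital letter that follows it, then downcase the whole name.
    api_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
end

%w[SetServer RunQueries SetMatchMode GetLastError].each do |name|
  puts "#{name} -> #{ruby_method_name(name)}"
end
```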
== Using multiple Sphinx servers
Since we actively use this plugin in our Scribd development workflow,
several methods have been added to accommodate our needs.
You can find documentation on the Ruby-specific methods here:
http://rdoc.info/projects/kpumuk/sphinx
First of all, we added support for multiple Sphinx servers to balance
load between them. It also means that if one of the servers has a problem,
the library will try to fetch the results from another one.
Every consecutive request will be executed on the next server in the list
(round-robin technique).
sphinx.set_servers([
{ :host => 'browse01.local', :port => 3312 },
{ :host => 'browse02.local', :port => 3312 },
{ :host => 'browse03.local', :port => 3312 }
])
By default, the library will try to fetch results from a single server and
fail if it does not respond. To configure the number of retries,
use the optional second parameter of the set_connect_timeout
and set_request_timeout methods:
sphinx.set_connect_timeout(1, 3)
sphinx.set_request_timeout(1, 3)
There is a big difference between these two methods. The first affects
only requests that experience connection problems (socket error,
pipe error, etc.); the second is used when the request itself is broken
(temporary searchd error, incomplete reply, etc.). The workflow looks like
this:
1. Increase the connect retries counter. If it is less than or equal to the
configured value, try to connect to the next server. Otherwise, raise an error.
2. In case of a connection problem, go to 1.
3. Increase the request retries counter. If it is less than or equal to the
configured value, try to perform the request. Otherwise, raise an error.
4. In case of a connection problem, go to 1.
5. In case of a request problem, go to 3.
6. Parse and return the response.
Conclusions:
1. A request can be performed up to connect_retries * request_retries
times, that is, request_retries times on each
of connect_retries servers (when you have 1 server configured
but connect_retries is 5, the library will try to connect to this
server 5 times).
2. A request can be tried on each server 1..request_retries
times. In case of a connection problem, the request is moved to another
server immediately.
Usually you will set connect_retries equal to the number of servers,
so that each failing request is tried on all servers.
This means that if one of the servers is alive but the others are dead, your
request will eventually be executed successfully.
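The retry workflow described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the plugin's actual implementation: perform_with_retries, ConnectError and RequestError are hypothetical names, and the block stands in for a real network request.

```ruby
# Hypothetical error classes standing in for socket and searchd failures.
class ConnectError < StandardError; end
class RequestError < StandardError; end

# Walks the servers in round-robin order. The block receives a server and
# either returns a response or raises ConnectError / RequestError.
def perform_with_retries(servers, connect_retries, request_retries)
  connect_attempts = 0
  index = 0
  loop do
    # Step 1: count the connection attempt, give up when over the limit.
    connect_attempts += 1
    raise ConnectError, 'connect retries exhausted' if connect_attempts > connect_retries

    server = servers[index % servers.size]
    index += 1
    request_attempts = 0
    begin
      # Step 3: count the request attempt on the current server.
      request_attempts += 1
      # Step 6: hand the response back to the caller.
      return yield(server)
    rescue ConnectError
      # Steps 2 and 4: connection problem, move on to the next server.
      next
    rescue RequestError
      # Step 5: request problem, retry on the same server while allowed.
      raise if request_attempts >= request_retries
      retry
    end
  end
end

servers = %w[browse01.local browse02.local browse03.local]
response = perform_with_retries(servers, servers.size, 1) do |server|
  # Simulate two dead servers and one live server.
  raise ConnectError, 'connection refused' unless server == 'browse03.local'
  "response from #{server}"
end
puts response
```

With connect_retries equal to the number of servers, the simulated request reaches the one live server even though the first two refuse the connection.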
== Sphinx constants
Most Sphinx API methods expect special constants to be passed.
For example:
sphinx.set_match_mode(Sphinx::Client::SPH_MATCH_ANY)
Please note that these constants are defined in the Sphinx::Client
namespace. You can use symbols or strings instead of these awful
constants:
sphinx.set_match_mode(:any)
sphinx.set_match_mode('any')
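One way such a symbol-to-constant lookup could work is sketched below. The ConstantLookup module and its resolve method are hypothetical stand-ins, not the plugin's code, and the constant values are included for illustration only.

```ruby
# Hypothetical stand-in for the Sphinx::Client namespace; the values are
# illustrative and should not be relied upon.
module ConstantLookup
  SPH_MATCH_ALL    = 0
  SPH_MATCH_ANY    = 1
  SPH_MATCH_PHRASE = 2

  # Accepts an integer constant, a symbol, or a string and returns the
  # integer value of the matching constant.
  def self.resolve(prefix, value)
    return value if value.is_a?(Integer)
    const_get("#{prefix}_#{value.to_s.upcase}")
  end
end

puts ConstantLookup.resolve('SPH_MATCH', :any)
puts ConstantLookup.resolve('SPH_MATCH', 'phrase')
puts ConstantLookup.resolve('SPH_MATCH', ConstantLookup::SPH_MATCH_ALL)
```

In this sketch an unknown name raises NameError from const_get, which makes typos visible early.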
== Setting query filters
Every set_ method returns the Sphinx::Client object itself,
which means that you can chain filtering methods:
results = Sphinx::Client.new.
set_match_mode(:any).
set_ranking_mode(:bm25).
set_id_range(10, 1000).
query('test')
It is also handy to set query parameters directly in the query
call. If the block does not accept any parameters, it will be eval'ed inside
the Sphinx::Client instance:
results = Sphinx::Client.new.query('test') do
match_mode :any
ranking_mode :bm25
id_range 10, 1000
end
As you can see, in this case you can omit the set_ prefix for
these methods. If the block accepts a parameter, the sphinx instance will be
passed into the block. In this case you should use full method names
including the set_ prefix:
results = Sphinx::Client.new.query('test') do |sphinx|
sphinx.set_match_mode :any
sphinx.set_ranking_mode :bm25
sphinx.set_id_range 10, 1000
end
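The two block styles can be told apart by the block's arity. The following is a minimal sketch of that idea using a hypothetical MiniClient class; it is not the plugin's source and only assumes the behaviour described above (set_ methods return self, a parameterless block is evaluated inside the instance, and the set_ prefix may be omitted there).

```ruby
# Hypothetical miniature client used only to illustrate the block handling.
class MiniClient
  attr_reader :settings

  def initialize
    @settings = {}
  end

  # Each setter returns self so that calls can be chained.
  def set_match_mode(mode)
    @settings[:match_mode] = mode
    self
  end

  def set_id_range(min, max)
    @settings[:id_range] = [min, max]
    self
  end

  def query(text, &block)
    if block
      if block.arity.zero?
        instance_eval(&block) # block runs with self set to the client
      else
        block.call(self)      # block receives the client explicitly
      end
    end
    { :query => text, :settings => @settings }
  end

  private

  # Lets match_mode(...) stand for set_match_mode(...) inside the block.
  def method_missing(name, *args, &block)
    setter = "set_#{name}"
    return send(setter, *args, &block) if self.class.method_defined?(setter)
    super
  end

  def respond_to_missing?(name, include_private = false)
    self.class.method_defined?("set_#{name}") || super
  end
end

short = MiniClient.new.query('test') do
  match_mode :any
  id_range 10, 1000
end

full = MiniClient.new.query('test') do |client|
  client.set_match_mode :any
  client.set_id_range 10, 1000
end

puts short[:settings] == full[:settings]
```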
== Example
This simple example illustrates establishing a connection,
retrieving search results, and building excerpts. Please note
how it performs the database select using ActiveRecord so as to
preserve the order of records established by Sphinx.
sphinx = Sphinx::Client.new
result = sphinx.query('test')
ids = result['matches'].map { |match| match['id'] }
posts = Post.all :conditions => { :id => ids },
:order => "FIELD(id,#{ids.join(',')})"
docs = posts.map(&:body)
excerpts = sphinx.build_excerpts(docs, 'index', 'test')
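The FIELD() ordering in the example is MySQL-specific, and the generated SQL fragment is invalid when no ids are returned. As a possible alternative, the records can be reordered in Ruby after loading them. The sketch below is an assumption-laden illustration: PostRecord is a hypothetical stand-in for an ActiveRecord model and in_sphinx_order is not part of the plugin.

```ruby
# Hypothetical stand-in for an ActiveRecord model with id and body attributes.
PostRecord = Struct.new(:id, :body)

# Returns the records arranged in the order of ids reported by Sphinx.
# Ids without a loaded record (for example, rows deleted since indexing)
# are skipped.
def in_sphinx_order(records, ids)
  by_id = {}
  records.each { |record| by_id[record.id] = record }
  ids.map { |id| by_id[id] }.compact
end

ids     = [7, 3, 5]                      # order established by Sphinx
records = [PostRecord.new(3, 'third'),   # order returned by the database
           PostRecord.new(5, 'fifth'),
           PostRecord.new(7, 'seventh')]

puts in_sphinx_order(records, ids).map(&:body).inspect
```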
== Support
Source code:
http://github.com/kpumuk/sphinx
To suggest a feature or report a bug:
http://github.com/kpumuk/sphinx/issues
Project home page:
http://kpumuk.info/projects/ror-plugins/sphinx
== Credits
Dmytro Shteflyuk http://kpumuk.info
Andrew Aksyonoff http://sphinxsearch.com
Special thanks to Alexey Kovyrin http://blog.kovyrin.net
Special thanks to Mike Perham http://www.mikeperham.com for his awesome
memcache-client gem, from which the latest Sphinx gem got its new socket handling.
== License
This library is distributed under the terms of the Ruby license.
You can freely distribute/modify this library.