= PostgreSQLCursor
PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets.
It provides a cursor open/fetch/close interface to access data without loading all rows into memory,
and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the
rows one at a time.
For web pages, an application would not want to process a large amount of data, usually employing a
Pagination scheme to present it to the users. Background processes sometimes need to generate a large
amount of data, and the ActiveRecord approach to load all data into memory is not the best fit here.
Previous solutions employ pagination to fetch each block, then re-running the query for the next "page".
This gem avoids re-executing the query by using PostgreSQL's cursor feature.
The ActiveRecord methods for the cursor return instances of the model class by default, however you
can pass in a block to override this strategy. For instance, returning Hashes instead of AR instances
is faster. Julian's benchmarks showed returning instances was a factor of 4 slower than returning a hash.
NOTE: This gem is intended to replace the 'postgresql-cursor' (with hyphen, not underscore) library.
I also plan on supporing ActiveRecord 3.x in the near future.
==Installation
[sudo] gem install postgresql_cursor
This does not require Rails to work, just ActiveRecord < 3.0.0 and the 'pg' gem. Rails 3 support is forthcoming.
You can then configure your Rails application by adding to the config/environment.rb file
config.gem 'postgresql_cursor'
or require the gem on your non-rails application
require 'rubygems'
require 'postgresql_cursor'
==Usage
This library is intended to be used via ActiveRecord. It provides two methods on ActiveRecord::Base
like the find() method, that will return cursor objects instead of an array of model instances.
Calling each() the returned cursor will yield a record to the block for each row. It handles a transaction
block, open, buffering, and closing the cursor. In this way, it operates like an Array object. Note that
it does not implement Enumerable, so no other methods are available.
* *find_with_cursor*( find options, :cursor=>{cursor options}, &block) returns a cursor for the data matching the find options. It takes an optional block accepting a column=>value hash which returns the object desired via each or next, instead of an instance of the model.
* *find_by_sql_with_cursor*(select statement, cursor options, &block) takes a custom SQL statement and returns each row as above.
==Examples
Account.find_with_cursor(:conditions=>["status = ?", 'good']).each do |row|
puts row.to_json
end
Account.find_by_sql_with_cursor("select ...", :buffer_size=>1000).each do |row|
row.process
end
Account.transaction do
cursor = Account.find_with_cursor(...) { |record| record.symbolize_keys }
while record = cursor.next do
record.process # => {:column=>value, ...}
cursor.close if cursor.count > 1000 # Halts loop after 1000 records
end
end
Account.find_with_cursor(...) { |record| record.symbolize_keys }.each do |row|
row.process
end
==Authors
Allen Fair, allen.fair@gmail.com, http://github.com/afair
Thank you to:
* Iulian Dogariu, http://github.com/iulianu (Fixes)
* Julian Mehnle, http://www.mehnle.net (Suggestions)
== Note on Patches/Pull Requests
* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a
future version unintentionally.
* Commit, do not mess with rakefile, version, or history.
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
* Send me a pull request. Bonus points for topic branches.
== Copyright
Copyright (c) 2010 Allen Fair. See LICENSE for details.