README.md in postgresql_cursor-0.6.1 vs README.md in postgresql_cursor-0.6.2

- old
+ new

@@ -1,6 +1,6 @@
-#PostgreSQLCursor for handling large Result Sets
+# PostgreSQLCursor for handling large Result Sets
[![Gem Version](https://badge.fury.io/rb/postgresql_cursor.svg)](http://badge.fury.io/rb/postgresql_cursor)

PostgreSQLCursor extends ActiveRecord to allow for efficient processing of queries returning a large number of rows, and allows you to sort your result set.

@@ -11,20 +11,16 @@
(say 1000) rows, process them, and continue fetching until the result set is exhausted. By fetching a smaller chunk of data, this reduces the amount of memory your application uses and prevents the potential crash of running out of memory.

-This extension is not intended to support the "FOR UPDATE / WHERE
-CURRENT OF" syntax to process and update each row in place. The primary
-goal is to read a large number of rows using buffering.
-
Supports Rails/ActiveRecord v3.1 (v3.2 recommended) and higher (including v5.0) and Ruby 1.9 and higher. Not all features work in ActiveRecord v3.1. Support for this gem will only be for officially supported versions of ActiveRecord and Ruby; others can try older versions of the gem.

-##Using Cursors
+## Using Cursors
PostgreSQLCursor was developed to take advantage of PostgreSQL's cursors. Cursors allow the program to declare a cursor to run a given query, returning "chunks" of rows to the application program while retaining the position of the full result set in the database. This overcomes all the disadvantages of using find_each and find_in_batches.

@@ -77,11 +73,11 @@
* Use cursors *only* for large result sets. They have more overhead with the database than ActiveRecord selecting all matching records.
* Aliases each_hash and each_hash_by_sql are provided for each_row and each_row_by_sql if you prefer to express what types are being returned.

-###PostgreSQLCursor is an Enumerable
+### PostgreSQLCursor is an Enumerable
If you do not pass in a block, the cursor is returned, which mixes in the Enumerable library. With that, you can pass it around, or chain in the awesome enumerable things like `map` and `reduce`. Furthermore, the cursors already act as `lazy`, but you can also chain in `lazy` when you want to keep the memory footprint small for the rest of the process.

@@ -89,11 +85,11 @@
```ruby
Product.each_row.map {|r| r["id"].to_i } #=> [1, 2, 3, ...]
Product.each_instance.map {|r| r.id }.each {|id| p id } #=> [1, 2, 3, ...]
Product.each_instance.lazy.inject(0) {|sum,r| sum + r.quantity } #=> 499500
```

-###Hashes vs. Instances
+### Hashes vs. Instances
The each_row method returns a Hash of strings for speed (as this allows you to process a lot of rows). Hashes are returned with String values, and you must take care of any type conversion. When you use each_instance, ActiveRecord lazily casts these strings into

@@ -101,11 +97,11 @@
If you find you need the types cast for your attributes, consider using each_instance instead. ActiveRecord's read casting algorithm will only cast the values you need and has become more efficient over time.

-###Select and Pluck
+### Select and Pluck
To limit the columns returned to just those you need, use the `.select(:id, :name)` query method.

```ruby

@@ -126,22 +122,48 @@
Product.select(:id).each_row.map {|r| r["id"].to_i } # cursor instead of pluck
Product.pluck_rows(:id) #=> ["1", "2", ...]
Product.pluck_instances(:id, :quantity) #=> [[1, 503], [2, 932], ...]
```

-###Associations and Eager Loading
+### Associations and Eager Loading
ActiveRecord performs some magic when eager-loading associated rows. It will usually not join the tables, and prefers to load the data in separate queries. This library hooks onto the `to_sql` feature of the query builder. As a result, it can't do the join if ActiveRecord decided not to join, nor can it construct the association objects eagerly.
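Since the cursor cannot construct eager-loaded association objects, one workaround is to load the association explicitly inside the block. The sketch below is illustrative only and is not part of the README: the `orders` association and `process` method are assumed, not defined by the gem.

```ruby
# Illustrative sketch: assumes Product has_many :orders and defines #process;
# neither is provided by postgresql_cursor itself.
Product.where("id > 0").each_instance(block_size: 1000) do |product|
  orders = product.orders.to_a   # loaded with a separate query per product
  product.process(orders)
end
```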
-##Background: Why PostgreSQL Cursors?
+## Locking and Updating Each Row (FOR UPDATE Queries)
+When you use the AREL `lock` method, a "FOR UPDATE" clause is added to
+the query. This causes the block of rows returned from each FETCH
+operation (see the `block_size` option) to be locked for you to update.
+The lock is released on those rows once the block is exhausted and the
+next FETCH or CLOSE statement is executed.
+
+This example will run through a large table and potentially update each
+row, locking only a set of rows at a time to allow concurrent use.
+
+```ruby
+Product.lock.each_instance(block_size:100) do |p|
+  p.update(price: p.price * 1.05)
+end
+```
+
+Also, pay attention to the `block_size` you request. Locking large
+blocks of rows for an extended time can cause deadlocks or other
+performance issues in your application. On a busy table, or if the
+processing of each row consumes a lot of time or resources, try a
+`block_size` <= 10.
+
+See the [PostgreSQL Select Documentation](https://www.postgresql.org/docs/current/static/sql-select.html)
+for more information and limitations when using "FOR UPDATE" locking.
+
+## Background: Why PostgreSQL Cursors?
+
ActiveRecord is designed and optimized for web performance. In a web transaction, only a "page" of around 20 rows is returned to the user. When you do this

```ruby
Product.where("id>0").each { |product| product.process }

@@ -153,11 +175,11 @@
Asynchronous, background, or offline processing may require processing a large amount of data. When there is a very large number of rows, this requires a lot more memory to hold the data. Ruby does not return that memory after processing the array, and this causes your process to "bloat". If you don't have enough memory, it will cause an exception.

-###ActiveRecord.find_each and find_in_batches
+### ActiveRecord.find_each and find_in_batches
To solve this problem, ActiveRecord gives us two alternative methods that work in "chunks" of your data:

```ruby
Product.where("id>0").find_each { |model| model.process }

@@ -177,43 +199,37 @@
* You cannot use overly complex queries as they will be rerun and incur more overhead.

### How it works

Under the covers, the library calls the PostgreSQL cursor operations
-with the psuedo-code:
+with the pseudo-code:

    SET cursor_tuple_fraction TO 1.0;
    DECLARE cursor_1 CURSOR WITH HOLD FOR select * from widgets;
    loop
      rows = FETCH 100 FROM cursor_1;
      rows.each {|row| yield row}
    until rows.size < 100;
    CLOSE cursor_1;
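For readers who want to see the same protocol spelled out against the `pg` driver, here is a rough, hand-written sketch. It is not the gem's actual implementation; the cursor name, table, and block size of 100 are illustrative.

```ruby
# Rough sketch of the cursor protocol above, using ActiveRecord's raw
# PG connection. The gem performs these steps (plus error handling) for you.
conn = ActiveRecord::Base.connection.raw_connection
conn.exec("SET cursor_tuple_fraction TO 1.0")
conn.exec("DECLARE cursor_1 CURSOR WITH HOLD FOR select * from products")
loop do
  rows = conn.exec("FETCH 100 FROM cursor_1")
  rows.each { |row| puts row["id"] }  # each row is a Hash of String values
  break if rows.ntuples < 100
end
conn.exec("CLOSE cursor_1")
```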
-##Meta
-###Author
-Allen Fair, [@allenfair](https://twitter.com/allenfair), http://github.com/afair
+## Meta
+### Author
+Allen Fair, [@allenfair](https://twitter.com/allenfair), [github://afair](https://github.com/afair)

-Thanks to:
+### Note on Patches/Pull Requests
-* Iulian Dogariu, http://github.com/iulianu (Fixes)
-* Julian Mehnle, julian@mehnle.net (Suggestions)
-* ...And all the other contributers!
-
-###Note on Patches/Pull Requests
-
* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit, do not mess with rakefile, version, or history.
  (if you want to have your own version, that is fine, but bump the version in a commit by itself that I can ignore when I pull)
* Send me a pull request. Bonus points for topic branches.

-###Code of Conduct
+### Code of Conduct
This project adheres to the [Open Code of Conduct](http://todogroup.org/opencodeofconduct/#postgresql_cursor/2016@allenfair.com). By participating, you are expected to honor this code.

-###Copyright
+### Copyright
-Copyright (c) 2010-2014 Allen Fair. See (MIT) LICENSE for details.
+Copyright (c) 2010-2017 Allen Fair. See (MIT) LICENSE for details.