README.md in salesforce_chunker-1.2.0 vs README.md in salesforce_chunker-1.2.1
- old
+ new
@@ -53,18 +53,22 @@
| username | required |
| password | required |
| security_token | may be required depending on your Salesforce setup |
| domain | optional. defaults to `"login"`. |
| salesforce_version | optional. defaults to `"42.0"`. Must be >= `"33.0"` to use PK Chunking. |
+| logger | optional. logger to use. Must be instance of or similar to rails logger. Use here if you want to log all API page requests. |
+| log_output | optional. log output to use. i.e. `STDOUT`. |
+
#### Functions
| function | |
| --- | --- |
| query |
| single_batch_query | calls `query(job_type: "single_batch", **options)` |
| primary_key_chunking_query | calls `query(job_type: "primary_key_chunking", **options)` |
+| manual_chunking_query | calls `query(job_type: "manual_chunking", **options)` |
#### Query
```ruby
options = {
@@ -86,22 +90,50 @@
| Parameter | | |
| --- | --- | --- |
| query | required | SOQL query. |
| object | required | Salesforce Object type. |
-| batch_size | optional | defaults to `100000`. Number of records to process in a batch. (Only for PK Chunking) |
+| batch_size | optional | defaults to `100000`. Number of records to process in a batch. (Not used in Single Batch jobs) |
| retry_seconds | optional | defaults to `10`. Number of seconds to wait before querying API for updated results. |
-| timeout_seconds | optional | defaults to `3600`. Number of seconds to wait before query is killed. |
+| timeout_seconds | optional | defaults to `3600`. Number of seconds to wait for a batch to process before job is killed. |
| logger | optional | logger to use. Must be instance of or similar to rails logger. |
| log_output | optional | log output to use. i.e. `STDOUT`. |
-| job_type | optional | defaults to `"primary_key_chunking"`. Can also be set to `"single_batch"`. |
+| job_type | optional | defaults to `"primary_key_chunking"`. Can also be set to `"single_batch"` or `"manual_chunking`. |
| include_deleted | optional | defaults to `false`. Whether to include deleted records. |
`query` can either be called with a block, or will return an enumerator:
```ruby
names = client.query(query, object, options).map { |result| result["Name"] }
```
+
+### A discussion about Single Batch, Primary Key Chunking, and Manual Chunking job types.
+
+One of the advantages of the Salesforce Bulk API over the other Salesforce APIs is the ability for Salesforce to process a number of requests (either queries or uploads) in parallel on their servers. The request chunks are referred to as batches.
+
+#### Single Batch Query
+
+In a single batch query, one SOQL statement is executed as a single batch. This works best if the total number of records to return is fewer than around 100,000 depending on memory usage and the number of fields being returned.
+
+#### Primary Key Chunking Query
+
+In Primary Key Chunking, the internal Salesforce PK chunking flag is used. Salesforce will create a number of batches automatically based on an internal Id index. See https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm
+
+#### Manual Chunking Query
+
+This approach is called "Manual" Chunking because it is our own implementation of PK Chunking in this gem. The gem downloads a CSV ordered list of all Ids it needs to download, and then uses this list to generate breakpoints that it uses to create batches.
+
+#### Primary Key Chunking Query vs Manual Chunking Query
+
+Advantages of Manual Chunking:
+
+- Manual Chunking takes into account the where clause in the SOQL statement. For example, if you are filtering a small number of a large object count, say 250k out of 20M Objects, then Manual Chunking will split this into 3 batches of max 100k while PK chunking will split this into 200 batches, which will use up batches and API requests against your account and take a longer amount of time.
+- Any object can use Manual Chunking (according to Salesforce, PK chunking is supported for the following objects: Account, Asset, Campaign, CampaignMember, Case, CaseHistory, Contact, Event, EventRelation, Lead, LoginHistory, Opportunity, Task, User, and custom objects.)
+
+Advantages of Primary Key Chunking:
+
+- Primary Key Chunking appears to be slightly faster, if using a PK Chunking eligible object and no where clause.
+- Primary Key Chunking may be less buggy because many more people depend on the Salesforce API than this gem.
### Under the hood: SalesforceChunker::Job
Using `SalesforceChunker::Job`, you have more direct access to the Salesforce Bulk API functions, such as `create_batch`, `get_batch_statuses`, and `retrieve_batch_results`. This can be used to perform custom tasks, such as upserts or multiple batch queries.