README.md in kraps-0.3.0 vs README.md in kraps-0.4.0

- old
+ new

@@ -93,31 +93,44 @@
 ```ruby
 class MyKrapsWorker
   include Sidekiq::Worker

   def perform(json)
-    Kraps::Worker.new(json, memory_limit: 128.megabytes, chunk_limit: 64, concurrency: 8).call(retries: 3)
+    Kraps::Worker.new(json, memory_limit: 16.megabytes, chunk_limit: 64, concurrency: 8).call(retries: 3)
   end
 end
 ```

 The `json` argument is automatically enqueued by Kraps and contains everything
 it needs to know about the job and step to execute. The `memory_limit` tells
-Kraps how much memory it is allowed to allocate for temporary chunks, etc. This
-value depends on the memory size of your container/server and how much worker
-threads your background queue spawns. Let's say your container/server has 2
-gigabytes of memory and your background framework spawns 5 threads.
-Theoretically, you might be able to give 300-400 megabytes to Kraps then. The
-`chunk_limit` ensures that only the specified amount of chunks are processed in
-a single run. A run basically means: it takes up to `chunk_limit` chunks,
-reduces them and pushes the result as a new chunk to the list of chunks to
-process. Thus, if your number of file descriptors is unlimited, you want to set
-it to a higher number to avoid the overhead of multiple runs. `concurrency`
-tells Kraps how much threads to use to concurrently upload/download files from
-the storage layer. Finally, `retries` specifies how often Kraps should retry
-the job step in case of errors. Kraps will sleep for 5 seconds between those
-retries. Please note that it's not yet possible to use the retry mechanism of
-your background job framework with Kraps.
+Kraps how much memory it is allowed to allocate for temporary chunks. More
+concretely, it tells Kraps how big the file size of a temporary chunk can grow
+in memory before Kraps must write it to disk. However, Ruby of course
+allocates much more memory for a chunk than the raw file size of the chunk. As
+a rule of thumb, it allocates 10 times more memory. Still, choosing a value
+for `memory_limit` depends on the memory size of your container/server, how
+many worker threads your background queue spawns and how much memory your
+workers need besides Kraps. Let's say your container/server has 2 gigabytes of
+memory and your background framework spawns 5 threads. Theoretically, you
+might be able to give 300-400 megabytes to Kraps then, but now divide this by
+10 and specify a `memory_limit` of around `30.megabytes`, or better, less. The
+`memory_limit` affects how many chunks will be written to disk, depending on
+the size of the data you are processing and how big those chunks are. The
+smaller the value, the more chunks, and the more chunks there are, the more
+runs Kraps needs to merge them, which can affect performance. The
+`chunk_limit` ensures that only the specified number of chunks are processed
+in a single run. A run basically means: it takes up to `chunk_limit` chunks,
+reduces them and pushes the result as a new chunk to the list of chunks to
+process. Thus, if your number of file descriptors is unlimited, you want to
+set it to a higher number to avoid the overhead of multiple runs.
+`concurrency` tells Kraps how many threads to use to concurrently
+upload/download files from the storage layer. Finally, `retries` specifies how
+often Kraps should retry the job step in case of errors. Kraps will sleep for
+5 seconds between those retries. Please note that it's not yet possible to use
+the retry mechanism of your background job framework with Kraps. Also note
+that `parallelize` is not covered by `retries` yet, as the block passed to
+`parallelize` is executed by the runner, not the workers.

 Now, executing your job is super easy:

 ```ruby
 Kraps::Runner.new(SearchLogCounter).call(start_date: '2018-01-01', end_date: '2022-01-01')
 ```
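
To make the new sizing rule of thumb concrete, here is a back-of-the-envelope
calculation. The numbers (a 2 GB container, 5 worker threads, the 10x
allocation factor and the headroom reserved for the rest of the app) are
illustrative assumptions taken from the paragraph above, not part of the Kraps
API:

```ruby
require "active_support/core_ext/numeric/bytes" # Integer#gigabytes / #megabytes

per_thread = 2.gigabytes / 5   # ~400 MB if the whole container went to Kraps
per_thread -= 100.megabytes    # leave headroom for the app itself => ~300 MB
memory_limit = per_thread / 10 # Ruby allocates roughly 10x the raw chunk size

puts memory_limit / 1.megabyte # => ~30, hence memory_limit: 30.megabytes or less
```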
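
The relationship between the number of chunks, `chunk_limit` and the number of
runs can also be modelled in a few lines. This is only a sketch of one reading
of the run semantics described above (each run consumes up to `chunk_limit`
chunks and pushes one merged chunk back onto the list), not code from Kraps
itself:

```ruby
# Estimate how many runs are needed to merge a given number of chunks.
def estimated_runs(chunks, chunk_limit)
  runs = 0
  while chunks > 1
    taken  = [chunks, chunk_limit].min
    chunks = chunks - taken + 1 # the merged result goes back onto the list
    runs  += 1
  end
  runs
end

estimated_runs(1_000, 64) # => 16
estimated_runs(1_000, 8)  # => 143, so a low chunk_limit multiplies the merge overhead
```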