# Testing - benchmarks/local files

These files generate data such as requests per second (RPS). Typically, files come in pairs: a shell script and a Ruby script. The shell script starts the server, then runs the Ruby file, which starts one or more client request streams, then collects and logs metrics.

## response_time_wrk.sh

This uses [wrk] for generating data. One or more wrk runs are performed, and the script summarizes RPS and wrk latency times. The default for the `-b` argument runs 28 different client request streams and takes a bit over 5 minutes. See 'Request Stream Configuration' below for a description of the `-b` argument.

Summary output for `benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6`:
```
Type   req/sec    50%     75%     90%     99%    100%  Resp Size
─────────────────────────────────────────────────────────────────    1kB
array    13710   0.74    2.52    5.23    7.76   37.45       1024
chunk    13502   0.76    2.55    5.28    7.84   11.23       1042
string   13794   0.74    2.51    5.20    7.75   14.07       1024
io        9615   1.16    3.45    7.13   10.57   15.75       1024
─────────────────────────────────────────────────────────────────   10kB
array    13458   0.76    2.57    5.31    7.93   13.94      10239
chunk    13066   0.78    2.64    5.46    8.18   38.48      10320
string   13500   0.76    2.55    5.29    7.88   11.42      10240
io        9293   1.18    3.59    7.39   10.94   16.99      10240
─────────────────────────────────────────────────────────────────  100kB
array    11315   0.96    3.06    6.33    9.49   17.69     102424
chunk     9916   1.10    3.48    7.20   10.73   15.14     103075
string   10948   1.00    3.17    6.57    9.83   17.88     102378
io        8901   1.21    3.72    7.48   11.27   59.98     102407
─────────────────────────────────────────────────────────────────  256kB
array     9217   1.15    3.82    7.88   11.74   17.12     262212
chunk     7339   1.45    4.76    9.81   14.63   22.70     264007
string    8574   1.19    3.81    7.73   11.21   15.80     262147
io        8911   1.19    3.80    7.55   15.25   60.01     262183
─────────────────────────────────────────────────────────────────  512kB
array     6951   1.49    5.03   10.28   15.90   25.08     524378
chunk     5234   2.03    6.56   13.57   20.46   32.15     527862
string    6438   1.55    5.04   10.12   16.28   72.87     524275
io        8533   1.15    4.62    8.79   48.15   70.51     524327
───────────────────────────────────────────────────────────────── 1024kB
array     4122   1.80   15.59   41.87   67.79  121.00    1048565
chunk     3158   2.82   15.22   31.00   71.39   99.90    1055654
string    4710   2.24    6.66   13.65   20.38   70.44    1048575
io        8355   1.23    3.95    7.94   14.08   68.54    1048498
───────────────────────────────────────────────────────────────── 2048kB
array     2454   4.12   14.02   27.70   43.48   88.89    2097415
chunk     1743   6.26   17.65   36.98   55.78   92.10    2111358
string    2479   4.38   12.52   25.65   38.44   95.62    2097502
io        8264   1.25    3.83    7.76   11.73   65.69    2097090

Body    ────────── req/sec ──────────   ─────── req 50% times ───────
 KB     array   chunk  string      io   array   chunk  string      io
1       13710   13502   13794    9615   0.745   0.757   0.741   1.160
10      13458   13066   13500    9293   0.760   0.784   0.759   1.180
100     11315    9916   10948    8901   0.960   1.100   1.000   1.210
256      9217    7339    8574    8911   1.150   1.450   1.190   1.190
512      6951    5234    6438    8533   1.490   2.030   1.550   1.150
1024     4122    3158    4710    8355   1.800   2.820   2.240   1.230
2048     2454    1743    2479    8264   4.120   6.260   4.380   1.250
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6 -Y
Server cluster mode -w2 -t5:5, bind: tcp6
Puma repo branch 00-response-refactor
ruby 3.2.0dev (2022-06-14T01:21:55Z master 048f14221c) +YJIT [x86_64-linux]

[2136] - Gracefully shutting down workers...
[2136] === puma shutdown: 2022-06-13 21:16:13 -0500 ===
[2136] - Goodbye!

 5:15 Total Time
```

## bench_base.sh, bench_base.rb

These two files set up parameters for the Puma server, which is normally started in a shell script. The shell script then starts a Ruby file (a subclass of BenchBase), passing arguments to it. The Ruby file is normally used to generate one or more client request streams.

### Puma Configuration

The following arguments are used for the Puma server:

* **`-C`** - configuration file
* **`-d`** - app delay
* **`-r`** - rackup file, often defaults to test/rackup/ci_select.ru
* **`-s`** - bind socket type, default is tcp/tcp4, also tcp6, ssl/ssl4, ssl6, unix, or aunix (unix & abstract unix are not available with wrk).
* **`-t`** - threads, expressed as '5:5', same as Puma `--threads`
* **`-w`** - workers, same as Puma `--workers`
* **`-Y`** - enable Ruby YJIT

### Request Stream Configuration

The following arguments are used for request streams:

* **`-b`** - response body configuration. Body type options are `a` (array), `c` (chunked), `s` (string), and `i` (File/IO). None or any combination can be specified; the type letters must come first in the option value. They are followed by any combination of comma-separated integers giving the response body sizes in kB. The string 'ac50,100' creates four runs: 50kB array, 50kB chunked, 100kB array, and 100kB chunked. See 'Testing - test/rackup/ci-*.ru files' for more info.
* **`-c`** - connections per client request stream thread, defaults to 2 for wrk.
* **`-D`** - duration of client request stream in seconds.
* **`-T`** - number of threads in the client request stream. For wrk, this defaults to 80% of Puma workers * max_threads.

### Notes - Configuration

The above lists script arguments.

`bench_base.sh` contains most server defaults. Many can be set via ENV variables.

`bench_base.rb` contains the client request stream defaults. The default value for `-b` is `acsi1,10,100,256,512,1024,2048`, which is a 4 x 7 matrix (four body types by seven sizes) and hence runs 28 jobs.
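
The expansion of a `-b` value into individual runs can be sketched in Ruby. This is a minimal illustration and not the parser used in `bench_base.rb`; the method name `expand_body_conf` and the `BODY_TYPES` table are hypothetical. The sketch returns an empty list when no types or sizes are given, which is an assumption; how the real scripts handle that case is not shown here.

```ruby
# Hypothetical helper for illustration; not taken from bench_base.rb.
BODY_TYPES = { 'a' => :array, 'c' => :chunk, 's' => :string, 'i' => :io }.freeze

# Expands a -b value such as 'ac50,100' into [type, size_in_kB] pairs.
def expand_body_conf(conf)
  # Leading letters select the body types; the rest is a comma-separated size list.
  match = conf.match(/\A([acsi]*)([\d,]*)\z/)
  raise ArgumentError, "invalid body configuration: #{conf.inspect}" unless match

  types = match[1].chars.uniq
  sizes = match[2].split(',').reject(&:empty?).map { |s| Integer(s, 10) }

  # Sizes vary slowest, matching the run order given for 'ac50,100'.
  sizes.product(types).map { |size, type| [BODY_TYPES.fetch(type), size] }
end

expand_body_conf('ac50,100').each { |type, size| puts "#{size}kB #{type}" }
# prints:
#   50kB array
#   50kB chunk
#   100kB array
#   100kB chunk
```

Passing the default value `acsi1,10,100,256,512,1024,2048` to this sketch yields 28 pairs, one per job.
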
The `i` body type (File/IO) generates files, which are placed in the `"#{Dir.tmpdir}/.puma_response_body_io"` directory. The directory is created automatically.

### Notes - wrk

The shell scripts use `-T` for wrk's thread count, since `-t` is used for Puma server threads.

Regarding the `-c` argument, wrk has an interesting behavior. Its own `-c` value is the requested total number of connections, but each thread opens `(connections/threads).to_i` connections, so the actual total is rounded down to a multiple of the thread count. The scripts here use `-c` as connections per thread. Hence, using `-T4 -c2` will yield a total of eight wrk connections, two per thread. The equivalent wrk arguments would be `-t4 -c8`.

Puma can only process so many requests, and requests will queue in the backlog until Puma can respond to them. With wrk, if the total number of connections is too high, the upper latency percentiles increase first, and the increase spreads into the lower percentiles as more connections are added. The default values for wrk's threads and connections were chosen to minimize requests' time in the backlog.

An example with four wrk runs using `-b s10` follows. Notice that `req/sec` varies by less than 1%, but the `75%` times increase by an order of magnitude:

```
req/sec     50%     75%     90%     99%    100%  Resp Size   wrk cmd line
─────────────────────────────────────────────────────────────────────────────
  13597   0.755   2.550   5.260   7.800  13.310      12040   wrk -t8 -c16 -d10
  13549   0.793   4.430   8.140  11.220  16.600      12002   wrk -t10 -c20 -d10
  13570   1.040  25.790  40.010  49.070  58.300      11982   wrk -t8 -c64 -d10
  13684   1.050  25.820  40.080  49.160  66.190      12033   wrk -t16 -c64 -d10
```

Finally, wrk's output may cause rounding errors, so the response body size calculation is imprecise.

[wrk]:
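
The mapping from the scripts' `-T` and `-c` arguments to wrk's own flags can be sketched in Ruby. This is a minimal illustration, not code from `bench_base.rb`; the method names `default_wrk_threads` and `wrk_command` are hypothetical, and truncating the 80% default to an integer is an assumption of the sketch.

```ruby
# Hypothetical helpers for illustration; not taken from bench_base.rb.

# Default wrk thread count: 80% of Puma workers * max_threads.
# Integer division truncates the result, which is an assumption of this sketch.
def default_wrk_threads(workers:, max_threads:)
  (workers * max_threads * 4) / 5
end

# Builds a wrk command line from the scripts' -T (threads) and
# -c (connections per thread). wrk's own -c takes the total, so the
# per-thread value is multiplied by the thread count.
def wrk_command(threads:, conns_per_thread: 2, duration: 10)
  "wrk -t#{threads} -c#{threads * conns_per_thread} -d#{duration}"
end

puts wrk_command(threads: 4, conns_per_thread: 2)
# prints: wrk -t4 -c8 -d10

# Defaults for a server started with -w2 -t5:5
puts wrk_command(threads: default_wrk_threads(workers: 2, max_threads: 5))
# prints: wrk -t8 -c16 -d10
```

With `-w2 -t5:5` and the default of two connections per thread, the sketch yields `-t8 -c16`, which agrees with the wrk command line shown in the summary output for `response_time_wrk.sh`.
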