Re: [HACKERS] Increase Vacuum ring buffer. - Mailing list pgsql-hackers
From:            Sokolov Yura
Subject:         Re: [HACKERS] Increase Vacuum ring buffer.
Date:
Msg-id:          [email protected]
In response to:  Re: [HACKERS] Increase Vacuum ring buffer.  (Sokolov Yura <[email protected]>)
Responses:       Re: [HACKERS] Increase Vacuum ring buffer.
List:            pgsql-hackers
On 2017-07-24 12:41, Sokolov Yura wrote:
> On 2017-07-21 20:41, Sokolov Yura wrote:
>> On 2017-07-21 19:32, Robert Haas wrote:
>>> On Fri, Jul 21, 2017 at 4:19 AM, Sokolov Yura
>>> <[email protected]> wrote:
>>>>
>>>> Probably with an increased ring buffer there is no need to raise
>>>> vacuum_cost_limit. Will you admit it?
>>>
>>> No, I definitely won't admit that. With default settings autovacuum
>>> won't write more than ~2.3MB/s if I remember the math correctly, so
>>> if you've got a 1TB table you're probably going to need a bigger value.
>>>
>>> --
>>> Robert Haas
>>> EnterpriseDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>
>> I've seen the autovacuum process spending >50% of its time in fsync
>> (with the current ring buffer), though I used autovacuum_cost_delay=2ms.
>> An fsync can last up to a second on an HDD if there is concurrent IO.
>> Even on an SSD an fsync can be really noticeable.
>>
>> But I agree that for a 1TB table autovacuum_cost_limit should still
>> be increased, even with a larger ring buffer.
>>
>>
>> My friend noticed that I didn't say why I bother with autovacuum.
>> Our customers suffer from table bloating. I made a synthetic
>> bloating test and started experiments with modifying micro- and
>> auto-vacuum. My first attempts were to update the FSM early (both in
>> micro- and autovacuum) and to update it up to the root, not only the
>> lowest level.
>>
>> Then I looked at an strace of the autovacuum process and noticed a
>> storm of fsync calls. I caught backtraces with gdb breaking on fsync
>> and found that evicting dirty pages from the small ring buffer is the
>> reason.
>>
>> After some experiments combining my "early FSM update" with the size
>> of the ring buffer, I understood that increasing the ring buffer
>> gives most of the benefit: autovacuum runs faster, and bloating is
>> greatly reduced. In the extreme case, a 400MB table bloats to 17GB
>> on master, and only to 5GB with the faster autovacuum.
>>
>> I used custom scripts, and that is why my statistics are not complete.
>> Still, I didn't find any performance reduction. In fact, it looks
>> like the tests with the "larger autovacuum ring" did more queries per
>> hour than the tests against master.
>>
>> I will run pgbench over the weekend, so latencies and percentiles
>> will be collected.
>>
>> With regards,
>> --
>> Sokolov Yura aka funny_falcon
>> Postgres Professional: https://postgrespro.ru
>> The Russian Postgres Company
>
> The default pgbench script wasn't able to trigger autovacuum of the
> pgbench_accounts table in 8 hours (scale 400, 40 clients, 900tps
> average), so the weekend testing was not useful.
>
> I will re-run with a custom script over the next day or two.
>
> --
> Sokolov Yura aka funny_falcon
> Postgres Professional: https://postgrespro.ru
> The Russian Postgres Company

I've made 3 * 8-hour runs with master and with a 16MB ring.
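(Terminology note: the "ring" here is the BAS_VACUUM buffer access strategy
ring from src/backend/storage/buffer/freelist.c, which is 256KB on master;
the "ring16" runs below use a 16MB ring. And since vacuum_cost_limit came up
upthread, below is the back-of-the-envelope model I use for the write ceiling
implied by the cost-based delay. It is only a sketch under the crude
assumption that every page vacuum dirties is also a buffer miss; it is not
code from the patch.)

/*
 * Rough, standalone model of the autovacuum dirty-write ceiling implied
 * by the cost-based vacuum delay (not part of the patch).
 * Assumption: each dirtied page also costs a buffer miss.
 */
#include <stdio.h>

int
main(void)
{
    const double block_size = 8192.0;   /* BLCKSZ */
    const double cost_limit = 200.0;    /* vacuum_cost_limit (default) */
    const double page_miss  = 10.0;     /* vacuum_cost_page_miss (default) */
    const double page_dirty = 20.0;     /* vacuum_cost_page_dirty (default) */

    /* delays to compare: the 20ms default vs the 2ms used in these runs */
    const double delays_ms[] = {20.0, 2.0};

    for (int i = 0; i < 2; i++)
    {
        double budget_per_sec = cost_limit * (1000.0 / delays_ms[i]);
        double pages_per_sec  = budget_per_sec / (page_miss + page_dirty);
        double mb_per_sec     = pages_per_sec * block_size / (1024.0 * 1024.0);

        printf("delay = %2.0fms -> ~%4.0f dirty pages/s (~%.1f MB/s)\n",
               delays_ms[i], pages_per_sec, mb_per_sec);
    }
    return 0;
}

With the stock 20ms delay this gives roughly 2.6MB/s, the same ballpark as
the ~2.3MB/s quoted above; with the 2ms delay used in these runs the ceiling
is about 26MB/s, so the cost limit is not the main throttle here.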
scale 400, 40 clients, query script:

\set aid1 random(1, 100000 * :scale)
\set aidd random(1, 3)
\set aid2 :aid1 + :aidd
\set aid3 :aid1 + 2 * :aidd
\set aid4 :aid1 + 3 * :aidd
\set aid5 :aid1 + 4 * :aidd
\set delta random(-5000, 5000)
update pgbench_accounts set abalance = abalance + :delta
 where aid in (:aid1, :aid2, :aid3, :aid4, :aid5);

postgresql.conf:

max_connections = 300
shared_buffers = 2GB
work_mem = 128MB
maintenance_work_mem = 512MB
bgwriter_lru_maxpages = 10
bgwriter_flush_after = 2MB
backend_flush_after = 2MB
wal_compression = on
wal_buffers = 32MB
checkpoint_flush_after = 2MB
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_vacuum_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 2ms

(I had to slow down the bgwriter (bgwriter_lru_maxpages = 10), because
otherwise everything was too slow. Maybe I did that wrong.)
(I am open to any suggestions about postgresql.conf.)

I've tried to make a pretty log in testing5_pretty.tar.gz. The files
testing5_sum/test_master{,_ring16}_[123]/pretty.log contain combined,
prettified logs from postgresql and pgbench.

Some excerpts (activity: av = autovacuum running, ch = checkpoint running):

test_master_1/pretty.log

time   activity   tps  latency  stddev     min      max
10980  av        1364     29ms    76ms     6ms   1170ms
11010  av         430     90ms   253ms     7ms   2472ms
11040             245    170ms   485ms     7ms   2821ms
11070             487     81ms   238ms     6ms   2404ms
11100  av         360    112ms   261ms     7ms   2549ms
11130  av+ch      198    198ms   374ms     7ms   1956ms
11160  av+ch      248    163ms   401ms     7ms   2601ms
11190  av+ch      321    125ms   363ms     7ms   2722ms
11220  av+ch     1155     35ms   123ms     7ms   2668ms
11250  av+ch     1390     29ms    79ms     7ms   1422ms
11280  av         624     64ms   176ms     6ms   1922ms
11310  av         454     87ms   236ms     7ms   2481ms
11340  av         524     77ms   223ms     6ms   2383ms
11370  av         414     96ms   267ms     7ms   2853ms
11400  av         380    103ms   235ms     7ms   2298ms
11430  av+ch      239    168ms   344ms     7ms   2482ms
11460  av+ch      138    293ms   516ms     7ms   2438ms
11490  av+ch      231    175ms   494ms     7ms   3150ms
11520  av+ch     1133     35ms   112ms     7ms   2378ms
11550  av+ch     1391     29ms    64ms     6ms    957ms

test_master_ring16_1/pretty.log

time   activity   tps  latency  stddev     min      max
10710             498     82ms   249ms     7ms   2801ms
10740             408     99ms   271ms     7ms   2793ms
10770             399     99ms   284ms     7ms   3233ms
10800             279    142ms   347ms     7ms   2641ms
10830  ch         245    164ms   436ms     7ms   2618ms
10860  ch         462     86ms   246ms     7ms   2497ms
10890  ch          78    506ms   905ms     6ms   3198ms
10920  ch          17   2407ms   217ms  1650ms   2678ms
10950  ch         652     64ms   272ms     7ms   2471ms
10980  av         976     41ms   126ms     6ms   2219ms
11010  av         379    104ms   257ms     7ms   2491ms
11040  av         381    107ms   274ms     7ms   2426ms
11070  av         325    123ms   294ms     6ms   2497ms
11100  av         226    173ms   387ms     7ms   1993ms
11130  av+ch       26   1575ms   635ms   101ms   2536ms
11160  av+ch       25   1552ms   648ms    58ms   2376ms
11190  av+ch       32   1275ms   726ms    16ms   2493ms
11220  av+ch       23   1584ms   674ms    48ms   2454ms
11250  av+ch       35   1235ms   777ms    22ms   3627ms
11280  av+ch     1301     30ms   145ms     6ms   2778ms
11310  av         903     46ms   125ms     7ms   2406ms
11340  av         395    100ms   291ms     7ms   2849ms
11370  av         377    103ms   255ms     7ms   2082ms
11400  av         340    114ms   309ms     7ms   3160ms

The archive testing5_all.tar.gz contains more raw logs.

My interpretation:

- autovacuum runs 3-4 times faster (2700sec unpatched vs 800sec patched);
- the faster autovacuum alone is not a big problem: concurrent
  transactions are a bit slower, but not catastrophically so;
- the faster autovacuum with a checkpoint running simultaneously is a
  big problem. Maybe the checkpoint process should affect
  autovacuum_cost?

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers