@ wrote... (1 week, 3 days ago)

We recently upgraded our network to 10 Gbit and were really hoping to see monumental speed increases in our ceph cluster.

One of our benchmarks was pgbench and to say we were sad would be an understatement…

I created a database and ran tests like so…

createdb -U postgres bench
pgbench -U postgres -i -d bench
pgbench -U postgres -d bench -c 70 2> /dev/null

And here's a typical run. Latency varied a lot but was never less than 300 ms.

latency average = 951.736 ms
tps = 73.549842 (including connections establishing)

Yeah, so a four-node ceph cluster with 12 OSDs was getting 73 TPS with nearly a second of latency. Did somebody swap out my drives for floppy disks?!?
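As a sanity check, the two numbers pgbench prints are tied together: with 70 clients each waiting about a second per transaction, throughput should land near clients divided by latency. Here's a rough sketch of that arithmetic, assuming awk is available (tps_from_latency is just a name I made up for illustration):

```shell
# Rough estimate: tps ~= clients / average latency in seconds.
# tps_from_latency is a hypothetical helper, not part of pgbench.
tps_from_latency() {
    # $1 = number of clients, $2 = average latency in milliseconds
    awk -v c="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", c / (ms / 1000) }'
}

# 70 clients at 951.736 ms average latency
tps_from_latency 70 951.736
```

That prints 73.5, which matches the reported tps, so the throughput here is entirely latency-bound: every client spends its time waiting on a commit.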

This was so horrifically bad that it put our entire five-year tech stack forecast in jeopardy…

Now I don't claim to be a ceph or postgres expert but here's what I tried.

# fsync = off
latency average = 50.641 ms
tps = 1382.270643 (including connections establishing)

So that's groovy and all but it's totally unsafe and not appropriate for production. The good news was that this proved that the ceph cluster itself wasn't utterly broken.

So then I noticed the next line in postgresql.conf

# synchronous_commit = on # synchronization level;

So I tried turning that off and…

latency average = 48.036 ms
tps = 1457.230219 (including connections establishing)

wut! wut!

Basically equivalent speeds to fsync = off.

Pretty strange when there is no replication…

I also tried with synchronous_commit = local and got

latency average = 225.714 ms
tps = 310.127180 (including connections establishing)

Which is great compared to the original results but still dismal.

For our use case, maybe losing a couple of transactions is worth a 10x speed improvement.

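The really strange bit is that we aren't running replication on our test machine, so as per my understanding synchronous_commit shouldn't affect anything… (Reading the PostgreSQL docs more closely, the setting also controls whether a commit waits for the WAL to be flushed to local disk, so turning it off skips that wait even without replication.)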

But as I said, I'm not a postgresql expert. All of this really raises the question: wtf is ceph doing that causes fsyncs to be so slow? Too bad I'm not a ceph expert either…
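One way to poke at that question without postgres in the picture is to time small synchronous writes directly on the ceph-backed volume. This is only a sketch under a few assumptions: it needs GNU dd (oflag=dsync is a GNU flag), sync_write_probe is a made-up helper name, and the directory below is a placeholder for wherever the ceph volume is mounted. PostgreSQL also ships a pg_test_fsync utility that does a more thorough job.

```shell
# Hypothetical helper: write N blocks of 8 kB, forcing each one to stable
# storage before the next (8 kB matches the default PostgreSQL block size).
sync_write_probe() {
    dir="$1"; count="${2:-100}"
    file="$dir/sync_probe.$$"
    # dd reports byte count, elapsed time and throughput on stderr;
    # keep only that final summary line.
    dd if=/dev/zero of="$file" bs=8k count="$count" oflag=dsync 2>&1 | tail -n 1
    rm -f "$file"
}

# Placeholder directory: point this at a path on the ceph-backed volume.
sync_write_probe "${TMPDIR:-/tmp}" 100
```

Dividing the elapsed time by the block count gives a rough per-write sync latency, which can be compared against the same run on a local disk.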

Here's a good page explaining synchronous_commit: https://www.tutorialdba.com/2018/04/how-to-improve-performance-of.html

tl;dr - set synchronous_commit = off
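If you'd rather script that change than edit postgresql.conf by hand, something like the following might do. It's a minimal sketch, not a vetted tool: set_pg_setting is a hypothetical helper, and the demo runs against a scratch file instead of a real config.

```shell
# Hypothetical helper: set "key = value" in a postgresql.conf-style file,
# replacing the existing line for that key whether or not it is commented out.
set_pg_setting() {
    key="$1"; value="$2"; conf="$3"
    tmp="$(mktemp)" || return 1
    # Match optional leading "#" and whitespace, then the key followed by "=".
    sed -E "s|^[#[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # overwrite in place to keep ownership and mode
    rm -f "$tmp"
    # Append the setting if the key was not in the file at all.
    grep -q "^${key} = " "$conf" || printf '%s = %s\n' "$key" "$value" >> "$conf"
}

# Demo on a scratch copy, not the live config.
conf="$(mktemp)"
printf '%s\n' '#fsync = on' '# synchronous_commit = on # synchronization level;' > "$conf"
set_pg_setting synchronous_commit off "$conf"
cat "$conf"
rm -f "$conf"
```

As far as I know, synchronous_commit only needs a config reload (for example pg_ctl reload) and not a full restart, and it can also be set per session, so it could be switched off only for the workloads that can tolerate losing a few commits.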

Category: tech, Tags: ceph, postgresql