@ wrote... (11 months, 3 weeks ago)

We recently upgraded our network to 10 Gbit and were really hoping to see monumental speed increases in our ceph cluster.

One of our benchmarks was pgbench and to say we were sad would be an understatement…

I created a database and ran tests like so…

createdb -U postgres bench
pgbench -U postgres -i -d bench
pgbench -U postgres -d bench -c 70 2> /dev/null

And here's a typical run. Latency varied a lot but was never less than 300 ms.

latency average = 951.736 ms
tps = 73.549842 (including connections establishing)

Yeah, so a four node ceph cluster with 12 OSDs was getting 73 TPS with nearly a second of latency. Did somebody swap out my drives with a floppy disk?!?

This was so horrifically bad that it put our entire five-year tech stack forecast in jeopardy…

Now I don't claim to be a ceph or postgres expert but here's what I tried.

# fsync = off
latency average = 50.641 ms
tps = 1382.270643 (including connections establishing)

So that's groovy and all but it's totally unsafe and not appropriate for production. The good news was that this proved that the ceph cluster itself wasn't utterly broken.

So then I noticed the next line in postgresql.conf:

# synchronous_commit = on # synchronization level;

So I tried turning that off and…

latency average = 48.036 ms
tps = 1457.230219 (including connections establishing)

wut! wut!

Basically equivalent speeds to fsync = off.

Pretty strange when there is no replication…

I also tried with synchronous_commit = local and got

latency average = 225.714 ms
tps = 310.127180 (including connections establishing)

Which is great compared to the original results but still dismal.
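As a sanity check on the numbers above, the two figures pgbench prints appear to be tied together: with a fixed number of clients, tps comes out as roughly clients divided by average latency. Here's a minimal sketch of that arithmetic, assuming the 70 clients from the -c 70 run. estimate_tps is a made-up helper name, not part of pgbench.

```shell
# Estimate throughput from average latency: tps ~= clients / latency.
# estimate_tps is a hypothetical helper for illustration, not a pgbench feature.
estimate_tps() {
    clients="$1"
    latency_ms="$2"
    # Convert milliseconds to seconds by scaling the client count by 1000.
    awk -v c="$clients" -v l="$latency_ms" 'BEGIN { printf "%.2f\n", c * 1000 / l }'
}

# Average latencies (ms) reported in the four runs, all with 70 clients.
for latency_ms in 951.736 50.641 48.036 225.714; do
    printf '%s ms -> %s tps\n' "$latency_ms" "$(estimate_tps 70 "$latency_ms")"
done
```

The estimates land within rounding distance of the reported tps values, so the throughput problem here is really a per-commit latency problem.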

For our use case, maybe losing a couple of transactions is worth a 10x speed improvement.

The really strange bit is that we aren't running replication on our test machine so synchronous_commit shouldn't affect anything as per my understanding…

But as I said, I'm not a postgresql expert. All this really raises the question: wtf is ceph doing that causes fsyncs to be so slow? Too bad I'm not a ceph expert either…

Here's a good page explaining synchronous_commit: https://www.tutorialdba.com/2018/04/how-to-improve-performance-of.html

tl;dr - set synchronous_commit = off
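For anyone wanting to try this, here's a rough sketch of flipping the setting without hand-editing postgresql.conf. It assumes the same postgres superuser and bench database used in the pgbench runs, so adjust for your own setup. ALTER SYSTEM, pg_reload_conf(), SHOW and SET LOCAL are standard PostgreSQL. The second half is an alternative, not something tested in this post: it relaxes the setting for a single transaction only, so the rest of the database keeps waiting for the WAL flush.

```shell
# Assumes the postgres superuser from the pgbench runs.
# Persist the setting (ALTER SYSTEM writes postgresql.auto.conf), then reload.
psql -U postgres -c "ALTER SYSTEM SET synchronous_commit = off;"
psql -U postgres -c "SELECT pg_reload_conf();"

# Confirm the value a new session sees.
psql -U postgres -c "SHOW synchronous_commit;"

# Alternative: turn it off only for transactions that can tolerate loss.
psql -U postgres -d bench <<'SQL'
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... writes you could afford to lose in a crash go here ...
COMMIT;
SQL
```

As far as I can tell from the PostgreSQL docs, the trade-off is that a crash can lose the most recently committed transactions, but unlike fsync = off it should not corrupt the database.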

Category: tech, Tags: ceph, postgresql
Comments: 2
Kurt @ March 9, 2020 wrote... (3 weeks, 2 days ago)

We ended up putting our main postgres databases on dedicated nvme drives in a zfs pool.

I really want to like Ceph but it is very complicated so make sure you actually need it before trying to deploy it.

As for requirements, a dedicated 10 Gbit link for ceph replication and another 10 Gbit NIC for the public network.

The more drives and nodes, the better. If I were to do it all again I'd insist on SSDs for the pool; spinners are too slow when there are 50+ virtual machines all trying to write their logs and whatnot.

For deploying, I use Ansible for everything.

Feng Pan @ March 9, 2020 wrote... (3 weeks, 2 days ago)

Hi Kurt, nice article! We are evaluating ceph as our product's underlying file system. Does it work well with postgres? What's the minimum hardware requirement (10Gbit/s network)? What do you use to deploy ceph and postgres?

Thank you in advance for your help!
