Our Private Cloud
A tale of two DCs
Polyglot YEG 2019
by Kurt Neufeld
- Programmed a bunch in C++
- Then programmed a bunch in Python
- Been running Linux at home and work for 20+ years
- Been with Artsman for about 1½ years in a DevOps/Ops/SysAdmin role
- Expert copy-paster from Stack Overflow
I guess you could say I'm full stack
- Our Architecture
- The Pieces
- Tears and Sadness
- Sells ticketing software
- Desktop app (Omnis) for native OSX & Windows
- Postgres holds state
- Webapp for ticket sales
- Offer a hosted (SaaS) solution
- Theatre Manager, etc
- Linux, no other viable options
- Physical, Amazon, Google, VMware
- LXC moving towards Docker
- Nomad; Kubernetes (not Rancher)
- LXC or VM
- apps in LXC, Docker in VMs
- Proxmox; VMware, Amazon
- 2 datacenters, east and west
Firewalls - pfSense
- VPN server
- redundant with hot failover
- 3 storage boxes "hyperconverged"
- 3 compute boxes
- all from Supermicro
- 3 or 4 Mac minis
Proxmox is a Linux distro based on Debian that makes it
really easy to create and control LXC containers and QEMU VMs.
- some monitoring
- VM migration to another machine
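To give a feel for how little it takes to create a container on Proxmox, here is a minimal sketch that assembles a `pct create` command line (Proxmox's CLI for LXC containers) without running it. The container ID, template filename, hostname, bridge and storage names are all made-up placeholders, not our real settings.

```python
import shlex


def pct_create_command(vmid, template, hostname, memory_mb=512, cores=1,
                       bridge="vmbr0", storage="local-lvm"):
    """Build (but do not run) a `pct create` command for one LXC container."""
    return [
        "pct", "create", str(vmid), template,
        "--hostname", hostname,
        "--memory", str(memory_mb),          # RAM in MB
        "--cores", str(cores),
        "--net0", f"name=eth0,bridge={bridge},ip=dhcp",
        "--storage", storage,                # where the root filesystem lives
    ]


# Hypothetical values for illustration only.
cmd = pct_create_command(
    120,
    "local:vztmpl/debian-10-standard_amd64.tar.gz",
    "web01",
)
print(shlex.join(cmd))
```

On a Proxmox host the printed line could be pasted into a shell, or the list handed to `subprocess.run`; roughly one such call per service is how you end up with a few dozen containers per DC.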
LXC - LinuX Containers
LXC containers are great: they're basically virtual machines,
but since they share the host kernel they're very lightweight and boot
in a few seconds.
We run about 35 LXC containers per DC, more or less
one service per container.
The pros over Docker are that you can ssh in, you get permanent
storage, etc. The downside is that they don't have all the tooling and
orchestration that you get with Docker.
Used to run Docker
Protip: do not run Docker on physical machines,
as it flakes out way too often and requires a reboot.
Do use an orchestrator like Nomad or Kubernetes.
Since we're primarily a Mac shop and all the devs
despise Windows, here we are.
They're really old...
4th class citizen at Apple, expensive and largely
Virtual Mac Minis
As our cloud
offering got more popular and the Minis are expensive and
haven't been updated in like 10 years another solution was
Who here has heard of Netflix Chaos Monkey?
If it's stupid but it (mostly)
works, is it stupid?
We're migrating our backend to Linux posthaste.
I set all that up before lunch
You might be thinking, holy crap, this guy is crazy smart,
there's no way I could do all that. And you're right, you
But neither can I, and probably nobody else
Taken 1½ years to get here, still ongoing.
I'm pretty good at a few of these things but
barely competent at most.
I'm doing the job of four people. Or put another way, I'm doing ¼ the job of
This is by far the most important and
hardest part of running your DC.
Storage is hard.
A hard drive is pretty easy. Redundant storage is
harder and distributed redundant storage harder yet.
Backups... what does that even mean? I have
a limited budget. Snapshots are cool... This is not an entirely
solved problem for us.
For the record, our customers' DBs are backed
up and geo-distributed.
- Started with ZFS on FreeNAS
- Then ZFS on Linux (FreeNAS → Proxmox)
- Then Gluster
- Then Ceph
- Then CephFS
- Then Ceph RadosGW
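Part of why redundant, distributed storage is hard on a limited budget is that redundancy eats raw capacity. A back-of-the-envelope sketch, assuming 3-way replication across the three storage boxes; the disk counts, disk sizes and the 80% headroom factor are made-up numbers for illustration, not our actual hardware.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replicas=3, fill_ratio=0.8):
    """Rough usable capacity of a replicated storage cluster, in TB.

    Every object is stored `replicas` times, and `fill_ratio` leaves
    headroom so the cluster can re-replicate after a disk or node dies.
    """
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / replicas * fill_ratio


# Hypothetical cluster: 3 boxes, 4 disks each, 4 TB per disk.
raw = 3 * 4 * 4.0
usable = usable_capacity_tb(nodes=3, disks_per_node=4, disk_tb=4.0)
print(f"raw: {raw:.1f} TB, usable: {usable:.1f} TB")
```

With these assumed numbers, barely a quarter of the raw disk space ends up usable, which is worth knowing before the budget conversation rather than after.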
- network layout, not huge, but wish there had been a plan
- machine names, boss man insists on snowflake names
for storage machines. Cattle not Pets.
- false starts are annoying but learning opportunities
- have had no major mistakes/outages (so far)
- wish we had more commodity machines
- Burn SSL certs to the ground and salt the earth
- Prod would work the same as during testing
- Dockerize all the things
- Faster networks
- Faster disks
- Unlimited budget
- Reality matches marketing
- Know how to drive all this stuff
Examples of Goodness
- Consul & Fabio
- Hashi Stack in general
- Proxmox, virtualization in general
- Open Source
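What makes the Consul & Fabio pairing pleasant: Fabio builds its routing table from Consul service tags that start with `urlprefix-`, and only routes to instances whose health check passes. Below is a minimal sketch of a service registration payload in the shape Consul's agent API accepts; the service name, port, route and health path are hypothetical.

```python
import json


def fabio_service(name, port, route, health_path="/health"):
    """Consul service definition that Fabio can pick up and route to."""
    return {
        "Name": name,
        "ID": f"{name}-{port}",
        "Port": port,
        # Fabio watches for tags with this prefix to build its routes.
        "Tags": [f"urlprefix-{route}"],
        # Fabio only routes to instances with a passing health check.
        "Check": {
            "HTTP": f"http://127.0.0.1:{port}{health_path}",
            "Interval": "10s",
        },
    }


# Hypothetical service for illustration only.
payload = fabio_service("tickets", 8080, "/tickets")
print(json.dumps(payload, indent=2))
```

Sending this JSON with a PUT to `/v1/agent/service/register` on the local Consul agent would register the service; no load balancer config needs editing by hand.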