Our Private Cloud

A tale of two DCs

Polyglot YEG 2019

by Kurt Neufeld

About Me

  • Programmed a bunch in C++
  • Then programmed a bunch in Python
  • Been running Linux at home and work for 20+ years
  • Been with Artsman for about 1½ years in a DevOps/Ops/SysAdmin role
  • Expert copy-paster from Stack Overflow, I guess you could say I'm full stack

Salient Points

  1. Our Architecture
  2. The Pieces
  3. Tears and Sadness
  4. Hope?

Artsman

  • Sells ticketing software
  • Desktop app (Omnis) for native OSX & Windows
  • Postgres holds state
  • Webapp for ticket sales
  • Offer a hosted (SaaS) solution

Modern Stack

Applications: Theatre Manager, etc.
Container: Docker, LXC
OS: Linux, no other viable options
Machine: physical, Amazon, Google, VMware

Artsman Stack

Apps: LXC, moving towards Docker
Orchestration: Nomad; Kubernetes (not Rancher)
LXC or VM: apps in LXC, Docker in VMs
OS: Proxmox; VMware, Amazon
Machines: 2 datacenters, east and west

Firewalls - pfSense

  • router
  • firewall
  • VPN server
    • employees
    • point to point
  • redundant with hot failover

Physical Machines

  • 3 storage boxes "hyperconverged"
  • 3 compute boxes
  • all from Supermicro
  • 3 or 4 Mac minis

Proxmox

Proxmox is a Linux distro based on Debian that makes it really easy to create and control LXC containers and QEMU VMs.

  • some monitoring
  • storage: ZFS, Gluster, Ceph
  • VM migration to another machine
  • HA (high availability)

LXC - LinuX Containers

LXC containers are great. They're basically virtual machines, but since they share the host kernel they're very lightweight and boot in a few seconds.

We run about 35 LXC machines per DC, more or less one service per container.

The advantages over Docker are that you can SSH in, storage is permanent, etc. The downside is that they don't have all the tooling and orchestration you get with Docker.

Virtual Machines

Used to run Docker

Protip: don't run Docker directly on physical machines; in our experience it flakes out way too often and requires a reboot.

Do use an orchestrator like Nomad or Kubernetes.
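To make the orchestrator advice a bit more concrete, here's a minimal sketch of a Nomad job in its JSON API form that runs one Docker container in both datacenters. It assumes the Nomad datacenters are literally named `east` and `west`; the job name, image, and resource numbers are hypothetical, and this is not our actual job file. The `submit` helper posts the job to Nomad's `/v1/jobs` HTTP endpoint and only works against a reachable Nomad agent.

```python
import json
import urllib.request


def nomad_docker_job(job_id, image, datacenters, count=2, cpu_mhz=500, memory_mb=256):
    """Build a Nomad job (JSON API form) that runs one Docker task `count` times."""
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "service",  # long-running service, restarted/rescheduled by Nomad
            "Datacenters": list(datacenters),
            "TaskGroups": [
                {
                    "Name": job_id,
                    "Count": count,  # Nomad keeps this many copies running
                    "Tasks": [
                        {
                            "Name": job_id,
                            "Driver": "docker",
                            "Config": {"image": image},
                            "Resources": {"CPU": cpu_mhz, "MemoryMB": memory_mb},
                        }
                    ],
                }
            ],
        }
    }


def submit(job, nomad_addr="http://127.0.0.1:4646"):
    """POST the job to Nomad's /v1/jobs endpoint (requires a running Nomad agent)."""
    req = urllib.request.Request(
        nomad_addr + "/v1/jobs",
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical job: two copies of a web app image spread across east and west.
job = nomad_docker_job("webapp", "example/webapp:1.0", ["east", "west"])
print(json.dumps(job, indent=2))
```

The nice part is that the VM running Docker becomes interchangeable: if one flakes out, Nomad reschedules the task somewhere else instead of someone SSHing in to reboot it.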

Mac Minis

Since we're primarily a Mac shop and all the devs despise Windows, here we are.

They're really old...

A 4th-class citizen at Apple, expensive, and largely underwhelming.

Virtual Mac Minis

As our cloud offering got more popular, and since the Minis are expensive and haven't been updated in like 10 years, another solution was required.

Who here has heard of Netflix Chaos Monkey?

If it's stupid but it (mostly) works is it stupid?

We're migrating our backend to Linux posthaste.

The Stack

I set all that up before lunch

You might be thinking, holy crap, this guy is crazy smart, there's no way I could do all that. And you're right, you can't.

But neither can I, and probably nobody else either.

It's taken 1½ years to get here, and it's still ongoing.

I'm pretty good at a few of these things but barely competent at most.

I'm doing the job of four people. Or put another way, I'm doing ¼ the job of somebody competent.

Storage

This is by far the most important and hardest part of running your DC.

Storage is hard.

A hard drive is pretty easy. Redundant storage is harder and distributed redundant storage harder yet.
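Taking Ceph from the storage list as the example, a rough way to see what redundancy costs: a replicated Ceph pool stores every object `size` times (3 by default), so usable space is roughly raw space divided by the replica count, minus headroom. The sketch below assumes a host-level failure domain and treats Ceph's default nearfull warning ratio (0.85) as the practical fill ceiling; the 24 TB per host in the example is hypothetical, not our actual hardware.

```python
def usable_capacity_tb(raw_tb_per_host, hosts, replicas=3, fill_target=0.85):
    """Rough usable capacity of a replicated pool, in TB.

    Every object is stored `replicas` times, so usable space is raw / replicas.
    `fill_target` leaves headroom; Ceph warns at its nearfull ratio (0.85 by default).
    """
    # With a host failure domain, each replica must land on a different host.
    if replicas > hosts:
        raise ValueError("with a host failure domain, replicas cannot exceed hosts")
    raw = raw_tb_per_host * hosts
    return raw / replicas * fill_target


# Hypothetical: three storage boxes with 24 TB raw each.
print(usable_capacity_tb(raw_tb_per_host=24, hosts=3))
```

So three hypothetical 24 TB boxes (72 TB raw) give you only about 20 TB you should actually plan to fill, which is a big part of why distributed redundant storage is the expensive, hard part.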

Backups... what does that even mean? I have a limited budget. Snapshots are cool... This is not an entirely solved problem for us.
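Snapshots only start to look like a backup strategy once something decides which ones to keep. A common approach is a daily/weekly retention policy; the sketch below is a hypothetical one (7 dailies, 4 weeklies), not what we run, and it only computes which snapshot dates to keep. Actually deleting ZFS or Ceph snapshots is left to whatever tool you use.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, daily=7, weekly=4):
    """Return the set of snapshot dates a simple daily/weekly policy keeps.

    Keeps the newest `daily` snapshots, plus the newest snapshot from each of the
    `weekly` most recent ISO weeks. Everything else is a pruning candidate.
    """
    newest_first = sorted(set(snapshot_dates), reverse=True)
    keep = set(newest_first[:daily])
    seen_weeks = []
    for d in newest_first:
        week = d.isocalendar()[:2]  # (ISO year, ISO week number)
        if week not in seen_weeks:
            seen_weeks.append(week)
            if len(seen_weeks) > weekly:
                break  # older weeks fall outside the policy
            keep.add(d)  # first date seen in a week is that week's newest
    return keep


# Hypothetical: one snapshot per day for the 60 days ending 2019-06-30.
today = date(2019, 6, 30)
snaps = [today - timedelta(days=i) for i in range(60)]
keep = snapshots_to_keep(snaps)
print(len(keep), "kept,", len(snaps) - len(keep), "to prune")
```

The weekly snapshots overlap with the dailies for the current week, so a policy like this keeps fewer than `daily + weekly` snapshots; that is expected, not a bug.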

For the record, our customers' databases are backed up and geo-distributed.

Storage

  1. Started with ZFS on FreeNAS
  2. Then ZFS on Linux (FreeNAS → Proxmox)
  3. Then Gluster
  4. Then Ceph
  5. Then CephFS
  6. Then Ceph RadosGW

Regrets

  • network layout, not huge, but wish there had been a plan
  • machine names, boss man insists on snowflake names for storage machines. Cattle, not pets.
  • false starts are annoying but learning opportunities
  • have had no major mistakes/outages (so far)
  • wish we had more commodity machines
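The "cattle, not pets" alternative to snowflake names is a boring, parseable scheme. Here's a minimal sketch assuming a hypothetical `<dc>-<role>-<nn>` convention (e.g. `east-stor-01`); it's not our actual naming, the point is just that tooling can derive a host's datacenter and role from its name instead of someone having to memorize it.

```python
import re

# Hypothetical convention: <dc>-<role>-<two-digit index>, e.g. "east-stor-01".
NAME_RE = re.compile(r"^(?P<dc>east|west)-(?P<role>[a-z]+)-(?P<index>\d{2})$")


def host_name(dc, role, index):
    """Build a cattle-style host name such as 'east-stor-01'."""
    name = f"{dc}-{role}-{index:02d}"
    if not NAME_RE.match(name):
        raise ValueError(f"not a valid host name: {name!r}")
    return name


def parse_host_name(name):
    """Recover (datacenter, role, index) from a name so tooling can group hosts."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a valid host name: {name!r}")
    return m["dc"], m["role"], int(m["index"])


# The three storage boxes in the east DC under this scheme.
print([host_name("east", "stor", i) for i in range(1, 4)])
```

Replacing a dead box then just means handing out the next index, not agonizing over which Greek god is still available.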

Magic Wand

  • Burn SSL certs to the ground and salt the earth
  • Prod would work the same as during testing
  • Dockerize all the things
  • Faster networks
  • Faster disks
  • Unlimited budget
  • Reality matches marketing
  • Know how to drive all this stuff

Examples of Goodness

  • Consul & Fabio
  • HashiCorp stack in general
  • Proxmox, virtualization in general
  • Open Source
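As an example of why Consul and Fabio are nice together: a service registers itself with the local Consul agent, and Fabio automatically routes to it as long as it carries a `urlprefix-` tag and its health check is passing. A minimal sketch using Consul's `/v1/agent/service/register` endpoint follows; the service name, address, port, and `/health` path are hypothetical, and `register` only works against a running Consul agent.

```python
import json
import urllib.request


def consul_registration(name, address, port, url_prefix, health_path="/health"):
    """Build a Consul agent service definition that Fabio will route to.

    Fabio watches Consul and adds a route for every healthy service carrying a
    'urlprefix-' tag, e.g. 'urlprefix-/tickets' or 'urlprefix-shop.example.com/'.
    """
    return {
        "ID": f"{name}-{address}-{port}",  # unique per instance
        "Name": name,
        "Address": address,
        "Port": port,
        "Tags": [f"urlprefix-{url_prefix}"],
        "Check": {
            # Consul polls this URL; only passing instances get traffic from Fabio.
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",
            "Timeout": "2s",
        },
    }


def register(service, consul_addr="http://127.0.0.1:8500"):
    """PUT the definition to the local Consul agent (requires a running agent)."""
    req = urllib.request.Request(
        consul_addr + "/v1/agent/service/register",
        data=json.dumps(service).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical ticketing service instance.
svc = consul_registration("tickets", "10.0.0.5", 8080, "/tickets")
print(json.dumps(svc, indent=2))
```

No load balancer config to hand-edit: start another instance, register it, and once its check passes Fabio starts sending it traffic.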

East Datacenter

The End

questions?

demo?