Planning for the worst
shit happens

@r4mnes & @ultrabug
Numberly
Eric Fischer / Flickr

nginx
Flask app
mongoDB


I see dead backends

Burning server
  Replica set (master / backup failover)
No more...
  RAM (kill on consumption threshold, cgroups)
  Disk (RAID, distributed FS)
Server overload
  Monitoring
  More servers (horizontal scaling)

Another possibility

nginx
Flask app
mongoDB

Unreachable backends

SysAdmin guy tripped over the cables
  Hello Kitty forfeit
Switch failure
  Network bonding / LACP

Fail proof stack & code

nginx
  Handle backend HTTP errors
  Serve from cache on upstream HTTP error
Flask app
  Stale cache
  Spooling / task deferral / message queuing
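To make the "stale cache" idea concrete on the Flask side, here is a minimal sketch. Everything in it is an assumption for illustration: the `StaleCache` class, its `get` method and the `fetch` callable are hypothetical names, not code from the talk. It keeps the last good value per key and serves it again when the backend call fails.

```python
import logging
import time

logger = logging.getLogger("stale_cache")


class StaleCache:
    """Keep the last good value per key and serve it when the backend fails."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._store = {}  # key -> (value, stored_at)

    def get(self, key, fetch):
        """Return (value, is_stale). `fetch` is a hypothetical backend call."""
        cached = self._store.get(key)
        now = self.clock()
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0], False  # fresh hit, backend not touched
        try:
            value = fetch()
        except Exception:
            if cached is None:
                raise  # nothing to fall back on: let the error surface
            # Backend is down: log it, then serve the stale copy.
            logger.warning("backend failed for %r, serving stale copy",
                           key, exc_info=True)
            return cached[0], True
        self._store[key] = (value, now)
        return value, False
```

The failure is logged rather than swallowed, so the degraded mode stays visible to whoever watches the logs.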


load balancer
nginx | nginx
Flask app | Flask app
mongoDB master | mongoDB slave

load balancer
nginx | nginx
Flask app | Flask app
mongoDB master | mongoDB master

load balancer(s)
  There's still a SPOF here :)
nginx | nginx
Flask app | Flask app
mongoDB master | mongoDB slave

Okay. So, what if your DATACENTER burns?

Ops
  Multiple datacenters / availability zones
  Remote backups (test them)
IP routing / connectivity
  Multiple datacenter BGP / Anycast
  DNS health checking (route53)
Application design
  Geo distributed apps

Real world problems

Real world problem #1

"Hey ramnes, the client says he can't authenticate on the website! Something's wrong!"
"That sounds bad. Let me check logs."

  [email protected] ssh webserver
  *types stuff* *much busy*
  [email protected] cat th/auth/auth/auth/auth/auth/auth/auth

"Well, the client is wrong."
"Oh, okay." *goes away*
(Great)

Real world problem #1

"Ramnes, something's really wrong! The client still can't connect!"
"Alright. Let me check code."

  [email protected] cat app.py
  *types stuff* *much busy*

  @app.route("/auth")
  def auth():
      """Old code.

      :author: Someone who left the
               company two years ago.
      """
      try:
          user.authenticate()
      except Exception as e:
          try:
              send_email(e)
              return 500, "ERROR!"
          except:
              pass
      return 200, "OK"

That function (send_email) raises an Exception if the mail server is down. When it does, the bare except swallows it and the view falls through to return 200, "OK", even though authentication failed.

Real world problem #1: conclusions

1. Know your code, refactor when needed (even if someone else wrote it and you don't like their coding style)
2. "Errors should never pass silently" (Zen of Python)

PS: Don't always blame ops guys. The DevOps thing is great, you should try it.
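One way the old /auth handler could be rewritten so that no error passes silently is sketched below. It is not the fix the speakers shipped. The names `AuthenticationError` and the `user` and `send_email` parameters are assumptions made for illustration. The function is kept free of Flask so it runs on its own, and it returns a (body, status) tuple, which is the order Flask views expect.

```python
import logging

logger = logging.getLogger("auth")


class AuthenticationError(Exception):
    """Hypothetical error raised when credentials are rejected."""


def auth(user, send_email):
    """Authenticate `user` and return a (body, status) tuple."""
    try:
        user.authenticate()
    except AuthenticationError:
        # Expected failure: tell the client, no alert needed.
        return "Unauthorized", 401
    except Exception as exc:
        # Unexpected failure: always leave a trace in the logs first.
        logger.exception("authentication crashed")
        try:
            send_email(exc)
        except Exception:
            # The alert failed too, but the original error still surfaces.
            logger.exception("could not send the alert email")
        return "ERROR!", 500
    return "OK", 200
```

With this shape, a mail server outage costs one alert email but can no longer turn a failed authentication into a 200 "OK".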

Real world problem #2

Weird graph showing an abnormally high maximum processing time

Real world problem #2

And then one day

Real world problem #2: solution

Local DNS

  [email protected] cat /etc/hosts
  192.168.12.40 database-server-1
  192.168.12.41 database-server-2
  192.168.24.30 database-server-3
  192.168.24.31 database-server-4

So it doesn't overload your DNS server when your code tries to access your database with its domain name
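The slide solves this with /etc/hosts. A different technique with the same goal is to cache lookups inside the application, sketched below under stated assumptions: `CachedResolver` and its `lookup` method are hypothetical names, and `socket.gethostbyname` from the standard library is used as the default resolver.

```python
import socket
import time


class CachedResolver:
    """Remember hostname lookups for a while instead of asking DNS each time."""

    def __init__(self, ttl_seconds=60, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self.ttl = ttl_seconds
        self.resolve = resolve  # injectable, defaults to the system resolver
        self.clock = clock
        self._cache = {}  # hostname -> (address, resolved_at)

    def lookup(self, hostname):
        """Return the address for `hostname`, hitting DNS only on a miss."""
        now = self.clock()
        hit = self._cache.get(hostname)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]
        address = self.resolve(hostname)
        self._cache[hostname] = (address, now)
        return address
```

Unlike static /etc/hosts entries, cached answers expire after the TTL, so a changed address is picked up eventually. The trade-off is that the application still depends on the DNS server for every cache miss.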
