
yapcna-2013 - alligator

Sat Dec 22 00:30:00 2012

Slides for the talk alligator at yapcna-2013

-

Architecture
Automation:
One Alligator
At A Time

-

Alligators?

-

"When you're up to your
ass in alligators it's
hard to remember you
were supposed to be
draining the swamp"

-

Or: When you're constantly
fighting fires it's hard
to find time to fix the
underlying problems

-

Alligators
are a better
metaphor

-

Shadowcat
consults

-

Early stage
startups

-

Technical
debt

-

... beats
running out
of money

-

Product
market
fit

-

"oh shit
this is
a mess"

-

... a mess
with happy
customers

-

How do you
fix it?

-

How do you
fix it while
still adding
features?

-

Easy!
(sort of)

-

Step 0

-

What servers
do we
even have?

-

Seriously.

-

Your datacentre
should be able
to list them.

-

Your cloud/VPS
provider will
have an API.

-

Start from
there.

-

There's always
one you forgot
about.

-

... and it's
probably a
SPOF!

-

Step 1

-

What the
hell are
we running?

-

(bonus points
if the guy
who knew left)

-

Documentation!

-

Do you trust the
documentation?

-

Don't.

-

Your systems
know what
they're running

-

What's
installed?

-

dpkg can
tell you

-

rpm can
tell you

-

Your OS can
tell you

-

Ask it.
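
-

A minimal sketch of "ask it", assuming a Debian-style system with `dpkg-query` or an RPM-based one with `rpm`. The function name `list_packages` is made up for illustration.

```shell
# list_packages: hypothetical helper that prints "name version" for every
# installed package, using whichever package manager this box has.
list_packages() {
    if command -v dpkg-query >/dev/null 2>&1; then
        # -W lists known packages; keep those whose second status letter
        # is "i" (installed), which also keeps held packages
        dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' |
            awk 'substr($1, 2, 1) == "i" { print $2, $3 }'
    elif command -v rpm >/dev/null 2>&1; then
        # -qa queries every installed package
        rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n'
    else
        echo "list_packages: no dpkg-query or rpm found" >&2
        return 1
    fi
}

# Usage: one sorted file per host, so two machines can be compared with diff
#   list_packages | sort > "packages-$(hostname).txt"
```

Sorting the output is what makes a plain `diff` between two hosts readable.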

-

What about
custom code?

-

Repositories
local::lib dirs

-

Repositories

-

locate .git

-

Ok, so where
did it come
from?

-

.git/config

-

.git/config
git remote -v

-

Is it the
real thing?

-

git status
git log
git diff
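
-

A rough sketch of the repository hunt, assuming git 1.8.5 or later (for `git -C`). It uses `find` rather than `locate` so a stale locate database can't hide anything; `survey_repos` is a hypothetical name.

```shell
# survey_repos: hypothetical helper. For every git checkout under a
# directory, print where it came from and whether it has drifted.
survey_repos() {
    root=${1:-/}
    # -prune stops find descending into each .git directory it reports
    find "$root" -name .git -type d -prune -print 2>/dev/null |
    while IFS= read -r gitdir; do
        repo=$(dirname "$gitdir")
        echo "== $repo"
        # where did it come from? (the remotes recorded in .git/config)
        git -C "$repo" remote -v
        # is it the real thing? local edits or untracked files show up here
        if [ -n "$(git -C "$repo" status --porcelain)" ]; then
            echo "   DIRTY: local modifications or untracked files"
        fi
        # latest commit, to compare against what upstream has
        git -C "$repo" log -1 --format='   last commit: %h %ci %s' 2>/dev/null
    done
}

# Usage: survey_repos /srv > repos.txt
```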

-

local::lib

-

... ah.

-

perllocal.pod

-

... Module::Build
doesn't write it

-

Uh ...

-

Tim Bunce to
the rescue!

-

Dist::Surveyor

-

Warning: does
clever things.

-

Warning: does
clever things.
This takes
a while ...

-

So now you
know where
your code is.

-

... what
talks to what?

-

*argh*

-

Step 2

-

Enumerate
running services

-

/etc/init.d

-

Well, yeah
but ...

-

daemontools
runit
ubic
...

-

ps ax

-

No,
really.

-

ps ax
lsof

-

ps ax
lsof
netstat

-

ps ax
lsof
netstat
(also /proc)

-

All daemons
All files
All connections
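
-

One way to capture all three in one go, as a sketch. `snapshot_services` is a hypothetical helper; `lsof` and `netstat` are only used if installed, `ss` is swapped in as a fallback where `netstat` is missing, and /proc stands in for `ps` on boxes without procps. Run it as root, or `lsof` and `netstat` will only show part of the picture.

```shell
# snapshot_services: hypothetical helper that writes daemons, open files
# and connections into one directory, one file per tool.
snapshot_services() {
    out=${1:?usage: snapshot_services OUTDIR}
    mkdir -p "$out"
    # all daemons: full process list with arguments
    if command -v ps >/dev/null 2>&1; then
        ps ax > "$out/ps.txt"
    else
        # no procps installed: read command lines straight from /proc
        for d in /proc/[0-9]*; do
            printf '%s %s\n' "${d#/proc/}" \
                "$(tr '\000' ' ' 2>/dev/null < "$d/cmdline")"
        done > "$out/ps.txt"
    fi
    # all files: -n and -P skip host and port name lookups
    if command -v lsof >/dev/null 2>&1; then
        lsof -nP > "$out/lsof.txt" 2>/dev/null
    fi
    # all connections: TCP and UDP, numeric, with owning process
    if command -v netstat >/dev/null 2>&1; then
        netstat -tunap > "$out/netstat.txt" 2>/dev/null
    elif command -v ss >/dev/null 2>&1; then
        ss -tunap > "$out/netstat.txt" 2>/dev/null
    fi
}
```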

-

Now you can
cross
reference

-

Dump the
output into
a wiki page

-

Easy viewing
Free history

-

Mediawiki API
works fine

-

Dump the output
into a git repo

-

JSON::Diffable

-

diff
log
blame
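
-

A minimal sketch of the git side, using plain text files rather than JSON::Diffable output and assuming one directory per host inside the repo. `record_snapshot` is a hypothetical helper; it only commits when the snapshot actually changed.

```shell
# record_snapshot: hypothetical helper. Copy a snapshot directory into a
# git repo under the host's name and commit it, so diff/log/blame work.
record_snapshot() {
    snap=${1:?usage: record_snapshot SNAPDIR REPO HOST}
    repo=${2:?usage: record_snapshot SNAPDIR REPO HOST}
    host=${3:?usage: record_snapshot SNAPDIR REPO HOST}
    [ -d "$repo/.git" ] || git init -q "$repo"
    mkdir -p "$repo/$host"
    cp "$snap"/* "$repo/$host/"
    git -C "$repo" add "$host"
    # only commit when something differs from the last recorded snapshot
    if ! git -C "$repo" diff --cached --quiet; then
        git -C "$repo" commit -q \
            -m "snapshot of $host at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    fi
}

# Later: git -C snapshots log -p -- web1/netstat.txt
```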

-

grep out
things you
recognise

-

Work out
what the
rest is

-

Repeat.

-

Repeat.
Repeatedly.

-

So now we know
what talks to
what, and why.

-

One more
thing.

-

grep. everything.
for IP addresses.

-

There will
be one
somewhere.

-

No, really.
There will.
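
-

A deliberately loose sketch: `find_hardcoded_ips` (a hypothetical name) matches anything shaped like a dotted quad, so some version strings will turn up too. That's fine; grep out the things you recognise.

```shell
# find_hardcoded_ips: hypothetical helper. Print file:line:address for
# everything under the given paths that looks like an IPv4 address.
find_hardcoded_ips() {
    # -r recurse, -H always show the file name, -I skip binary files,
    # -n line numbers, -o print only the match, -E extended regex
    grep -rHInoE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$@" |
        # loopback is rarely the problem; drop it
        grep -v ':127\.0\.0\.1$'
}

# Usage: find_hardcoded_ips /etc /srv /opt
```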

-

Step 3

-

Go find a
beer to
cry into.

-

-

"When you're up to your
ass in alligators it's
hard to remember you
were supposed to be
draining the swamp"

-

Or: When you're constantly
fighting fires it's hard
to find time to fix the
underlying problems

-

Alligators
are a better
metaphor

-

Wild Bill
Walton

-

Mad
texan.

-

Master of
the folksy
metaphor

-

(on sales guys
managing techs)

-

"When it comes to
technical management,
that man couldn't find
his ass with both hands
and a hunting dog"

-

(reminding me that
he -is- a technical
manager and I don't
need to use small words)

-

"I get it, Matt,
this ain't my
first rodeo"

-

Why am I talking
about this?

-

Because these
metaphors worked
for techs and
managers

-

"This is a swamp"
versus
"We have some
technical debt"

-

Guess which one
sticks in the
listener's mind?

-

"When you're up to your
ass in alligators it's
hard to remember you
were supposed to be
draining the swamp"

-

So, thanks
to Wild Bill.

-

Wild Bill
Walton
R.I.P.

-

This talk's
for you.

-

-

So, what
do we
know?

-

Systems
Packages
Code
Services
Dependencies

-

Now we
can plan.

-

First
thing.

-

If you can,
use fresh
machines.

-

Your existing
systems -will-
be missing
security fixes.

-

Assume
the worst.

-

Fresh installs
are controlled,
known installs.

-

One alligator
at a time

-

One service
at a time

-

Firewalls
aren't just
for security

-

Firewalls
keep your
dependencies
honest

-

Automation
approaches

-

Pick something
pull based.

-

I don't really
care what.

-

Sysadmins seem
to prefer puppet

-

Developers seem
to prefer chef

-

For the basics,
they're largely
equivalent.

-

Just pick one!

-

Pull based.

-

Why?

-

Pull-based
systems
converge

-

System down
when an update
goes out.

-

Network blip
when an update
goes out.

-

System overloaded
when an update
goes out.

-

New system
added to a
cluster

-

All these matter
when you push

-

None of these
matter when
you pull.
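
-

Not puppet or chef, just a hand-rolled sketch of why pull converges: every run re-applies the whole desired state and only changes what differs, so a box that was down, hit a network blip, or was just added catches up on its next run. `ensure_line`, `converge`, the config path and the cron entry are all hypothetical.

```shell
# ensure_line: hypothetical idempotent resource. Append LINE to FILE only
# if it is not already there, so running it twice changes nothing.
ensure_line() {
    file=$1 line=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# converge: hypothetical desired state for this host. A real pull-based
# agent fetches this description on a timer; each run applies all of it,
# not just the latest change.
converge() {
    root=${1:-}
    ensure_line "$root/etc/myapp.conf" "db_host = db1.internal"
    ensure_line "$root/etc/myapp.conf" "cache_host = cache1.internal"
}

# Hypothetical cron entry: pull the description, then apply it.
#   */15 * * * * cd /srv/config && git pull -q && sh ./converge.sh
```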

-

Pick something
pull based!

-

Config
generation

-

Your tool
can probably
template things.

-

Your tool
can also
call scripts.

-

If you already
know TT ...
just use TT.

-

Rule of
thumb.

-

Don't be
clever.

-

"This is systems.
You are trying to
be clever. Stop."

-

Step 0

-

Eliminate any
IP-based
configuration

-

I don't care
if you do it
manually.

-

Just make
sure you
do it.

-

DNS is a
mess?

-

Fine.

-

rsync
/etc/hosts

-

Really. It's
not clever
but it works.
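
-

A sketch of the not-clever approach, assuming root SSH access and rsync on every machine; `push_hosts` and the machine-list file are hypothetical. Setting `RSYNC=echo` turns it into a dry run.

```shell
# push_hosts: hypothetical helper. Copy one master hosts file to every
# machine named in a list file (one name per line, # for comments).
push_hosts() {
    master=${1:?usage: push_hosts MASTER_HOSTS_FILE MACHINE_LIST}
    machines=${2:?usage: push_hosts MASTER_HOSTS_FILE MACHINE_LIST}
    : "${RSYNC:=rsync}"   # override with RSYNC=echo for a dry run
    failed=0
    while IFS= read -r m; do
        case $m in ''|'#'*) continue ;; esac
        # </dev/null stops ssh under rsync from eating the machine list
        if ! $RSYNC -a "$master" "root@$m:/etc/hosts" </dev/null; then
            echo "FAILED: $m" >&2
            failed=1
        fi
    done < "$machines"
    return $failed
}
```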

-

Step 1

-

Backup
everything

-

"But everything
must already be
backed up"

-

HAHAHAHAHAHA

-

Check.

-

Step 2

-

Build new
machines and
restore backup
data onto them

-

(now you've
tested your
backups :)

-

Point a development
machine at the
new systems

-

Change something.
Check the slaves.

-

Concept
proven.

-

Step 3

-

Migration
strategy

-

Customer
facing
service?

-

Probably
HTTP then?

-

Don't trust
DNS TTLs.

-

www2

-

www.myservice.com
www2.myservice.com

-

Redirect
www2 -> www

-

Wait a day
or two.

-

Redirect
www -> www2

-

If it catches
fire, back it
out!

-

Wait a day
or two.

-

Still not
on fire?

-

Change www
DNS entry

-

Wait a day
or two.

-

Redirect
www2 -> www

-

Guess what?

-

... wait a
day or two.

-

Done!
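
-

A small sketch for checking that each redirect really is in place before the waiting starts, assuming `curl` is available. `redirect_target` and `location_header` are hypothetical names; myservice.com is the placeholder domain from the slides.

```shell
# redirect_target: hypothetical helper. Print where URL currently
# redirects to. -s hides progress, -I sends HEAD, and without -L curl
# does not follow the redirect, so this shows the first hop only.
redirect_target() {
    curl -sI "$1" | location_header
}

# location_header: read raw HTTP response headers on stdin and print the
# Location value (header names are case-insensitive; strip the CRs).
location_header() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# During the first phase this should print http://www.myservice.com/
#   redirect_target http://www2.myservice.com/
```

Browsers may cache a 301 for a long time, so a temporary 302 is probably the safer choice while backing out is still on the table.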

-

Yes this
is boring.

-

This is systems.
Boring is GOOD.

-

Internal
services

-

Now you can
trust DNS

-

... but it's
stateful.

-

Most of them
do master/slave

-

Some of them
do clusters

-

Sometimes
this is fine.

-

Sometimes
this is too
clever.

-

Here's the
stupid way.

-

rsync

-

rsync
rsync
rsync

-

rsync
rsync
rsync
(halt?)

-

Actually ...

-

Once the
rsync is
under 5s

-

Stop services.

-

Stop services.
Stop dependencies.

-

Stop services.
Stop dependencies.
Change DNS.
rsync once more.

-

Stop services.
Stop dependencies.
Change DNS.
rsync once more.
Start services.
Start dependencies.
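
-

The "rsync until it's under 5s" loop as a sketch. `sync_until_fast` is a hypothetical helper, and `date +%s` only gives whole seconds, which is plenty for this. The cutover comments use made-up service, host and path names.

```shell
# sync_until_fast: hypothetical helper. Re-run a sync command until a
# single pass takes less than LIMIT seconds, i.e. until the remaining
# delta is small enough to fit in the outage window.
sync_until_fast() {
    limit=${1:?usage: sync_until_fast LIMIT CMD [ARGS...]}
    shift
    while :; do
        start=$(date +%s)
        "$@" || return 1          # give up if the sync itself fails
        elapsed=$(( $(date +%s) - start ))
        echo "pass took ${elapsed}s" >&2
        [ "$elapsed" -lt "$limit" ] && return 0
    done
}

# The cutover from the slides (all names are made up):
#   sync_until_fast 5 rsync -a --delete /var/lib/myapp/ newdb:/var/lib/myapp/
#   service myapp stop                # stop services
#   service myapp-worker stop         # stop dependencies
#   (point the DNS entry at newdb)    # change DNS
#   rsync -a --delete /var/lib/myapp/ newdb:/var/lib/myapp/   # once more
#   ssh newdb service myapp start     # start services
#   service myapp-worker start        # start dependencies
```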

-

Done!

-

Sound kinda
horrible?

-

It's entirely
brute force.

-

It's entirely
PREDICTABLE.

-

And your outage
window is short.

-

Clever cluster
and slave
trickery

-

Can be zero
outage

-

... can go
horribly
wrong.

-

Pick your
poison.

-

Step 4

-

Go find a beer
to not cry into

-

Decide which
service will
be next.

-

Repeat.

-

This is not
rocket
surgery.

-

Keep it simple.

-

Keep it simple.
Keep it stupid.

-

Keep it simple.
Keep it stupid.
One alligator
at a time.
---- 
Thank You
IRC:mst
mst@shadowcat.co.uk
@shadowcat_mst