It’s become failrly well understood that “Dev” and “Ops” are no longer separate skill sets and are combined into a role called “DevOps”. This role has become one of the hottest and hardest to fill.
At Yieldbot we’ve taken a pretty hardcore approach to putting together Dev and Ops into DevOps that serves us well and should be a great repeatable pattern.
Chef + AWS Consolidated Billing
The underlying philosophy we have is that the development environment should match as closely as possible the production environment. When you’re building an analytics and ad serving product with a worldwide distributed footprint that can be a challenge.
Our first building block is the use of Chef (and on top of that ClusterChef, which is now Ironfan). Using these tools we’ve fully defined each role of the servers in a given region (by defining as a cluster), and all of the services that they run. We coordinate deploys through our Chef server with knife commands, and Chef controls everything from the OS packages that get installed, to the configuration of application settings, to the configuration of DNS names, etc.
The second building block is that every developer at Yieldbot gets their own AWS account as a sandbox. We use the AWS “Consolidated Billing” feature to bring the billing all under our production account. This lets us see a breakdown of everybody’s charges and means we get one single bill to pay.
The last detail is that every developer uses a unique suffix that is used to make resource references unique when global uniqueness is necessary. This is mostly used for resolving S3 bucket names. For any S3 bucket we have in production such as “foo.bar”, the developer will have an equivalent bucket named “foo.bar.<developer>”.
Doing Two Things at Once
With all of that as the status quo, developers are almost always doing two things: developing/testing (the Dev), and learning/practicing how the platform is managed in production (the Ops).
Everyone has their own Chef server, which is interacted with the same way that the production Chef server is. As they deploy the code they are working on into their own working environment, they’re learning/doing exactly what they would do in production.
All of this was put in place over the last year while the developement team was static, during which time we switched from Puppet to Chef.
But the power of this approach really hit home recently as we’ve started to add more people to the team. The first thing a new hire does is go through our process of getting their development environment set up. There’s still bumps along the way, and they get problems and take part in ironing them out. The great thing about this approach though is that each bump is a lesson about how the production environment works and a lesson in problem solving in that environment.
Having said all that, there are a couple differences that we’ve put in place between consciously development and production, with the driving force being cost.
The instances are generally sized smaller, since the scale needed for production is much greater. Amazon’s recent addition for support of 64-bit on the m1.small was a great help.
We use several databases (a mix of MongoDB, Redis, and an internally developed DB tech) that are distributed on different machines in production that we collapse together onto a single instance with a special role called “devdb”.
We’ll have to have some future blog posts about how we import subsets of production data into development for testing, and the like.
We also use Chef with ClusterChef/Ironfan for managing the lifecycle of our dynamic Hadoop clusters. Yet another good topic for a post all its own.
Have experience with a similar approach or ideas about how to make it even better? We want to hear about it.