"The big unsolved problem is the incredible amount of waste that goes on in IT organizations and how fundamentally unpleasant some of those jobs can be. It's all really so unnecessary."
DevOps Means Business
Nigel Kersten, CIO & VP of Operations at Puppet, was in his third year at university studying astrophysics when his job waiting tables at a coffee shop, and a chance conversation, landed him a help desk position at one of the largest universities in Australia. While at Google, he got the attention of Puppet Labs as he managed over 40,000 machines, building highly automated environments before DevOps became the poster child it is today. Nigel's desire is to help large dysfunctional IT organizations become functional.
In this Innovator’s Journey profile, we talk with Nigel about his interest in comparing SRE with DevOps, his frustrations with large, dysfunctional enterprises, and his desire to help create functional process in large organizations through collaboration and sharing.
Mark Miller: You've got a varied background with the STEM and the Arts stuff that you're doing and media arts lecturing. How did you get started?
Nigel Kersten: I was a long-standing computer nerd all through high school. I grew up in a pretty rural area of Australia, sort of a redneck beach town. For some reason I decided I wasn't going to do computers when I went to university, '90-'93. I decided to go off and do astrophysics instead. I didn't even realize that knowing about computers was a real skill that people would pay for.
I was working in a coffee shop in third year university, and two of the IT guys were sitting there trying to work out how to free up enough space, 640k memory on a DOS machine. And I was serving them coffee. I was like, “You just need to high mem the mouse driver.” They were a little bit freaked out by this and essentially offered me a job on the help desk at $2 an hour more than I was making at the coffee shop.
I started working at the help desk at UNSW (University of New South Wales), one of the big universities in Australia. I moved to doing faculty desktop support, moved to a sysadmin role, and did a few little contracts with people like IBM and Westfield in various places while I had a very long, patchy university career. I ultimately ended up as a sysadmin at the College of Fine Arts, as it was called at the time.
I was also doing an awful lot of computer music stuff: sound design, sound production, as sort of my hobby, and playing a lot of music gigs. One of the lecturers came and saw me play somewhere and was like, 'Hey! I actually want to try and get more practitioners doing lectures here.' There's a whole chunk of software, signal processing software like Reaktor and Max/MSP, that's more like audio programming. So I was teaching that and STEM production for quite a few years while I was working at the college.
I've always been someone who automates everything. I spent a lot of my time automating all of the lab management, the desktops, all of the laptops, all of the servers that we had. I had enough free time that I was spending a lot of time on mailing lists and getting deeply involved with the Mac management community.
That seemed to be one of the best decisions I ever made, because then Google approached me to ask me to move over to Mountain View and manage their Mac deployment, which was an enormous mess at the time.
We were looking around for a solution, in 2007, that was flexible enough for us to do large scale desktop management. I happened to see a friend's talk at a conference that was about Puppet because Luke Kanies had demoed it at LISA a year or two before. After trying out Puppet, it was clear that was a really huge win. We had many tens of thousands of machines managed pretty quickly within a few months.
I was on the Puppet mailing lists and started asking questions about scale, and suddenly Luke's interested: "How big is your deployment? Where are you?" "Well, I'm at Google. I've got 40,000 machines." I became quite good friends with Luke because suddenly I was pushing the boundaries of scaling Puppet.
I worked at Google as an SRE for about 4 years and eventually jumped ship when Luke made me a job offer to come and be product manager at Puppet Labs. We were about 20 people at the time, nearly 6 years ago. In that time I've worked as Product Manager, a jack of all trades, doing everything. I was CTO for a year or two there. And now I am in the CIO role, which is probably not the traditional CIO role for a lot of companies.
I spend a fair bit of time outbound doing evangelism. I work heavily on the State of DevOps survey that we do. I work closely with marketing, and I also manage the community and evangelism team, as well as all of the operations that IT takes in at Puppet Labs.
Mark Miller: That's quite a history. As you think back, do you remember the first time you heard about DevOps?
Nigel Kersten: I think I was still at Google at the time. I think before the name DevOps appeared there were a few people using the term agile system administration and I was following that pretty closely.
One of the odd things about a huge company like Google is that not everyone pays much attention to the outside world, and I think that was particularly true in the 2000s. Google's infrastructure was just decades and decades ahead of everyone else's, so people didn't seem to think there was a lot to actually learn outside. But I was paying close attention to the agile system administration forums that were popping up and then saw that first DevOps Days that happened in Ghent. Patrick Debois and John Willis and folks like that were all appearing. So I was paying pretty close attention to it. I was quite interested and still am.
I have to set up a distinction between SRE (Site Reliability Engineering) and DevOps because they are trying to solve similar problems but at different kinds of scale and with different kinds of constraints. I was following that pretty closely, and I think when I joined Puppet at the end of 2010, it was definitely something that was starting to appear everywhere.
I think the community of practitioners at the time was somewhat split on whether this was a real movement with anything concrete behind it, or if it was just going to turn out to be like Agile programming where it got co-opted by consultants and product companies pretty quickly and stopped feeling authentic and genuine.
Mark Miller: Are you still motivated by working with DevOps conceptually? Is it a fad, a trend? Where are you placing DevOps now in the whole loop of things?
Nigel Kersten: I think that's interesting. When I started talking about the DevOps movement with folks internally at Google, they just kept saying, 'Well, that's just how you do operations properly.' I think that was a common sentiment amongst people for a number of years in the early days. They were like, "Why do we need a label for just doing your job properly?" You should care about business value. You should work closely with the developers. You should have a highly automated environment. You should have lots of metrics. You should be able to measure things. Culture matters, sharing matters. Why is this suddenly a movement?
I think what's happened since then is that we've seen a few things. We've seen dysfunctional companies use DevOps as both a cultural signifier that they are willing to change and as a rough sort of road map for actually improving their situation.
I have changed my mind on this a few times over the last few years. I used to think that it was ridiculous to have the idea of anyone having the title DevOps or to have a team with DevOps in the name. It just seemed stupid to me. What are you going to do? Are you going to have an operations team, a development team, and then a DevOps team? That's an anti-pattern. That isn't actually going to solve anything.
After seeing a few things that came out of the State of DevOps survey last year and the year before, we see that companies that do have those job titles, and do have teams with the name DevOps in them, do tend to be performing better than those that don't.
It bothered me for a little while because I somewhat think titles and names are a little bit stupid anyway. All that matters is the work you are doing and how you do it. I've come to the conclusion that DevOps is an important cultural signifier, particularly if you are in a relatively traditional enterprise environment that has generations and generations of dysfunction built up around the way you work and the work that you do. Having the title of DevOps there is sort of the signifier that we're trying to do things differently.
I think that is important for retaining people, for hiring the right people, just sending the right signals around the organization.
Mark Miller: If you could make up your own title to tell people what you do, what would it be?
Nigel Kersten: That's a really good question. I am very cross functional here at Puppet. I work really closely with marketing, with sales, with product engineering, finance, HR, all of those groups. I don't know what I would pick.
I like having the CIO title because it's one of those fuzzy titles like CTO, where responsibilities can vary greatly. It's both a senior enough title that certain types of executives will listen to you because you still have power, but it's also fuzzy enough that I feel like I have a great deal of leeway.
Information is kind of a nice thing to be responsible for. If I was to pick anything else though it would be something around collaboration or cross-functional work. I don't quite know what title you'd end up with: Chief Collaboration Officer or something. It sounds a little bit fuzzy but is perhaps a little bit more accurate as to what my job is actually like day to day.
Mark Miller: When you think about the things that you've accomplished so far, what are you most proud of?
Nigel Kersten: I'm really proud of the State of DevOps survey. I think that's partly because I was a little bit skeptical about it when we started. Things like, do we really need to send out a survey? This feels a bit more like a marketing exercise than anything real. I think the reality has been that over the last 5 years we've shown real impact from that survey.
The trend over the last 5 years has been really interesting. When we first did the survey it was very clear that there was a lot of awareness that this thing called DevOps was out there. People didn't actually understand particularly what it was, and no one was really doing it very much in production. It was still something that small startups, traditionally in the Bay Area, and large web scale companies, like the Googles and the Facebooks, Dropbox, Twitter, those kinds of people were doing. But it wasn't something that had really penetrated the enterprise even though there was awareness of it.
The fascinating thing that I'm really proud of being involved with in that research space has been showing that people who are adopting DevOps practices are higher performing IT organizations, and that this has a material and statistically sound impact on how fast you can do deployments, how quickly applications get out there to production, and how quickly you recover from failures. You generally have a higher level of quality in terms of the services that you are actually delivering.
It's always been apparent to practitioners because this was a grass roots movement. It arose out of frustrations from people who were doing this job in the trenches. I'm really proud of the research we've managed to do showing to executives that there is real business value in adopting these practices.
You may feel a little uncomfortable with the fact that it's somewhat a loose collection of practices rather than something as well defined as the full ITIL specification, but it has a real impact and it impacts the bottom line and it impacts the quality of the services you are producing.
I'm really proud of that because I spend a lot of time working with the executives who are undergoing IT transformation projects. It's been a really powerful tool, being able to just point to them and go, 'look, we have actual data here from tens of thousands of practitioners we've surveyed over the last 5 years showing if you adopt these practices your business runs better'.
I think this is really powerful because now we have a classic moment where executives are aware of DevOps from reading CIO magazine or whatever.
They've now got data they can actually look at and go 'this actually makes a difference to the company' and the practitioners themselves are like, 'well this actually makes a difference to my job. It's more satisfying. I have more power. I feel more integrated with the rest of the business and I'm no longer the sysadmin or IT person locked away in the basement and treated like a call center. I'm being seen as strategic to the business'.
I think this is really critical for morale, for engagement, for all sorts of things that make your life as an employee more enjoyable.
Mark Miller: The survey is coming out again soon isn't it?
Nigel Kersten: It's out now at the moment. It will be closing in the next few weeks. I'm definitely looking for more respondents as always.
Mark Miller: When you're thinking about the next generation of DevOps practitioners, where do you place them? What should they be working on if they want to get into the community?
Nigel Kersten: If you're new to this, you've either been a traditional sysadmin who hasn't worked in this field before or you're just entering the field. There are a couple of things I think are really key.
Make sure you have enough basic math skills to be able to deal with monitoring and metrics and draw data analysis conclusions from them. We're talking about something relatively simple here. There's a really great talk an ex co-worker of mine from Google, Jamie Wilkinson, did: "Better Living Through Statistics: How Monitoring Doesn’t Have to Suck". He did that at PuppetConf a year or two ago. It really shows how just basic high school mathematics gives you way more insight into what your infrastructure is actually doing.
One of the things that I think is a little unfortunate with the name DevOps is that it causes some people to think, 'I need to be a full developer. I need to be able to write a large, complex enterprise application. I need to be able to architect something that's really quite complex as a distributed application.' Whereas I think the more appropriate way to think about it is that sysadmins have always had to do scripting, and traditionally this has been bash scripts, tcsh, all those sorts of things. But the reality these days is that we have a whole bunch of dynamic languages (Perl, Python, Ruby, even Go) that you can do an awful lot of work in, and they are actually much, much easier than dealing with inconsistent shell scripting environments.
So I think the three things that I would pick out start with a desire to automate, and to do that you're going to use a platform like Puppet or any of our competitors, anyone who does that infrastructure-as-code work. You're going to need to have some basic math skills. You're going to have to have some basic software development skills.
It doesn't mean you need to be a 4 year computer science degree expert programmer in order to do this work. You just need to understand how to work like a software developer.
Those are some of the big things that I see. As long as you can pick up enough skills to be able to manage your infrastructure in a way that's coherent with the software development life cycle: you're doing code reviews, you have some sort of text based system, you can push out application changes at the same time as infrastructure changes and review all of them together. That feels really key to me. Basic math, basic programming, and an awareness of infrastructure-as-code applications.
Mark Miller: It's interesting of all the people that I have talked to, no one has mentioned math yet. That's an interesting perspective.
Nigel Kersten: I wasn't using a lot of sophisticated maths until I got to Google. I spent two years starting a pure maths degree before I decided to switch to something else, so I had a pretty strong maths background. But I hadn't actually been using it very much at work until I got to Google, where everything is measured. There's just years of data collection sitting there. All of the monitoring systems are incredibly sophisticated and require a reasonably high degree of maths. It sort of took me a month or two to go, 'Oh my God, how do I remember how to do all this stuff?'
It's really high school level maths that matters. It's basic statistics. It's basic algebra and basic calculus. All of those things give you a really huge leg up.
I think a really classic example here is that often, when we are doing monitoring, we just measure counters or proportions. So it would be like, 'alert me when 10% of disk space is available because I'm going to need to do something about that'. If you just do some basic high school math around calculus, being able to do derivatives and integrals, you can start measuring rate of change and you can start developing much more sophisticated monitoring. So rather than alerting when 10% of the disk is available, you ask: given the current rate of data consumption, will you be able to alert someone in time for them to do something about it before the disk fills up?
Rather than just doing raw metrics and counters, being able to do simple maths in order to determine rates of change and acceleration of rates of change gives you a much more useful set of monitoring data to work off.
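The disk example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google's or anyone else's monitoring actually works: the function names (`fill_rate`, `seconds_until_full`, `should_alert`), the sample format of `(timestamp_seconds, used_bytes)` pairs, and the choice of a straight-line least-squares fit are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Each sample is (timestamp in seconds, bytes used). Hypothetical format.
Sample = Tuple[float, float]


def fill_rate(samples: Sequence[Sample]) -> float:
    """Least-squares slope of used bytes over time, in bytes per second.

    This is the "rate of change" (first derivative) of disk usage,
    estimated from noisy samples by fitting a straight line.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def seconds_until_full(samples: Sequence[Sample], capacity: float) -> Optional[float]:
    """Projected seconds until the disk is full, or None if usage is not growing."""
    rate = fill_rate(samples)
    if rate <= 0:
        return None  # flat or shrinking usage: no projected fill time
    latest_used = samples[-1][1]
    return max(capacity - latest_used, 0.0) / rate


def should_alert(samples: Sequence[Sample], capacity: float, lead_time_seconds: float) -> bool:
    """Alert when the projected fill time is within the lead time someone needs to react."""
    remaining = seconds_until_full(samples, capacity)
    return remaining is not None and remaining <= lead_time_seconds
```

The design choice here is that the alert threshold is expressed in time ("someone needs eight hours to react") rather than in space ("10% free"), so a slowly growing disk at 95% may stay quiet while a rapidly filling disk at 50% pages someone.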
Mark Miller: Once you move on, what would you like your legacy to be with the community? How do you want to be perceived once you move on?
Nigel Kersten: I was a traditional sort of curmudgeonly sysadmin in a whole bunch of ways before I was forced to actually care a little bit more about business and marketing and sales and all of those things that working in a startup teaches you. I would be really happy if my legacy ended up being that I was useful in helping a number of large dysfunctional IT organizations become functional.
Mark Miller: Wow. That's big.
Nigel Kersten: That's the sort of work we are doing a lot of here at Puppet. I think I say that too because I have such strong faith that we've built such a connected community in an ad hoc manner on the internet and around Twitter and all of the various communication channels and DevOps Days. The practitioners are going to be fine. At the grass roots level everything's progressing, people are skilling up, more and more interesting stuff is coming out.
But I think the big unsolved problem is the incredible amount of waste that goes on in IT organizations and how fundamentally unpleasant some of those jobs can be. It's all really so unnecessary. When IT orgs move from a waterfall cell system where application developers are huffing things over the wall to the QA team who then make a few changes and huff it over the wall to the operations team. I think the potential for change in large, somewhat staid enterprise environments is far greater than it is in the sort of cutting edge startup space.
Mark Miller: “Developers are throwing the code over the wall to ops and then having ops deal with security.” Is it really still that prevalent with all that we know now?
Nigel Kersten: Absolutely.
I think the most terrible thing is that you go in and you talk to these environments and they suffer talent drain at the moment because the reality is if you've got a DevOps mindset, if you're able to do automation, if you're able to level yourself up that way, you can always find a better job than working in some shitty, unproductive enterprise environment. People that aren't undergoing that change at the moment have just seen their best people leave because they can always find a better job.
I can tell you complete horror stories, terrible things.
Here's a good example. I know Gene Kim cares a lot about the security space too. I think in many ways, as much as I have a distaste for the term Rugged DevOps, there is something real there about applying these principles to the security and compliance space as much as operations and development.
I was talking to a relatively large enterprise environment who had adopted our software. They were complaining about what happened when they downloaded prepackaged modules from the Forge. Someone sits down and writes a bunch of Puppet code to manage MySQL or Nginx or anything like that. They can upload and share it with the rest of the community, and sharing has always been a really core tenet of the DevOps movement.
This customer's complaining that it's taking them too long to adopt this content and the reason was they had to go through it with a fine tooth comb, remove any configuration that had anything to do with users and groups and pull that into a separate file.
I was like, "Why are you having to do this?" With a tool like Puppet, that's just unnecessary. They were like, "Oh no, that's the security requirement. The security operations team have said all users and group management must be in a single file that we monitor in terms of code." And so these poor people were having to jump through ridiculous hoops to fork the code and lose the benefit of future software updates, all because of this ridiculous archaic security process.
There are so many stories like that I can tell where there is someone in an organization refusing to actually change or re-evaluate the way they're doing things. Some of these processes built up before virtualization even existed. They are decades old and everyone is still suffering under the weight of them. There are so many dysfunctional environments.
What's another example to show how far behind the curve some people are? There's a relatively large bank overseas that I spoke to a while ago. They were looking to adopt virtualization. In the last four months they were finally going to adopt virtualization. I was like, "What's triggering this now?" They said, "We can't buy new servers with CD trays in them." This is 2015! Virtualization swept through the industry quite a while ago.
There are a lot of really dysfunctional organizations out there and some of them are really large.
We've had customers who, when we first started working with them, an application would finish being built internally, they had thousands of internal developers, but it would tend to be 9-18 months before that application was stable in production. After we worked with them for six months they managed to bring that down to a week. That sort of thing is just transformative within an organization. How frustrating must it be as a developer to write something and know that it's not actually being used by anyone for another 12-18 months? That's just a crappy job by anyone's standards.
I've actually been considering trying to get a panel together somewhere on the difference between DevOps and SRE. I think we've seen there's a new O'Reilly book out about Site Reliability Engineering that's really, really fascinating and a really great read. It was really good for me because it sort of refreshed me on a bunch of the things about how Google actually does work, which is radically different because they are a company that adopted many of these principles many years ago and is operating on a scale that very few other companies actually ever get to.
I'm kind of interested to see if SRE is really just DevOps post bureaucracy at scale or if there are actually fundamental differences between the ways in which they work. SRE, I think, has a higher expectation of the programming ability of people who are in an SRE role than your average DevOps role does. But I think there's an awful lot that both sides could learn from each other.
That's a topic that I am definitely thinking about a lot at the moment. Where are those boundaries and what are the differences between the two movements.
Mark Miller: The idea for DevOps now is moving more towards collaboration. Are you seeing that too?
Nigel Kersten: I think the tools are always just a means to an end. One of the rather frustrating points we were at about 3 or 4 years ago was where people would say, "I've adopted Puppet or I've adopted Chef. Now I'm done. I've adopted DevOps." That's really not the point.
The tools don't matter, apart from the fact that infrastructure-as-code tools like ours, and even some of the container space work around Docker, allow collaboration because they give a common language for what the actual changes in production are going to be. Rather than having the translation error of "here's a document or an email where we all agree on the plan, and then someone goes and executes the plan," you can actually look at the code and go, "this is what's going to happen to the infrastructure."
Tools matter but only in terms of what they enable. The ultimate goal for everyone is collaboration and sharing across all of these things. There's no point in just building more silos.