Intuit's DevSecOps: War Games, Gamification, and Culture Hacking

April 11, 2016 By Derek Weeks

Wow, if you ever wanted to learn about Rugged DevOps (some call it DevSecOps), sit down for a spell with Shannon Lietz, Ian Allison, and Scott Kennedy from Intuit. We discussed many important topics, including internal war games, culture hacking, gamification of Rugged DevOps, and starting as a small team. There are 100 gold nuggets in this conversation for novices and experts alike. Just yesterday Shannon shared her story at the first stop of the Nexus World Tour in Dallas, TX. She'll also be with us in Chicago on April 27th. To catch Shannon as a keynote on the Nexus World Tour, register here.

Derek: We're at the RSA 2016 Conference DevOps Connect Event. I have some of the Intuit DevOpsSec team here with me today. We'll talk to them a little bit about Rugged DevOps and how things work over at Intuit. Let’s start with some introductions.

Ian: I'm Ian Allison. I help run the Red Team at Intuit, which is, I guess you'd say, an interesting way to take control of security at our company. We try to get ahead of the attackers by basically being the attackers. We're essentially ethical hackers. We go after all our own stuff to ensure that we can find where the deficiencies lie in all our software.

Shannon: I'm Shannon Lietz. I've been working at Intuit for three and a half years and helped found the 24x7 DevSecOps capability at Intuit, leading the Red Team, our security operations capability, our cyber SOC, and what we also consider "blue teaming": being able to hunt for defects.

The organization has had to transform how we do software development, because we're a 30-year-old software company. We are now seeing the traditional way of putting together software give way to one that embraces DevOps. For us, it's been exciting to work with Rugged DevOps in the industry to help build security into the DevOps movement.

Scott: I'm Scott Kennedy. I run the forensics and threat intelligence part of cyber work.

Derek: Shannon (@devsecops), tell me a little bit about software supply chains and how that vision of software development has impacted the way you see things at Intuit.

Shannon: That's a great question. It was interesting when Josh Corman and I first talked, we had a lot in common. One of those things was the software supply chain. What I love about the concept is that processes can be driven in a certain way, so that you can reduce defects.

Having worked for Toyota in the past and understanding the supply chain mentality, you get a sense of how you could put something together better, incrementing on it, figuring out how to share that process, and then really figuring out what things are important. Having that notion of fewer, better suppliers was really a core concept.

I love transparency, building things in a certain way, and getting into continuous improvement. You need to look at things from an opportunities perspective -- making sure you're not just looking to make things perfect. You're looking for those opportunities to improve over time.

Derek: As we think about Rugged DevOps within your security team, how do you measure the success of what you're doing? What metrics are you looking at that matter to the business?

Shannon: We measure everything. For example, mean time to remediation (MTTR). Once somebody finds a defect, we analyze that defect from the time it got into the supply chain to when it actually gets resolved. We track everything from mean time to remediation, to when the ticket was created, to looking at when the code actually got published, to when it actually got found, and then we work on those things over time. We try to uplift.

We leverage JIRA just like the software development team does. We register our defects and figure out how to get development teams to take responsibility for those ideas. It goes through their process of release and regression testing. As part of that, we look back to see where our opportunities are.

As an example, we started out where things may have taken weeks. We then reduced it down to days and ultimately got it down to hours. We've seen defect resolution where it's now minutes. When it's something we've discovered was just a mistake by an engineer, we realize "mistakes do happen." We found that our cycle times also help us find full-stack vulnerabilities in real time, because we get to do end-to-end testing more aggressively using this method.
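Editor's sketch: the timing Shannon describes (when a defect entered the supply chain, when it was found, when it was resolved) can be turned into averages with a few lines of Python. The record fields, ticket IDs, and timestamps below are hypothetical and are not Intuit's actual schema or data; in practice the timestamps would come from a tracker such as JIRA.

```python
from datetime import datetime
from statistics import mean

# Hypothetical defect records; field names and values are illustrative only.
defects = [
    {
        "id": "SEC-1",
        "introduced": datetime(2016, 1, 4, 9, 0),   # code published with the defect
        "found": datetime(2016, 1, 6, 9, 0),        # defect discovered, ticket opened
        "resolved": datetime(2016, 1, 6, 15, 0),    # fix released
    },
    {
        "id": "SEC-2",
        "introduced": datetime(2016, 1, 10, 12, 0),
        "found": datetime(2016, 1, 10, 14, 0),
        "resolved": datetime(2016, 1, 10, 16, 0),
    },
]


def mean_hours(records, start_key, end_key):
    """Average elapsed hours between two timestamps across all records."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 3600 for r in records
    )


# Mean time to remediation: from discovery to resolution.
mttr_hours = mean_hours(defects, "found", "resolved")

# Mean exposure: from entering the supply chain to resolution.
exposure_hours = mean_hours(defects, "introduced", "resolved")

print(f"MTTR: {mttr_hours:.1f} h, exposure: {exposure_hours:.1f} h")
```

Tracking both intervals separately shows whether improvement is coming from faster detection or from faster fixes.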

Derek: How has consistency in your operations helped with Rugged DevOps and its fragility within the organization?

Ian: One of the things we do is use a golden image for all the AMIs (Amazon Machine Images) we use for all our customers, and we require everybody to use these AMIs. We've also built some interesting automation around scanning these AMIs. One thing we realized quickly when we started in AWS: when we try to do a full vulnerability scan against a system that's set up to autoscale, we suddenly have 50 systems. Right? It's really hard to do a full vulnerability scan against that kind of system, so we came up with a way to share all the AMIs with a special account. Then we bring those up and scan them. Then we grade them.

Based upon the vulnerabilities found, you'll get a letter grade, like A through F, based upon the system you have. We always strive to have our base image be an A, but people continue to run on older images. They still get graded, and those grades get pushed up, so everybody in their org structure gets to see what the grade is for their account. I think being a little standardized with these images lets us know what's in everything, and we have a grade for everyone. It helps everyone have a good idea of where they stand from a security standpoint.
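Editor's sketch: the interview does not say how scan results map to letters, so the severity weights and cutoffs below are pure assumptions. The point is only to show how a scan summary for an image could be reduced to a single A through F grade.

```python
# Hypothetical severity weights and grade cutoffs; Intuit's real rubric
# is not described in the interview.
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}

# (maximum score, letter) pairs, checked from best grade to worst.
GRADE_CUTOFFS = [(0, "A"), (5, "B"), (15, "C"), (30, "D")]


def grade_image(findings):
    """Map a {severity: count} scan summary to a letter grade."""
    score = sum(SEVERITY_WEIGHTS[sev] * count for sev, count in findings.items())
    for max_score, letter in GRADE_CUTOFFS:
        if score <= max_score:
            return letter
    return "F"


# A clean baseline image, then the same image after one new critical finding.
print(grade_image({}))               # clean image
print(grade_image({"critical": 1}))  # one newly disclosed critical issue
```

Under these assumed weights, a single newly disclosed critical vulnerability is enough to drop a clean image two letter grades, which is the kind of visible change that prompts teams to ask questions.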

Derek: That's not only a grading, but also a policy enforcement governance role that grading plays. How rapid is the feedback loop in that grading system for the teams you're working with?

Shannon: It's really quick, and we've discovered through some science that having component-based resources like AMIs provides us with an advantage when doing things like remediating vulnerabilities. Using AMI-based resources, we have seen that when there's a defect in one, we can find and remediate all of the defective AMIs quickly. That improves everyone's security across the company.

So if you're just picking out good components, keeping track of those components and adding security into them, you'll actually see a different effect across our pipeline. A single change can actually have a dramatic effect on reducing the problems within the pipeline.

Ian: It's really interesting. This morning, I got an email from somebody saying, "Why did our baseline AMI go from an A to a C today?"

We had just received notice of a new vulnerability. Our stuff caught it, we scanned it, we pushed the grade out to our portal where all our customers look at the grades. Our customers quickly saw that change.

They could now say, "Wow, it changed from an A to a C in less than 12 hours." I think the feedback is important. The other important thing is that we have people going and looking. I wouldn't get emails about why this has changed if people weren't actually looking and wanting to make their grades better.

Derek: You mentioned customers. Are these internal customers?

Ian: Internal.

Shannon: Yeah, for our development teams, we as a security team have changed how we think about things. It used to be that the security team would go out and govern. Basically, you got the fear of the security team coming in, descending upon you.

We've changed how that happens within our organization. We grade our resource components and grade the way our applications come together. That changes how developers want to operate, because they want to figure out how to get better grades in security. And it creates a learning dynamic that encourages somebody to improve continuously.

Derek: Does it create a competitiveness or gamification of who has better grades?

Shannon: Absolutely, which is why we did it in the first place. To your point there, gamification is something where when you start to grade components like that, you can actually start to leverage a leaderboard concept. We do have leaderboards. We have APIs where you can actually pull down your grades and include them in your automation. With these, you can make governance decisions.

If you have that "game afoot," your leaders can then ask for specific grades within their pipeline. That up-levels the system, and you just see a continuous improvement life cycle come to bear. Ultimately, you see fewer defects, and ultimately you get to the notion of Six Sigma in our way of thinking. DevOps is about continuous improvement and automation. Embracing that concept allows us to get to fewer defects faster.
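Editor's sketch: once grades are available to automation, a pipeline can sort them into a leaderboard and enforce a minimum grade. The grade scale, team names, and grades below are hypothetical; in practice the dictionary would be filled from the internal grades API, which the interview does not document.

```python
# Assumed grade scale, ordered from best to worst.
GRADE_ORDER = "ABCDF"


def meets_minimum(grade, minimum):
    """Return True if `grade` is at least as good as `minimum`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(minimum)


def leaderboard(grades_by_team):
    """Sort (team, grade) pairs from best grade to worst, then by team name."""
    return sorted(
        grades_by_team.items(),
        key=lambda item: (GRADE_ORDER.index(item[1]), item[0]),
    )


# Hypothetical grades, standing in for values pulled from a grades API.
grades = {"payments": "C", "tax": "A", "payroll": "B"}

for team, grade in leaderboard(grades):
    status = "pass" if meets_minimum(grade, "B") else "blocked"
    print(f"{team}: {grade} ({status})")
```

A leader asking for "B or better" in the pipeline then becomes a one-line check rather than a manual review.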

Derek: As you embraced continuous practices and DevOps practices, were there points when you realized that certain old ways of doing things weren’t going to enable you to move forward?

Scott: In looking at the progression of what we've been doing, one of the decisions made at Intuit that I saw as really unique was the way they decided we were going to migrate into AWS. The idea was to have the chaos team be the first people out, and that's the security team. So the security team was the one going out and finding out how to use each of the products AWS has, and creating the concept of whitelisting. Each product was rated on whether it met security's requirements.

Therefore, no team can pull down the cool new tool that AWS released yesterday and use it in production, because it hasn't been "whitelisted." That can go into their scoring. Their scoring is not only used by the development teams, but is also useful when reporting to the board. When the board asks, "How are we doing as a company across the entire organization?" we can say that product A got a lower score than product B, and then they turn to the VP in charge of it and say, "Well, why?"
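Editor's sketch: the whitelisting idea Scott describes amounts to comparing what a team wants to use against a list that security has already reviewed. The approved set and the service names below are hypothetical examples, not Intuit's actual list.

```python
# Hypothetical set of services the security team has reviewed and approved.
APPROVED_SERVICES = {"ec2", "s3", "iam"}


def unapproved_services(requested):
    """Return the requested services that are not on the approved list."""
    return sorted(set(requested) - APPROVED_SERVICES)


# A team's planned production stack, checked before deployment.
violations = unapproved_services(["s3", "lambda", "ec2"])
if violations:
    print("Not yet whitelisted:", ", ".join(violations))
```

The result could either block a deployment or simply feed into the team's score, in line with the grading approach discussed earlier in the interview.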

We decided not to rush into the cloud, but to take a careful, considered approach and migrate in a very intelligent and well-thought-out way. At the same time, we gave the chaos team the ability to make the mistakes and grow and learn, so they can immediately turn around and share the mistakes with everyone else. They could say, "Hey, these are the things that didn't work for us. We encountered many problems, especially when you look at things like accounts and account roles."

How do you control when you have thousands of accounts and need some sort of administrative control?

You can either mount a gigantic effort to force your namespace and your Active Directory to be the source of control, or you can use vendor-specific tools like IAM and let each account be its own island. With the concept of cross-account roles, you can then do remote administration from a centralized account. You have it locked down. You can have a restricted group and remotely go in and do the necessary actions.

That also gives you an audit trail. It also gives you multifactor authentication built in, because the AWS products provide it.

https://youtu.be/kCo-D_x2dUQ

Shannon: I think culture-hacking your environment can have a profound effect, especially when you're going through a DevOps transformation.

Derek: What is culture hacking?

Shannon: That's a great question -- we use it when really trying to figure out how we as a security team can change and transform. Many things that take place in a company are based on traditional processes: What has worked before, and why would we change something that is working, right? If you're going to go into an innovative frame; if you're going to get into that next-generation innovation; if you're trying to figure out what's going to work in that... it's never going to be the thing that is working. It's going to be the thing that you'll learn as you go to that next step.

Culture hacking is about looking at the people who are operating right now and trying to figure out how you're going to help them go from A to B, making that change. What is that experience going to be like?

What we have done, to Scott's point, is that we've forced our security team to have empathy for the DevOps teams. We go through the process of developing something in the cloud, utilizing it as a method of taking their paranoia and trying to balance the notion of getting something done within a specific time frame. We try to really wrangle what it takes to do those things securely and safely, but ultimately still deliver for the business.

I think culture hacking really comes into play when you're trying to figure out how to move somebody from the rock they're on to the rock you need them to be on, and trying to figure out what those mechanisms are.

Derek: Part of your security practice is looking at open-source and third-party components and your own binaries. Can you shed some light on how Intuit uses Sonatype solutions to better manage those vulnerabilities?

Shannon: Yeah, Sonatype is a fantastic platform. We love the Nexus repositories. We love how you guys put together a community. We learn a lot.

Much of our DevOps practice works with it. We've put together our Nexus repositories to do code signing and figure out how to secure our pipelines in a certain way. We are taking advantage of the fact that we can pick up components, track them, and then scan them [for known vulnerabilities].

That's allowed us to reduce the defect count that goes to production. Actually scanning and looking for vulnerabilities within our components and open source libraries allows us to make better decisions about what we're including in our software.

Derek: When you govern what open source, third-party, or proprietary components are being used by developers, is there any feedback from the teams saying, "Hey, you're restricting my behavior, not improving my innovation"?

Shannon: What we've found is that security approvals, exceptions, and gates don't work. Quite often, you just create a culture where developers will go out and do it, and then you'll find out about it. When it comes to partnering and being boundaryless about how you think about security in your business, it's all about transparency. It's all about benefits. It's creating things like a security markdown file within your repository manager. It’s about taking responsibility and accountability for the things that you're doing from a security perspective in your development process. It’s ultimately having an attacks.md file, keeping track of what's out there, keeping track of your open source, understanding what components you're leveraging, and why you made the decisions you made to bring those things into your project.

At a top level, all those things work. But it's also necessary to have tools that help you assess the decisions made by the other open source programmers you're getting contributions from. All the things that they might decide are also part of your decision tree, and ultimately you're rolling all that up and bundling it together. The attack surface is not just the decisions your team is making, but also the ones you share across the code base you've got.

Derek: Your practices are very mature. You've clearly developed them over a long time, and some people watching this might think, "Well, Intuit's a huge organization," and it may be daunting to them if they haven't started down the path of Rugged DevOps. Can you be a small team and have success in these practices?

Shannon: We're not exactly a huge organization, but we are relatively large now. When we got started, I believe I was one of maybe three people who started this, only a couple of years ago. We have hired extensively into our group to help grow it, and some of the things we've done have allowed us to operate differently, to bring in people and have them immediately be successful. Our practices allow someone on Day One to work with the environment, develop code, and contribute code that week.

We do weekly demos, where we actually do video demos. A person must come in, program something, secure something, operate it, and create a demo, all within their first week. So having the right bar for those folks is important. More importantly, our Red Team leader here (points to Ian) came in and is amazing; he has created a Red Team pretty much out of thin air. The same goes for having somebody from forensics, who has done an incredible job helping us build a life cycle where we can snapshot something and learn from it when it's actually offline.

Those are the types of practices where you start to extend yourself past the normal baseline practices of processes today, and think past that about how you're going to support innovation. You get into it quickly. You get a learning culture. You get people who know that making mistakes, and learning from them, is okay. That's a really important part of the culture that you're putting in place.

Ian: Yeah, I was going to say, it's all about iteration, right? We started small, and we continually iterate on what we're doing to get better and be better at what we all do.

When I started this journey, I was a security guy, a pen tester. It was always the developer's fault. Developers always made the mistakes. I always had to clean up after them. But after six months of developing Ruby APIs and working my butt off in code, the empathy was there.

I understand what the developers are going through and why they make the choices they do. But I think by allowing us to help them, by creating tooling that allows them to self-serve, understand it without making them... helps them make themselves more secure without them having to become a security professional. I think that's our ultimate goal.

Shannon: Being friendly hackers, right? Basically, attacking them so their applications don't get attacked by external attackers is part of that frame.

Scott: The Red Team shift at the company has been profound, because you see how people react. When the Red Team started, it was not as well shared, and many were suddenly upset that the Red Team attacked them. But then it was pointed out: "Well, what would you rather have happen? Would you rather have somebody in China do this to you, who didn't work with you, didn't sit next to you and help you fix the product, or would you like a friend whose job, by the way, is to attack?"

We went through several drills and actually practiced the muscle of defending the company against an attack, and people were upset. "Oh, I had to do all this work."

My response to them: "Well, you did the right work."

"You did the right thing. You saw something bad. You did it. You did good. You practiced the muscle. Now when it happens again and it's not the Red Team, I know you'll know what to do. You know that the process works, and we can actually defend the company faster and more securely."

Derek: Yeah. That's an incredible story. Thank you for sharing it.

My final question: If you could pick a superpower in dev, security, or ops that you would have in the organization, what would it be?

Ian: To me, they're all like, they're the same, right? That's what we do, DevSecOps, right? We try not to actually separate them out, because I think once you start to separate them out, you start to lose perspective.

Scott: Yep.

Ian: There's a good thing about having them all be one thing, so I'd choose them all.

Scott: It's been pretty consistent. DevSecOps is the answer. What was the question? (Laughter)

Shannon: I think the reason we created DevSecOps was simply to change how we thought about developing and technology, and to get ahead of it, to realize that attackers weren't setting up appointments or meetings to help you figure out how they were going to attack your software, and so then why were we? Why were we operating at a fragile level?

I think the superpower I would like is DevSecOps, because I know that we are going through the process of creating a less-fragile security capability that will allow us to get ahead of attackers, make it much harder for them to go after the software that gets built, and we're seeing those improvements. That's actually a great thing.

Derek: It sounds really exciting, and it's very cool, so thank you all very much. I appreciate it.

All: Thank you!

If you loved this interview and are looking for more great stuff on Rugged DevOps, I invite you to download this awesome research paper from Amy DeMartine at Forrester, "The 7 Habits of Rugged DevOps."

As Amy notes, "DevOps practices can only increase speed and quality up to a point without security and risk (S&R) pros' expertise. Old application security practices hinder speedy releases, and security vulnerabilities represent defects that can leave a company open to cyberattacks. But DevOps practitioners can leap forward with both increased speed and quality by including S&R pros in DevOps feedback loops and including security practices in the automated life cycle. These new practices are called Rugged DevOps."

Written by Derek Weeks

Derek serves as vice president and DevOps advocate at Sonatype and is the co-founder of All Day DevOps -- an online community of 65,000 IT professionals.
