Disney’s DevOps Journey: A DevOps Enterprise Summit Reprise

by Aliza Earnshaw | 24 February 2015

Of course you know The Walt Disney Company. It’s one of the world’s largest media companies, and has created some of the world’s most loved and respected brands.

From its earliest days, Disney has treated technology as an important strategic asset. As it moved online, the company recognized that embracing leading-edge technology could give its customers — or “guests” in the Disney parlance — a more delightful interactive experience, and deepen their connection to the Disney brand.

Jason Cox, director of systems engineering at Disney, delivered a talk at the DevOps Enterprise Summit (DOES) in October last year about the company’s DevOps journey. People loved the talk, and for those of us who didn’t get to see it, Jason walked us through the material he covered at DOES.

@jasonacox love when presenters share stories and then provide a bibliography – amazing Disney presentation, great aesthetics #DOES14— Eric J. Carlson (@vABSTRACTED) October 21, 2014

Just like many other IT organizations supporting increasingly agile and high-velocity businesses, Disney’s technical operations teams strive — and sometimes struggle — to provide the speed, scale and consistent repeatability required to continually provide guests with the kind of experience they’ve come to expect from Disney.

Jason’s talk offered a view into how Disney first adopted DevOps practices to realize its business goals, and how its embrace of automation, cross-team collaboration, lean methodologies and data-driven business intelligence has evolved into a Disney way of doing DevOps. Starting with some history — how the digital online operations team became “systems engineers” embedded with development teams — Jason moved on to talk about how the systems engineers’ promotion of automation tools and collaborative processes has helped to transform and accelerate every business at Disney.

This is all nicely summed up in a slide Jason displayed during the talk. It shows some important business milestones on Disney’s DevOps journey. (We’ve reproduced the information in a more readable form below the image of the slide.)

#DOES14 Disney success stories are what convinces management. Also love results of embedding devs in the business. pic.twitter.com/Sx4UZqBcdw— adrian cockcroft (@adrianco) October 21, 2014

| Business unit | Servers | Apps | Success stories |
| --- | --- | --- | --- |
| ABC | 1536 | ABC, Watch ABC, ABC Family, Oscars, WatchDisneyChannel, WatchDisneyJunior, WatchDisneyXD, Dancing with the Stars | Using auto-scale, 300-700 servers within 15-30 minutes |
| Corp. Social Business | 100 | Employee Tools | New instances from days to 5-10 minutes |
| Digital Media Agency | 200 | DisneySpeakerSeries.com, DisneyCreativeLab.com, DisneyRewards.com, DisneyAtoZ | Deployment times reduced from several hours to a few minutes |
| Disney Interactive | 1119 | Infinity, Disney.com, Starwars.com, Marvel XP | No configuration or human error incidents in FY13 |
| ESPN | 830 | Fantasy Football, March Madness | Reduction of manual steps by 75% |
| Guest Data Services | 450 | Disney ID, API Mgmt | DisneyID build out went from 1 day to 1 minute; 95% reduction in human errors |
| Parks & Resorts Online | 1742 | My Disney Experience | DR test: deployed entire stack in 5 minutes to 200 servers in DR data center |
| Studios | 287 | Disney Movies Anywhere, Digital Copy Plus, Multiple Movie Sites | Deployment times reduced from several hours to a few minutes |
| Studios Business | 100 | El Capitan Ticketing, Business Systems | Automation of Windows, IIS/.NET and MSSQL |

Why Disney Turned to DevOps

Over the past few years, leaders of Disney’s various business segments have been pushing their teams to achieve greater speed and stability — at the same time. That’s not a trivial thing to achieve, and it’s critical to Disney’s business, which is based around the core value of creating a phenomenal guest experience.

It was helpful that around the same time, some of the business units had big initiatives for which they needed extra IT support, making them far more interested in pulling IT operations people into early planning and design conversations.

Deliver new product in two weeks? In the old day two weeks was how long it took us to get a server purchased! @jasonacox #does14— Jen Krieger (@mrry550) October 21, 2014

From Operators to Systems Engineers

When Jason joined the Walt Disney Internet Group in 2005 as part of the web operations team, there was near-complete separation between the development and operations teams. Dev teams wrote code, and the ops team deployed it. Though developers wrote documentation for how to deploy their code, they didn’t actually know what happened in operations. Naturally enough, the web ops team found that the documentation didn’t fit with its processes, resulting in frustration for both teams. “I thought, why can’t we go explain our space to them, and understand more about their space?” Jason said.

Ops/dev divide: “I was there a year before I met a developer…turns out they write more code than docs. Who knew?” @jasonacox #DOES14— ElisabethHendrickson (@testobsessed) October 21, 2014

In 2007, Jason’s group changed its name from “Web Operations” to “Systems Engineering.” It sounds like a superficial change, but it wasn’t; the renaming made the team see themselves and their roles differently — and it changed their colleagues’ view of the team, too. “The software engineers now saw us as kindred engineers,” Jason said. “We got more collaboration, and more sense of what the software engineers did. We saw more of their space and their pains, and what it takes to write apps. We were able to tell them about resiliency, and what it takes to run apps.”

The new empathy between developers and the systems engineers changed how they worked together. They began to collaborate early in the software development cycle, which allowed the systems engineers to make the business case for traditionally non-functional requirements — for example, stability. That was important; as Jason put it, “The usual non-functional requirements, like stability, really are functional requirements at Disney. Stability is part of the guest experience.”

This deeper and earlier collaboration developed organically out of relationships built across teams, Jason said, pointing out that relationship-building is a key requirement for DevOps. The new model of collaboration then began to spread to other groups throughout Disney.

An important ingredient in Disney’s DevOps journey was the decentralization of systems engineering. Like most ops teams, the systems engineers were a small group providing services to other technical teams. The company decided early on to embed these systems engineers with product-focused groups that included software engineering, QA and test teams at Disney locations around the world. This shift was critically important, Jason said: It promoted ongoing, daily collaboration in a way that wasn’t possible when the operations team worked from a single location.

.@jasonacox: “In our annual scorecard, decisive finding: ‘we love those embedded [ops] teams; we want more of those'” #does14— Gene Kim (@RealGeneKim) October 21, 2014

Automation and Beyond

It takes more than collaboration, though, to deliver new products and services at a faster pace. You also need a toolchain that enables a more agile approach. A DevOps toolchain normally includes a version control system, a continuous integration tool, a configuration management platform and monitoring tools. Reporting that everyone can see (and understand) is critical, so dashboards are key, too.

Choosing the right tools matters. “Tools change the human experience; they change how we think,” Jason said. “The way we keep elevating as a human race is that we keep building tools.”

Either configuration management or continuous integration is a good choice for the first new tool, Jason said. Both can offer early evidence that things are improving, and seeing improvements quickly is an important part of getting people on board with the change.

Disney found that introducing configuration management tools — specifically, Puppet Enterprise and Chef — plus cloud hosting and infrastructure automation, made a huge shift in how the teams viewed infrastructure management. The perspective changed from “How do I manage scale?” to “How do I look at infrastructure as code?” Now infrastructure was not about a bunch of machines to log into, but about code that could be written to provide scale, agility and the ability to scope solutions. Even better, that infrastructure code could be married to application code, and all managed from a single platform.

“Now we don’t look at applications as a thing that runs on infrastructure — we look at applications as the entire stack,” Jason said. “You think of the application as all the things in one box, including the security paradigm, the application code itself, the storage, monitoring framework, and data framework. That all becomes one encapsulation of what’s required to make the application live and breathe, and you can define it all in code. That’s one of the big things configuration management has brought us.”
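Jason’s “one box” framing can be pictured as a single definition that names every layer of the application. The Python sketch below is illustrative only: `AppStack`, its fields and the sample values are hypothetical, not part of Puppet, Chef or any Disney system.

```python
from dataclasses import dataclass, field


@dataclass
class AppStack:
    """Hypothetical model of an application as its entire stack.

    Each field corresponds to one layer Jason lists: the application
    code itself, security, storage, monitoring and data.
    """
    name: str
    code_version: str
    security: dict[str, str] = field(default_factory=dict)
    storage: dict[str, str] = field(default_factory=dict)
    monitoring: dict[str, str] = field(default_factory=dict)
    data: dict[str, str] = field(default_factory=dict)

    def resources(self) -> dict[str, str]:
        """Flatten every layer into one namespaced map of desired settings."""
        out = {f"app:{self.name}": self.code_version}
        for layer in ("security", "storage", "monitoring", "data"):
            for key, value in getattr(self, layer).items():
                out[f"{layer}:{key}"] = value
        return out


# Hypothetical example application; names and values are made up.
stack = AppStack(
    name="guest-portal",
    code_version="1.4.2",
    security={"tls": "required"},
    storage={"volume_gb": "100"},
    monitoring={"healthcheck": "/status"},
    data={"database": "postgres"},
)

for resource, value in stack.resources().items():
    print(resource, "=", value)
```

Because the whole stack lives in one object that can be versioned, a change to monitoring or security goes through the same review and version control as a change to the application code itself.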

Transformative Technology – Pick Tools that Transform. … tools that change the way you think. @jasonacox #DOES14— j:hand (@jasonhand) October 21, 2014

There were other advantages to introducing configuration management early. It provided consistency, allowing the teams to make sure that code written on developers’ laptops and tested in QA actually ran as expected in production. It also allowed systems engineers to spread application loads horizontally very quickly and to stand up environments across different hosting platforms, all in the proper desired state.
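The “desired state” idea is what makes that consistency possible: a configuration management tool compares each machine against a declared state and changes only what differs. The following is a minimal Python sketch of that convergence loop under simplifying assumptions (state is a flat map of resource names to values); it is not how Puppet or Chef are implemented, and the `converge` function and resource names are hypothetical.

```python
def converge(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Bring `actual` in line with `desired`, returning a log of changes.

    Keys present in `actual` but absent from `desired` are left untouched.
    """
    changes = []
    for resource, wanted in desired.items():
        current = actual.get(resource)
        if current != wanted:
            changes.append(f"{resource}: {current!r} -> {wanted!r}")
            actual[resource] = wanted  # apply the fix in place
    return changes


# Hypothetical desired state shared by every environment.
desired = {
    "package:nginx": "installed",
    "service:nginx": "running",
    "file:/etc/app.conf": "port=8080",
}

# Each environment starts out in a different state.
environments = {
    "laptop": {"package:nginx": "installed"},
    "qa": {"package:nginx": "installed", "service:nginx": "stopped"},
    "production": {},
}

for name, state in environments.items():
    print(name, converge(desired, state))

# A second pass finds nothing to do: convergence is idempotent.
for name, state in environments.items():
    assert converge(desired, state) == []
    assert state == desired
```

The second pass is the important property: once every environment matches the definition, reapplying it changes nothing, so the same code can be run safely and repeatedly against laptops, QA and production alike.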

“‘Configuration management’ is really just code for ‘automation,’” Jason said. “You hand cycles back to engineers who were doing a lot of manual work, and now those cycles are invested into the future. From the ops standpoint, you’re moving from the firefighting role to the fire marshal’s role. You’re able to do work for the future, because now you have the cycles available for that.”

Overcoming Resistance to Change

It wasn’t easy to adopt new tools and a new mindset. “You have battles at the earliest stages, and the biggest battle is the battle of the mind,” Jason said. “Some of the biggest opponents were ourselves. We had to go through the paradigm shift in the minds of engineers, even in my team of systems engineers. It was super critical that we had hired smart creatives, systems engineers who brought tools with them, and brought curiosity about the next great tool.”

There was also some resistance to the idea of embedding systems engineers with product-focused teams. Some thought it would prove expensive and would make communication between the systems engineers themselves more difficult.

It’s true that the different teams ended up with “a lot of drift” between them, said Jason. Systems engineers naturally had to tune their processes to meet the business needs of each team. “You do look for enterprise-relevant processes for standing up whatever infrastructure you’re going to need, but you also know each segment is a unique and different animal that will need separate care and feeding,” Jason said.

Addressing the problem of drift, and containing it, is the responsibility of management, Jason said. Disney created forums — a DevOps summit, a leadership summit and workshops — where people from different teams work together and learn standard approaches and processes. That helps systems engineers, and others, on disparate teams stay aligned, and averts the problems that would result if processes were allowed to continue drifting apart.

In fact, the systems engineers themselves have created new cross-enterprise programs at Disney. “Smart creative systems engineers collaboratively develop programs that are good for the entire enterprise,” Jason said. “My organization helps facilitate these, and we get the other business units and funding to empower them to move forward.”

You’ve got to empower people. its people who come up with the great ideas, build the great ideas & make them a reality @jasonacox #DOES14— Em Campbell-Pretty (@PrettyAgile) October 21, 2014

How Company Culture and DevOps Culture Work Together

“Our brand of DevOps here at Disney meshes with the collaborative culture of the company,” Jason said. “The Disney culture is all about candor, collaboration, creative challenges, and courage to move the needle. It’s about initiating new concepts, new ideas, and new compelling stories we want to tell.”

The journey of collaboration, from cross-functional to cross-team to enterprise-wide, has grown out of the Disney value of challenging the status quo. If you don’t challenge it, Jason said, “you can fall right back in the rut. Driving in the ruts is easier than driving out of the ruts, because it’s bumpy, but I ask my teams to do that, and my leadership asks me to do that: be curious and promote positive disruption. We have to promote positive disruption, so our business doesn’t get stuck, and can move into the future.”

Join the rebellion! .. and fight against the static organizations. A red tape removal crew @jasonacox #DOES14 pic.twitter.com/2ywe1gpSyo— j:hand (@jasonhand) October 21, 2014

The DevOps journey at Disney doesn’t have a final destination. Each business unit is at a different stage of treating infrastructure as code, but at the very least, every team now understands why it matters, and is working towards deeper adoption of that approach.

“We have teams all around thinking about what the next phase of applications and hosting will look like,” Jason said. “We’re running proofs of concept all the time, to explore and to get ready for the next thing.”

Aliza Earnshaw is managing editor at Puppet Labs.