Grid Bugs

Oh, no! We were several hours into a major system outage and there was still no clue as to what was broken. The webservers were running at full load and the applications were pumping a constant stream of error logs to disk. Systems and application engineers were frantically looking through the dizzying logs for clues as to the cause. Of course, looking at the logs, you would assume everything was broken, and it was. But even when the application worked, the logs were full of indecipherable errors. Everyone knew that most of the “errors” in the logs weren’t really errors, but untidy notices that developers had created long ago as part of a debugging exercise. As one engineer observed in some degree of frustration, “It’s like the log file that cried wolf!” After a while, nobody notices the errors.

The teams restarted services, rebooted systems, stopped and restarted load balancers. Nothing helped. Network engineers dug into the configuration of the routers and switches to make sure nothing was amiss. Except for the occasional keyboard typing sounds, dogs barking or children crying in the background, the intense investigation had produced an uncanny silence on the call. Operations center specialists were quickly crafting their communication updates and discussing with the incident commander how to update the many clients impacted by this outage. Company leaders and members of the board of directors were calling in to get updates. Stress was high. Would we ever find the cause or should we just shut down the company now and start over? Fatigue was setting in. Tempers were starting to show. Discussion ensued on the conference call to explore all mitigation options and next steps.

“I found it!” The discussion on the call stopped. Everyone perked up, anxious to hear the discovery. “What did you find?” the commander asked in a hopeful way. The giddy engineer took center stage on the call, eager to tell the news. “It’s the inventory service! The server at the fulfillment center seems to be intermittently timing out. Transactions are getting stuck in the queue.” The engineer paused, clearly typing away at some commands on his computer. “I think we have a routing problem. I try to trace it but it seems to bounce around and disappear. Sometimes it works, but to complete the transaction, multiple calls are required and too many of them are failing. I’m chatting with the fulfillment center and they report the inventory system is running.”

The engineer sent the traceroute to the network engineer who started investigating and then asked, “Can you send me the list of all the addresses used by the inventory system?”  After some back and forth, the conclusion came, “I found the problem! There are two paths to the fulfillment center, one of which goes through another datacenter. That datacenter link looks up but it is clearly not passing traffic.” After more typing, the conclusion, “Ah, it seems the telco made a routing change. I’m getting them to reverse it now.” Soon the change was reversed and transactions were flowing again. The dashboards cleared and “green” lights came back on. Everyone on the bridge quietly, and sometimes not so silently, celebrated and felt an incredible emotional relief. Sure, there would be more questions, incident review and learning, but solving the problem was exhilarating.

How many of you can relate to a story like that? How many of you have been on that call?

A friend of mine, Dr. Steven Spear at MIT, often reminds us that the key to solving a problem is seeing the problem. You can’t solve what you cannot see. A big part of reliability engineering and systems dynamics is understanding how we gain visibility into problems and surface them so they can be addressed. Ideally, we find those weaknesses before they cause real business impact. That is often the attraction of chaos engineering, poking at fault domains to expose fractures that could become outages. But sometimes the issue is so complex that we just need a clear line of sight into the problem. In the story above, connectivity and those dependent links were not clearly visible. If there had been some way to measure the foundational connectivity between the dependent locations, our operational heroes could have quickly seen it, fixed it, and gone back to sleep. Getting that visibility in advance is the right thing to do for our business, our customers and our teams.

This past weekend, I found myself itching to code and tinker around with some new tech. The story above is one I have seen repeated multiple times. We often have limited visibility into point-to-point connectivity across our networks and vendors. Yet we have this grid of dependency that is needed to deliver our business-powering technology. I know what you are thinking. There are millions of tools that do that. I found some and they were very elaborate and complicated, way more than what I wanted to experiment with. I finally had my excuse to code. I wanted to build a system to synthetically monitor all these links. Think of an instance in one datacenter or cloud polling an instance in another datacenter or region. I had a few hours this weekend so I blasted out some code. I created a tiny multithreaded Python webservice that polls a list of other nodes and builds a graph database that it displays using the JavaScript visualization library Cytoscape, which was fun to learn by itself. Of course, I packed this all into a container and gave it the catchy name, “GridBug”. Yes, I know, I’m a nerd.
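
To make the polling idea concrete, here is a minimal sketch of one node checking its peers in parallel threads. It is not the GridBug code itself: the names `tcp_probe` and `poll_peers`, the shape of the `peers` dictionary, and the choice of a plain TCP connect as the health check are all assumptions made for illustration.

```python
import socket
import threading
import time


def tcp_probe(host, port, timeout=2.0):
    """Return the TCP connect time in seconds, or None if unreachable.

    Assumption: a successful TCP connect is a good enough health signal.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def poll_peers(me, peers, probe=tcp_probe):
    """Probe every peer in its own thread and return one edge record per peer.

    `peers` maps a (hypothetical) node name to a (host, port) tuple.
    """
    edges = []
    lock = threading.Lock()

    def check(name, host, port):
        latency = probe(host, port)
        with lock:  # protect the shared list while threads report in
            edges.append({
                "source": me,
                "target": name,
                "up": latency is not None,
                "latency": latency,
            })

    threads = [
        threading.Thread(target=check, args=(name, host, port))
        for name, (host, port) in peers.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sort so the output order does not depend on thread scheduling.
    return sorted(edges, key=lambda e: e["target"])
```

Passing the probe in as a parameter keeps the sketch easy to try without a network: swap in a fake function that returns a latency for some hosts and `None` for others, and the edge records show which links look up or down.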

You can throw a GridBug onto any instance, into any datacenter, and it will go to work monitoring connectivity. I didn’t have time to test any serverless options but it should work as well. I set up 5 nodes in 3 locations for a test, with some forced failures to see how it would detect conditions on the grid. The graph data converges over time so that every node can render the same graph. If you want to see it, here is my test and project code: https://github.com/jasonacox/gridbug
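
One way the "converges over time" behavior could work is for nodes to exchange their edge observations and keep the newest one for each link. The sketch below illustrates that idea only; it is an assumption, not a description of how GridBug actually merges data. The function names, the `seen` timestamp field and the edge-id format are hypothetical. The output shape follows the Cytoscape.js elements convention of `data` objects with `id`, `source` and `target`.

```python
def merge_graphs(local, remote):
    """Merge two edge maps keyed by (source, target); the newest observation wins.

    Each value is a dict with an "up" flag and a "seen" timestamp (assumed fields).
    On a timestamp tie the local copy is kept.
    """
    merged = dict(local)
    for key, obs in remote.items():
        if key not in merged or obs["seen"] > merged[key]["seen"]:
            merged[key] = obs
    return merged


def to_cytoscape(graph):
    """Convert an edge map into a list of Cytoscape.js-style elements."""
    nodes = sorted({name for key in graph for name in key})
    elements = [{"data": {"id": name}} for name in nodes]
    for (src, dst), obs in sorted(graph.items()):
        elements.append({
            "data": {
                "id": f"{src}->{dst}",
                "source": src,
                "target": dst,
                "up": obs["up"],
            }
        })
    return elements
```

As long as timestamps differ, merging is order-independent, so two nodes that swap observations end up holding the same map and can render the same picture.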

I have no expectations on this project. It is clearly just a work of fun I wanted to share with all of you, but it occurs to me that there is still a lesson here. Pain or necessity is a mighty force in terms of inspiration. What bugs you? Like this outage example, is there some pain point that you would love to see addressed? What’s keeping you from trying to fix it? Come up with a project and go to work on it. You are going to learn something! Look, let’s be real, my project here is elementary and buggy at best (sorry, couldn’t resist the pun), but I got a chance to learn something new and see a fun result. That’s what makes projects like this so rewarding. The journey is the point, and frankly, you might even end up with something that brings some value to the rest of our human family. Go create something new this week!

Have a great week!

Investments Unlimited – The Origin Story

We had assembled to put together the outline for a guidance paper. At the top was the title, “Modern Governance.” I thought to myself that the title alone would cure insomnia. Despite the title, members of the team had developed brilliant new automation and approaches. They were already deploying those game changing ideas at their businesses. We wanted to share those! Unfortunately, the gold was buried in the boredom. It was too academic and dry. Nobody would make it past the title, much less the layers of governance tedium in the outline. Energy in the room which had been off the chart during the discovery discussions suddenly fell flat as we all realized that our guidance document would have little impact on the real world.

“Hey, I have an idea! Why don’t we just tell a story?” I suggested, “Imagine a Phoenix Project moment where a crisis hits and a band of characters have to solve it.” Enthusiasm erupted as the group piled on with ideas on how the story could unfold to show and teach the thoughts we had captured in the dry outline. Suddenly, characters emerged. Susan, the CEO, was getting an urgent phone call about an existential crisis hitting her company. Bill, Jada, Michelle, Jason and the rest of the cast of characters sprang to life in a brief narrative. We put the story to paper and changed the name to Investments Unlimited, inspired by the fictitious company in the Phoenix Project. We had done it! A short story was assembled and we presented it to the rest of the DevOps Forum who applauded the work. Mission accomplished. Or so it seemed…

A few months later we were invited to a meeting. “Gene Kim and the staff at IT Revolution reviewed your paper and we have a proposal.” Leah, the editor for IT Rev and the Forum papers, explained to us, “We think the paper is great, but we think it could be greater. We would like to turn it into a novel.” She paused and surveyed the group. John Willis, the leader of the forum group and fellow co-author, suggested, “I think we should do this! It would take some work, but we should write it ourselves and add some of the details that we couldn’t develop before. What do you think, are you up for it?” We were all stunned and delighted. One by one, we all chimed in that we would love to take on the challenge. Shortly after that call we started meeting every Tuesday evening to work on the book. We invited industry experts to interview and fill in the gaps of our understanding. Weekends became a writing club where some of us would meet to knock out a scene, develop a character or wordsmith a moment. Slowly the short paper became chapters, and the chapters became a novel.

I confess, I was enamored just to be part of this great group of co-authors. This cast was made up of an incredible family of industry thought leaders, technical gurus and fellow DevOps rebels: Helen Beal, Bill Bensing, Michael Edenzon, Tapabrata “Topo” Pal, Caleb Queern, John Rzeszotarski, Andres Vega and of course, John Willis. Our meetings would sometimes pivot into philosophical discussions, technology news or current DevSecOps challenges. Despite the frequent distractions and detours, we managed to nudge the narrative forward, week by week.

Writing a book is hard. You are turning ambiguous ideas into letters on a page. The key was to just keep writing, keep the prose flowing. There were times where you wouldn’t feel inspired or enthusiastic about the words pouring out of your fingers, but you would keep typing. I was surprised and amazed at how well that worked. More than once, I discovered that inspiration followed effort. The act of doing created a warming glow. Suddenly the arduous task unlocked a love, a passion and an inspiration that wasn’t there before. That approach developed new twists in the story, new ideas to explore or challenges to solve. But getting those words on the paper was important. We would spend months editing and tweaking the story, but without that original content there would be nothing to work with. Eventually we would have a finished product and as of two weeks ago, a published book. It was an experience that I will forever cherish and recommend to anyone who gets the opportunity to do the same.

Just keep writing. Going through this journey has reminded me of the importance of “doing,” self-motivation and determination. I think we can all get stuck in limbo, waiting around for that magical moment of inspiration. The truth is that in life, that inspiration is often the result of the wind of our own movements. Just keep going! Inspiration will come. Words will become chapters and chapters will become stories. What are you penning today? What adventures are you crafting by your doing? Get up, get moving… keep writing.


Find out more about Investments Unlimited here.

Investments Unlimited
A Novel About DevOps, Security, Audit Compliance, and Thriving in the Digital Age
by Helen Beal, Bill Bensing, Jason Cox, Michael Edenzon, Dr. Tapabrata “Topo” Pal, Caleb Queern, John Rzeszotarski, Andres Vega, and John Willis

See Problems

“Failure isn’t fatal, but failure to change might be.” – John Wooden

On June 4, 1942, the Japanese Navy arrived at the island of Midway to battle the United States Navy.  They had twice the number of pilots, planes and firepower.  Clearly with this much difference between these two forces, the Japanese should have won.  But that’s not what happened. In a surprising turn of events that became lore and even feature length movies, the United States won the Battle of Midway and that catastrophic defeat devastated the Japanese Navy.  It led to their inability to wage war in the Pacific for the remainder of the Second World War. 

Dr. Steven Spear from MIT tells this story and asks the question, “When do you suppose the Japanese lost the Battle of Midway?” Books have been written on the exact details of each maneuver during the battle to try to determine the exact moment that spelled the loss for Japan. Prepare for a surprise. The Battle of Midway was lost in 1929, not 1942. Over a decade before! Here’s the deal: by 1929 the Japanese Admiralty had locked in their assumption on how wars would be fought and won on the sea. Everything was built upon the assumption that the entire fleet of one nation would face the entire fleet of the other head on. That doctrine dictated how they designed their aircraft, their carriers, their procedures and tactics. They scripted the entire battle plan for Midway and conducted war games to rehearse it.

During the war game they set up a huge table with the layout of the two sides. The Japanese Admirals sat on one side of the table and brought in junior officers to play the side of the US. Both sides used sticks to push the wooden ships around the map. After a few back and forth moves, a referee blew a whistle and accused the junior officer of not playing according to the battle plan. He was kicked out and another junior officer was recruited. This officer did the same thing as the former one. He looked down the table and, realizing he was significantly outmatched, he too deviated from the battle plan and began to win against the Japanese side. Once again, he was accused of not understanding the battle plan and dismissed. This same thing happened until they went through all the junior officers, then petty officers and even brought in noodle vendors off the street. Each time the US side won and the Japanese Admirals were frustrated that nobody was playing by the battle plan. Instead of seeing the problem that these exercises were showing, they fixated on pathologically rehearsing their failed plan. That is how the battle was lost.

The lesson here is powerful.  We often believe we know the best way to solve problems.  We can go to great lengths and details in defining and prescribing the solution.  But if the solution is not tested or we are unwilling to observe and build ways to see problems and learn, we can suffer catastrophic failures similar to the Japanese Admiralty.  We should design and test our systems in such a way that we can clearly detect problems, learn from them and alter course when discovery is made.  Avoiding that is effectively setting a course for failure.

A growth mindset seizes upon unexpected events or failures as golden moments of learning. I believe this applies to all of life, not just our engineering efforts. We all make plans, sometimes elaborate plans, and yet how do we react when those plans are thwarted? Do we dismiss the opponent and try to get back to plan, or do we learn and alter our course? I know this is a growth area for me. I often want to push ahead with full force to get something done. This pandemic has thwarted and changed a lot of our plans. But do we surrender or do we embrace the discovery as a new opportunity? Don’t become discouraged, fatigued or apathetic. Convert problems into energy and redirect it towards a positive direction. The secret power of successful businesses, teams and individuals is the ability to quickly learn and adjust to discovery.

Are you struggling with your own plan failures?  Don’t give up!  Look at those failures as opportunities for learning and adopt the change.  The battle is not lost.  Glean the learning and become better.  You can do this.  Keep learning!

The Unicorn Project

The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Digital, by Gene Kim

This new novel by Gene Kim takes place at the same time as the events of The Phoenix Project, with many of the same characters, business challenges and end results. All of this continues to take place within the fictional company, Parts Unlimited. However, while the prior book gave us insight into the transformation of the operations team, this book chronicles the journey from a developer’s point of view.

The Unicorn Project takes you on a fun and inspiring journey into some of the most difficult IT and business challenges we face today. The project may be mythical, but the lessons and ideals uncovered here provide real help and inspiration to any leader seeking to transform their business. Along the way, you discover people-empowering, data-driven and digital-business-enabling ways of working that can unleash the powerful potential within any organization.

There are five ideals that are discovered through the course of the book that will help any business succeed.

  • The First Ideal – Locality and Simplicity
  • The Second Ideal – Focus, Flow, and Joy
  • The Third Ideal – Improvement of Daily Work
  • The Fourth Ideal – Psychological Safety
  • The Fifth Ideal – Focus on our Customer

I recommend this book for any business or technology student, professional or leader who is serious about leveraging data-driven digital disruption and workforce empowerment to deliver business value faster, better, safer and happier.

More information about the Unicorn Project.

DevOps Enterprise Summit – London 2017

#DOES17

I had the privilege of attending and speaking again at this year’s DevOps Enterprise Summit in London at the Queen Elizabeth II Centre, across the street from Westminster Abbey and Big Ben. The conference was attended by nearly 700 transformative leaders from companies and organizations across the UK and the rest of Europe: Hiscox, ITV, Barclays, Hearst, Jaguar Land Rover, Lloyds Banking Group, Orange, Northrop Grumman, easyJet, Capital One, UK Ministry of Justice, ING, Swisscom, Lockheed Martin and more.

The speakers’ slide decks and videos of their talks are available now!

Great talk by Chris Hill (Jaguar Land Rover) and a great quote…

I love this quote from Suzette Johnson (Northrop Grumman) – an example of a good leader empowering the team:

Jonathan Smart (Barclays) had several great points, including this courageous quote on challenges along the journey:

I love this quote from Jonathan Fletcher’s (Hiscox) talk:

Creating Digital Magic

I was honored to speak again and talk about our DevOps journey at Disney.

Even though I wasn’t able to record my presentation, TheNewStack provided a great write-up of my talk: https://thenewstack.io/magic-behind-disney-devops-experience/

Ask the Speakers

Great “Ask the Speakers” session with my new friends Jonathan Smart (Barclays) and Andrea Hirzle-Yager (Allianz Deutschland AG):

And…

The best part of this year’s trip to London? Yes, an amazing journey through time and space with my sweetie…

A Seat at the Table: IT Leadership in the Age of Agility

Book Review

A Seat at the Table: IT Leadership in the Age of Agility
by Mark Schwartz

This should be required reading for all technology and business leaders who are serious about digital transformation.  This book takes you on a provocative, fun and comprehensive tour of the key areas that will promote and ignite digital empowering agility, creativity, learning, community and collaboration.

This book may be about taking a seat, but this is no time to be sitting still!   IT leaders will be convinced that their job is now about incentivizing and inspiring courage, passion and technical excellence in service of business objectives rather than blindly servicing requirements. You will even find practical advice on how to deal with projects, scope creep, IT assets (what the author calls Enterprise Architecture), governance, security, risk management, quality, and shadow IT.

DevOps Enterprise Summit 2016 – San Francisco

The 2016 edition of the San Francisco-based DevOps Enterprise Summit underscored the momentum and scale of the DevOps movement across the industry. The summit saw record-level attendance and phenomenal presentations from established DevOps luminaries, notable DevOps transformational companies as well as many new companies.

“We are at our best when we are helping each other, serving each other, and making a positive difference” – Jason Cox, Disney

Articles related to DOES 16:

https://blog.chef.io/2016/11/21/chef-devops-enterprise-summit/

DevOps Chat: Gene Kim on The DevOps Handbook and DevOps Enterprise Summit

Innovation at Dimension Data: Taking DevOps Beyond Deployment

Innovation at Dimension Data: Accelerating Innovation and Digital Transformation with StackStorm Event Driven Automation

Thinking Environments

Transformational technology leaders from many companies across the world assembled at the 2016 DevOps Enterprise Forum to discuss DevOps practices, challenges and best-known methods to help our organizations and our community succeed.

Along with several other leaders, I had the privilege of helping put together a guidance document on DevOps Organizational Models to accelerate business and empower workers. In this free publication by IT Revolution, we take a look at how and why organizations are structured, examine which have characteristics that promote or impede business-enabling DevOps practices, and take a deep dive into four different models that began to surface during our research: (1) the traditional functional silo hierarchy, (2) the matrix model, (3) the product platform model and (4) the adaptive organization model.

Download PDF Here

Authors

  • Mark Schwartz, CIO, US Citizenship and Immigration Services
  • Jason Cox, Director, Systems Engineering, The Walt Disney Company
  • Jonathan Snyder, Sr. Manager, Service Deployment & Quality, Adobe Systems
  • Mark Rendell, Principal Director, Accenture
  • Chivas Nambiar, Director Systems Engineering, Verizon
  • Mustafa Kapadia, NA DevOps Service Line Leader, IBM

More DevOps guidance documents can be found here: http://itrevolution.com/devops_enterprise_forum_guidance

DevOps Handbook

DevOps Handbook:
How to Create World-Class Agility, Reliability, & Security in Technology Organizations

These notable DevOps luminaries provide a comprehensive definition, patterns and guidance on implementing business-winning DevOps culture and practices within your organization. Beyond just looking at successful DevOps principles from “unicorn” companies like Google, Amazon, Facebook, Etsy, and Netflix, the authors provide several practical examples and case studies where these same practices are helping traditional enterprise companies like Target, Nordstrom, Raytheon, Nationwide Insurance, CSG, Capital One, and Disney.

The handbook captures several quotes from industry practitioners and unpacks patterns that help promote increased velocity, feedback, experimentation and learning.

[Image: word cloud of citations from The DevOps Handbook]

DevOps Enterprise Summit – London 2016

I once again had the privilege of attending the DevOps Enterprise Summit. This time it was in the U.K. at the Hilton Metropole. I was impressed with the representation and talks from companies and organizations across the UK and the rest of Europe: SAP, ITV, Hiscox, ING, Barclays, HMRC, Zurich, and many more.

Themes that I picked up from these DevOps leaders:

  • People – It’s all about People – empathy, org change, transformation
  • Speed – Continuous Integration and Delivery
  • Quality – Investment in DevOps practices often results in higher quality output
  • Agility – Microservices and Flexible Infrastructure
  • Security – Everyone’s responsibility
  • Business – Focus on Product vs. Project with integration with business in transformation (BizDevOps?)

I was honored to speak again and talk about our DevOps journey at Disney.

Even though I wasn’t able to record my presentation, ComputerWorld UK provided a great write-up of my talk, and even gave me a new title! 🙂

There was considerable interest in our journey to DevOps, especially our transition from Operations Specialists to embedded Systems Engineers.

Other Quotes

“If technology is done well it looks like magic”

References

Systems strategy chief Jason Cox details Disney’s devops journey – ComputerWorld UK

Tips for DevOps Success from DOES 2016 – ComputerWorld UK

DevOps Across the Pond – London Reprise – ITproPortal

Overcoming the scale-up challenge of enterprise DevOps adoption – ComputerWeekly.com