The Best Pottery

It was the first day of the pottery class. The instructor welcomed the students and began to orient them on the material. He announced that the final grade would be determined by one of two measures. For half the class, he said that their final grade would be determined by the “quality” of their pottery. Their goal was to work on a single high quality product. For the other half of the class, he said that their final grade would be determined by “quantity”. Their goal was the sheer amount of pottery produced. Fifty pounds of pots would be rated an “A”, forty pounds a “B”, and so on. The class began and the students began their work.

The last day of class finally came and a curious fact emerged. The works of highest quality were not produced by the group focused on quality. Instead, the highest quality works were all produced by the group graded for quantity! It seemed that the “quantity” group got busy producing piles of work and learning from their mistakes as they went along. In contrast, the “quality” group sat around theorizing about perfection, and in the end had little more to show for their work than some theory of perfection and a lump of dead clay.[1]

The key to becoming a great artist, writer, musician, etc., is to keep creating! Keep drawing, keep writing, keep playing! Quality emerges from the quantity. It strikes me that the same thing applies to software and systems we run. When we focus purely on the quality, we actually miss the mark. The way to improve quality is to keep creating, testing and learning. In the software sense, we want to keep releasing our code as often and as fast as possible. By doing that, we build operational expertise, knowledge and automation. We develop fast feedback loops that nudge the digital clay into a better shape. We tune processes to provide faster feedback loops, remove toil through automation, and minimize human error and mistakes. We optimize for a high throughput of working products and reap the prize of high quality outcomes.

But does this hold true? In my career, I have seen this to be true time and time again. Areas where we remove friction and optimize for faster release cycles (even multiple times a day), with automated integration, testing and delivery, ultimately result in higher quality products. I see the same thing looking out to the industry. The highest performing teams optimize for highest flow. The prize of perfection comes by delivering and learning. In the book, “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations,” Dr. Nicole Forsgren, Jez Humble, and Gene Kim ran a multi-year research project looking at practices and capabilities of high-performing technology organizations. Their conclusion was that the highest performing organizations embraced the notion of continuous delivery, the ability to deliver changes frequently, reliably and with minimal manual effort.[2]

We ship! As technologists, software engineers and SREs, our teams help design, build and run the digital trains that deliver amazing products and experiences to our customers and fellow employees every single day. Our goal is to make these experiences shine! And, as the pottery class learned, it is the quantity of our practice and continuous learning that makes them more perfect.

Keep shipping. Keep improving. Keep delivering!


References

  1. The pottery parable is a true story as captured by David Bayles and Ted Orland in their book, Art & Fear. There is a similar story about photography in James Clear’s book Atomic Habits.
  2. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Dr. Nicole Forsgren, Jez Humble, and Gene Kim also identifies other key traits of high performing organizations, including having loosely coupled architecture, embracing a learning culture of experimentation, adopting lean principles to optimize flow, and creating a high-trust and empowering environment.

  • Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution Press.
  • Bayles, D., & Orland, T. (1993). Art & Fear. The Image Continuum.
  • Clear, J. (2018). Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones. Avery.

The Art of Removal

“The sculpture is already complete within the marble block before I start my work. It is already there, I just have to chisel away the superfluous material.” – Michelangelo

A tanker truck hauling 8,600 gallons of gasoline approached the MacArthur Maze, a large freeway interchange near the east end of the San Francisco-Oakland Bay Bridge in California. The driver, traveling faster than he should, lost control, hit the guardrail and overturned the load of highly flammable fuel. It spilled out on the interchange and exploded into a violent inferno, sending flames hundreds of feet into the air. The heat weakened the steel structure of the three-lane section of Interstate 580, causing the road to collapse onto Interstate 880 below. Thankfully, the driver survived and no other vehicles were involved in the accident.

The California Department of Transportation, Caltrans, rushed in to quickly assess the damage to this crucial interchange, which handles some 160,000 vehicles per day. It would take weeks to clear the debris and several months to repair. Initial cost projections reached $10 million with an impact cost of $90 million. Bidding for the job started immediately. Due to the urgency of restoring this vital link, the state offered an incentive bonus of $200,000 per day if the work was completed before the deadline.

Bidding started. C. C. Myers had been planning for this his whole life. While other contractors in the room were offering on-time proposals well over the $10 million estimate, C. C. Myers shocked the room. He would do the work for $878,075, promising to complete the work well ahead of schedule. This was not the first time C. C. Myers had taken on heroic work. His company had a proven track record of rebuilding damaged freeways well ahead of schedule, including the Santa Monica Freeway after the 1994 Northridge earthquake. Needless to say, he won the bid.

C. C. Myers went to work. He had assembled a logistics transport team and forged agreements in Texas and other areas to expedite steel delivery to the interchange. He streamlined processes and cut away any distractions and superfluous procedures that didn’t directly contribute to safely delivering the roadway ahead of schedule. As an example, the typical inspection process requires steel workers to complete all their welds before scheduling government X-ray inspection. C. C. Myers convinced the government to embed X-ray technicians in his team and perform the test immediately after each weld was complete. This allowed the crew to get real-time feedback on any area that didn’t pass and fix it immediately before moving on.

C. C. Myers’s efforts were successful. The monumental work was completed over a month ahead of schedule, right before a busy Memorial Day weekend. C. C. Myers earned a $5 million bonus for completing the work early. He quickly gave credit to his workers and their ability to deliver, but moving the mountain had required his artistry as well.

Like Michelangelo, C. C. Myers’s genius was his ability to stare into the mountain of “marble” and see what could be removed to reveal the ultimate outcome. Procedures and processes that didn’t directly deliver value were debris that had to be swept away. Every ounce of energy, every minute, and every movement was precious and deliberate. Everything that wasn’t part of the goal was chiseled away. 

What is the work and marble before you right now? What is the goal? What sculpture are you trying to reveal? What can you remove? As all you wonderful artists head into your work, channel your inner Michelangelo. Chisel away the useless motion, process and procedures to reveal the incredible work of art buried in the marble.


Credit: A friend of mine, Paul Gaffney, spoke on this at the 2023 DevOps Enterprise Forum. His story was far more eloquent than my version. It motivated me to do more research on the incident. The result is this post. I’m indebted to Paul for his inspiration.

Grid Bugs

Oh, no! We were several hours into a major system outage and there was still no clue as to what was broken. The webservers were running at full load and the applications were pumping a constant stream of error logs to disk. Systems and application engineers were frantically looking through the dizzying logs for clues as to the cause. Of course, looking at the logs, you would assume everything was broken, and it was. But even when the application worked, the logs were full of indecipherable errors. Everyone knew that most of the “errors” in the logs weren’t really errors, but untidy notices that developers had created long ago as part of a debugging exercise. As one engineer observed in some degree of frustration, “It’s like the log file that cried wolf!” After a while, nobody notices the errors.

The teams restarted services, rebooted systems, stopped and restarted load balancers. Nothing helped. Network engineers dug into the configuration of the routers and switches to make sure nothing was amiss. Except for the occasional keyboard typing sounds, dogs barking or children crying in the background, the intense investigation had produced an uncanny silence on the call. Operation center specialists were quickly crafting their communication updates and discussing with the incident commander how to update the many clients impacted by this outage. Company leaders and members of the board of directors were calling in to get updates. Stress was high. Would we ever find the cause or should we just shut down the company now and start over? Fatigue was setting in. Tempers were starting to show. Discussion ensued on the conference call to explore all mitigation options and next steps.

“I found it!” The discussion on the call stopped. Everyone perked up, anxious to hear the discovery. “What did you find?” the commander asked in a hopeful way. The giddy engineer took center stage on the call, eager to tell the news. “It’s the inventory service! The server at the fulfillment center seems to be intermittently timing out. Transactions are getting stuck in the queue.” The engineer paused, clearly typing away at some commands on his computer. “I think we have a routing problem. I try to trace it but it seems to bounce around and disappear. Sometimes it works, but to complete the transaction, multiple calls are required and too many of them are failing. I’m chatting with the fulfillment center and they report the inventory system is running.”

The engineer sent the traceroute to the network engineer who started investigating and then asked, “Can you send me the list of all the addresses used by the inventory system?”  After some back and forth, the conclusion came, “I found the problem! There are two paths to the fulfillment center, one of which goes through another datacenter. That datacenter link looks up but it is clearly not passing traffic.” After more typing, the conclusion, “Ah, it seems the telco made a routing change. I’m getting them to reverse it now.” Soon the change was reversed and transactions were flowing again. The dashboards cleared and “green” lights came back on. Everyone on the bridge quietly, and sometimes not so silently, celebrated and felt an incredible emotional relief. Sure, there would be more questions, incident review and learning, but solving the problem was exhilarating.

How many of you can relate to a story like that? How many of you have been on that call?

A friend of mine, Dr. Steven Spear at MIT, often reminds us that the key to solving a problem is seeing the problem. You can’t solve what you cannot see. A big part of reliability engineering and systems dynamics is understanding how we gain visibility into problems and surface them so they can be addressed. Ideally, we find those weaknesses before they cause real business impact. That is often the attraction of chaos engineering, poking at fault domains to expose fractures that could become outages. But sometimes the issue is so complex that we just need a clear line of sight into the problem. In the story above, connectivity and those dependent links were not clearly visible. If there had been some way to measure the foundational connectivity between the dependent locations, our operational heroes could have quickly seen it, fixed it, and gone back to sleep. Getting that visibility in advance is the right thing to do for our business, our customers and our teams.

This past weekend, I found myself itching to code and tinker around with some new tech. The story above is one I have seen repeated multiple times. We often have limited visibility into point-to-point connectivity across our networks and vendors. Yet we have this grid of dependency that is needed to deliver our business-powering technology. I know what you are thinking. There are millions of tools that do that. I found some and they were very elaborate and complicated, way more than what I wanted to experiment with. I finally had my excuse to code. I wanted to build a system to synthetically monitor all these links. Think of an instance in one datacenter or cloud polling an instance in another datacenter or region. I had a few hours this weekend so I blasted out some code. I created a tiny multithreaded Python webservice that polls a list of other nodes and builds a graph database that it displays using the JavaScript visualization library Cytoscape, which was fun to learn by itself. Of course, I packed this all into a container and gave it the catchy name, “GridBug”. Yes, I know, I’m a nerd.

You can throw a GridBug onto any instance, into any datacenter, and it will go to work monitoring connectivity. I didn’t have time to test any serverless options but it should work as well. I set up 5 nodes in 3 locations for a test, with some forced failures to see how it would detect conditions on the grid. The graph data converges over time so that every node can render the same graph. If you want to see it, here is my test and project code: https://github.com/jasonacox/gridbug
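
To make the idea concrete, here is a minimal sketch of how nodes might poll each other and converge on a shared graph. It is not the actual GridBug code: the function names (http_probe, poll_peers, merge), the timestamped edge map, and the "newest observation wins" merge rule are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def http_probe(url, timeout=2.0):
    """Return True if the peer answers an HTTP GET with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Timeouts, refused connections, DNS failures and HTTP errors
        # all count as "link down" for this sketch.
        return False


def poll_peers(me, peers, probe=http_probe, now=time.time):
    """Probe every peer in parallel.

    peers maps a (hypothetical) node name to the URL to poll.
    Returns an edge map: {(me, peer): (is_up, timestamp)}.
    """
    names = list(peers)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        results = list(pool.map(lambda name: probe(peers[name]), names))
    stamp = now()
    return {(me, name): (ok, stamp) for name, ok in zip(names, results)}


def merge(local, remote):
    """Merge two edge maps, keeping the newest observation of each edge."""
    merged = dict(local)
    for edge, (ok, stamp) in remote.items():
        if edge not in merged or stamp > merged[edge][1]:
            merged[edge] = (ok, stamp)
    return merged


if __name__ == "__main__":
    # Simulated grid: a fake probe stands in for real network calls.
    def fake_probe(url):
        return "down" not in url

    graph_a = poll_peers("a", {"b": "http://b.example/", "c": "http://c.example/down"},
                         probe=fake_probe)
    graph_b = poll_peers("b", {"a": "http://a.example/"}, probe=fake_probe)
    # After exchanging edge maps, both nodes hold the same picture of the grid.
    for edge, (ok, _) in sorted(merge(graph_a, graph_b).items()):
        print(edge, "up" if ok else "DOWN")
```

Each node only ever measures its own outbound links. If nodes periodically swap edge maps and merge them, every node eventually holds the same set of edges, which is one simple way a graph could converge over time.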

I have no expectations on this project. It is clearly just a work of fun I wanted to share with all of you, but it occurs to me that there is still a lesson here. Pain or necessity is a mighty force in terms of inspiration. What bugs you? Like this outage example, is there some pain point that you would love to see addressed? What’s keeping you from trying to fix it? Come up with a project and go to work on it. You are going to learn something! Look, let’s be real, my project here is elementary and buggy at best (sorry, couldn’t resist the pun), but I got a chance to learn something new and see a fun result. That’s what makes projects like this so rewarding. The journey is the point, and frankly, you might even end up with something that brings some value to the rest of our human family. Go create something new this week!

Have a great week!

Investments Unlimited – The Origin Story

We had assembled to put together the outline for a guidance paper. At the top was the title, “Modern Governance.” I thought to myself that the title alone would cure insomnia. Despite the title, members of the team had developed brilliant new automation and approaches. They were already deploying those game-changing ideas at their businesses. We wanted to share those! Unfortunately, the gold was buried in the boredom. It was too academic and dry. Nobody would make it past the title, much less the layers of governance tedium in the outline. Energy in the room, which had been off the chart during the discovery discussions, suddenly fell flat as we all realized that our guidance document would have little impact on the real world.

“Hey, I have an idea! Why don’t we just tell a story?” I suggested. “Imagine a Phoenix Project moment where a crisis hits and a band of characters have to solve it.” Enthusiasm erupted as the group piled on with ideas on how the story could unfold to show and teach the thoughts we had captured in the dry outline. Suddenly, characters emerged. Susan, the CEO, was getting an urgent phone call about an existential crisis hitting her company. Bill, Jada, Michelle, Jason and the rest of the cast of characters sprang to life in a brief narrative. We put the story to paper and changed the name to Investments Unlimited, inspired by the fictitious company in the Phoenix Project. We had done it! A short story was assembled and we presented it to the rest of the DevOps Forum, who applauded the work. Mission accomplished. Or so it seemed…

A few months later we were invited to a meeting. “Gene Kim and the staff at IT Revolution reviewed your paper and we have a proposal,” Leah, the editor for IT Rev and the Forum papers, explained to us. “We think the paper is great, but we think it could be greater. We would like to turn it into a novel.” She paused and surveyed the group. John Willis, the leader of the forum group and fellow co-author, suggested, “I think we should do this! It would take some work, but we should write it ourselves and add some of the details that we couldn’t develop before. What do you think, are you up for it?” We were all stunned and delighted. One by one, we all chimed in that we would love to take on the challenge. Shortly after that call we started meeting every Tuesday evening to work on the book. We invited industry experts to interview and fill in the gaps of our understanding. Weekends became a writing club where some of us would meet to knock out a scene, develop a character or wordsmith a moment. Slowly the short paper became chapters, and the chapters became a novel.

I confess, I was enamored just to be part of this great group of co-authors. This cast was made up of an incredible family of industry thought leaders, technical gurus and fellow DevOps rebels: Helen Beal, Bill Bensing, Michael Edenzon, Tapabrata “Topo” Pal, Caleb Queern, John Rzeszotarski, Andres Vega and of course, John Willis. Our meetings would sometimes pivot into philosophical discussions, technology news or current DevSecOps challenges. Despite the frequent distractions and detours, we managed to nudge the narrative forward, week by week.

Writing a book is hard. You are turning ambiguous ideas into letters on a page. The key was to just keep writing, keep the prose flowing. There were times where you wouldn’t feel inspired or enthusiastic about the words pouring out of your fingers, but you would keep typing. I was surprised and amazed at how well that worked. More than once, I discovered that inspiration followed effort. The act of doing created a warming glow. Suddenly the arduous task unlocked a love, a passion and an inspiration that wasn’t there before. That approach developed new twists in the story, new ideas to explore or challenges to solve. But getting those words on the paper was important. We would spend months editing and tweaking the story, but without that original content there would be nothing to work with. Eventually we would have a finished product and as of two weeks ago, a published book. It was an experience that I will forever cherish and recommend to anyone who gets the opportunity to do the same.

Just keep writing. Going through this journey has reminded me of the importance of “doing,” self-motivation and determination. I think we can all get stuck in limbo, waiting around for that magical moment of inspiration. The truth is that in life, that inspiration is often the result of the wind of our own movements. Just keep going! Inspiration will come. Words will become chapters and chapters will become stories. What are you penning today? What adventures are you crafting by your doing? Get up, get moving… keep writing.


Find out more about Investments Unlimited here.

Investments Unlimited
A Novel About DevOps, Security, Audit Compliance, and Thriving in the Digital Age
by Helen Beal, Bill Bensing, Jason Cox, Michael Edenzon, Dr. Tapabrata “Topo” Pal, Caleb Queern, John Rzeszotarski, Andres Vega, and John Willis

See Problems

“Failure isn’t fatal, but failure to change might be.” – John Wooden

On June 4, 1942, the Japanese Navy arrived at the island of Midway to battle the United States Navy.  They had twice the number of pilots, planes and firepower.  Clearly with this much difference between these two forces, the Japanese should have won.  But that’s not what happened. In a surprising turn of events that became lore and even feature length movies, the United States won the Battle of Midway and that catastrophic defeat devastated the Japanese Navy.  It led to their inability to wage war in the Pacific for the remainder of the Second World War. 

Dr. Steven Spear from MIT tells this story and asks the question, “When do you suppose the Japanese lost the Battle of Midway?” Books have been written on the exact details of each maneuver during the battle to try to determine the exact moment that spelled the loss for Japan. Prepare for a surprise. The Battle of Midway was lost in 1929, not 1942. Over a decade before! Here’s the deal: by 1929 the Japanese Admiralty had locked in their assumption on how wars would be fought and won on the sea. Everything was built upon the assumption that the entire fleet of one nation would face the entire fleet of the other head on. That doctrine dictated how they designed their aircraft, their carriers, their procedures and tactics. They scripted the entire battle plan for Midway and conducted war games to rehearse it.

During the war game they set up a huge table with the layout of the two sides. The Japanese Admirals sat on one side of the table and brought in junior officers to play the side of the US. Both sides used sticks to push the wooden ships around the map. After a few back and forth moves, a referee blew a whistle and accused the junior officer of not playing according to the battle plan. He was kicked out and another junior officer was recruited. This officer did the same thing as the former one. Looking down the table and realizing he was significantly outmatched, he too deviated from the battle plan and began to win against the Japanese side. Once again, he was accused of not understanding the battle plan and dismissed. This same thing happened until they went through all the junior officers, then petty officers and even brought in noodle vendors off the street. Each time the US side won and the Japanese Admirals were frustrated that nobody was playing by the battle plan. Instead of seeing the problem that these exercises were showing, they fixated on pathologically rehearsing their failed plan. That is how the battle was lost.

The lesson here is powerful.  We often believe we know the best way to solve problems.  We can go to great lengths and details in defining and prescribing the solution.  But if the solution is not tested or we are unwilling to observe and build ways to see problems and learn, we can suffer catastrophic failures similar to the Japanese Admiralty.  We should design and test our systems in such a way that we can clearly detect problems, learn from them and alter course when discovery is made.  Avoiding that is effectively setting a course for failure.

A growth mindset seizes upon unexpected events or failures as golden moments of learning. I believe this applies to all of life, not just our engineering efforts. We all make plans, sometimes elaborate plans, and yet how do we react when those plans are thwarted? Do we dismiss the opponent and try to get back to plan, or do we learn and alter our course? I know this is a growth area for me. I often want to push ahead with full force to get something done. This pandemic has thwarted and changed a lot of our plans. But do we surrender or do we embrace the discovery as new opportunity? Don’t become discouraged, fatigued or apathetic. Convert problems into energy and redirect it towards a positive direction. The secret power of successful businesses, teams and individuals is the ability to quickly learn and adjust to discovery.

Are you struggling with your own plan failures?  Don’t give up!  Look at those failures as opportunities for learning and adopt the change.  The battle is not lost.  Glean the learning and become better.  You can do this.  Keep learning!

The Unicorn Project

The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Digital, by Gene Kim

This new novel by Gene Kim takes place at the same time as the events of The Phoenix Project, with many of the same characters, business challenges and end results. All of this continues to take place within the fictional company, Parts Unlimited. However, while the prior book gave us insight into the transformation of the operations team, this book chronicles the journey from a developer’s point of view.

The Unicorn Project takes you on a fun and inspiring journey into some of the most difficult IT and business challenges we face today. The project may be mythical, but the lessons and ideals uncovered here provide real help and inspiration to any leader seeking to transform their business. Along the way, you discover people-empowering, data-driven and digital-business-enabling ways of working that can unleash the powerful potential within any organization.

There are five ideals that are discovered through the course of the book that will help any business succeed.

  • The First Ideal – Locality and Simplicity
  • The Second Ideal – Focus, Flow, and Joy
  • The Third Ideal – Improvement of Daily Work
  • The Fourth Ideal – Psychological Safety
  • The Fifth Ideal – Focus on our Customer

I recommend this book for any business or technology student, professional or leader who is serious about leveraging data-driven digital disruption and workforce empowerment to deliver business value faster, better, safer and happier.

More information about the Unicorn Project.

DevOps Enterprise Summit – London 2017

#DOES17

I had the privilege of attending and speaking again at this year’s DevOps Enterprise Summit in London at the Queen Elizabeth II Centre, across the street from Westminster Abbey and Big Ben. The conference was attended by nearly 700 transformative leaders from companies and organizations across the UK and the rest of Europe: Hiscox, ITV, Barclays, Hearst, Jaguar Land Rover, Lloyds Banking Group, Orange, Northrop Grumman, easyJet, Capital One, UK Ministry of Justice, ING, Swisscom, Lockheed Martin and more.

The speakers’ slide decks and videos of their talks are available now!

Great talk by Chris Hill (Jaguar Land Rover) and a great quote…

I love this quote from Suzette Johnson (Northrop Grumman) – an example of a good leader empowering the team:

Jonathan Smart (Barclays) had several great points, including this courageous quote on challenges along the journey:

I love this quote from Jonathan Fletcher’s (Hiscox) talk:

Creating Digital Magic

I was honored to speak again and talk about our DevOps journey at Disney.

Even though I wasn’t able to record my presentation, TheNewStack provided a great write-up of my talk: https://thenewstack.io/magic-behind-disney-devops-experience/

Ask the Speakers

Great “Ask the Speakers” session with my new friends Jonathan Smart (Barclays) and Andrea Hirzle-Yager (Allianz Deutschland AG):

https://www.srepath.com/inside-disneys-site-reliability-engineering-practice/

And…

The best part of this year’s trip to London? Yes, an amazing journey through time and space with my sweetie…

A Seat at the Table: IT Leadership in the Age of Agility

Book Review

A Seat at the Table: IT Leadership in the Age of Agility
by Mark Schwartz

This should be required reading for all technology and business leaders who are serious about digital transformation.  This book takes you on a provocative, fun and comprehensive tour of the key areas that will promote and ignite digital empowering agility, creativity, learning, community and collaboration.

This book may be about taking a seat, but this is no time to be sitting still!   IT leaders will be convinced that their job is now about incentivizing and inspiring courage, passion and technical excellence in service of business objectives rather than blindly servicing requirements. You will even find practical advice on how to deal with projects, scope creep, IT assets (what the author calls Enterprise Architecture), governance, security, risk management, quality, and shadow IT.

DevOps Enterprise Summit 2016 – San Francisco

The 2016 edition of the San Francisco based DevOps Enterprise Summit underscored the momentum and scale of the DevOps movement across the industry.  The summit saw record level attendance and phenomenal presentations from established DevOps luminaries, notable DevOps transformational companies as well as many new companies.

“We are at our best when we are helping each other, serving each other, and making a positive difference” – Jason Cox, Disney

Articles related to DOES 16:

https://blog.chef.io/2016/11/21/chef-devops-enterprise-summit/

DevOps Chat: Gene Kim on The DevOps Handbook and DevOps Enterprise Summit

Innovation at Dimension Data: Taking DevOps Beyond Deployment

Innovation at Dimension Data: Accelerating Innovation and Digital Transformation with StackStorm Event Driven Automation

Thinking Environments

Transformational technology leaders from many companies across the world assembled at the 2016 DevOps Enterprise Forum to discuss DevOps practices, challenges and best-known methods to help our organizations and our community succeed.

Along with several other leaders, I had the privilege of helping put together a guidance document on DevOps Organizational Models to accelerate business and empower workers. In this free publication by IT Revolution, we take a look at how and why organizations are structured, examine which have characteristics that promote or impede business-enabling DevOps practices, and take a deep dive into four different models that began to surface during our research: (1) the traditional functional silo hierarchy, (2) the matrix model, (3) the product platform model and (X) the adaptive organization model.

Download PDF Here

Authors

  • Mark Schwartz, CIO, US Citizenship and Immigration Services
  • Jason Cox, Director, Systems Engineering, The Walt Disney Company
  • Jonathan Snyder, Sr. Manager, Service Deployment & Quality, Adobe Systems
  • Mark Rendell, Principal Director, Accenture
  • Chivas Nambiar, Director Systems Engineering, Verizon
  • Mustafa Kapadia, NA DevOps Service Line Leader, IBM

More DevOps guidance documents can be found here: http://itrevolution.com/devops_enterprise_forum_guidance