Pirates, Pumpkins and Princesses were all over the restaurant. I’m a sucker for babies and cute kids. I couldn’t help myself. I wandered around the restaurant greeting heroes, cowboys, fairies, clowns and vampires. I saw Spider-Man at a table next to us and commented on how “amazing” it was to be in the same restaurant with Spider-Man! He eventually took off his mask as we were leaving, and I called attention to the fact that Peter Parker had joined us. He turned as red as his suit, and an enormous smile spread across his face. His family was beaming.
I spotted Queen Elsa and made sure everyone knew that we had royalty in our midst. She stood up in her chair with a humongous smile and took a bashful bow as the crowd applauded. I passed by Hidden Leaf ninjas, happy clowns, scary skeletons and ghostly ghouls. My favorite of all was a patch of pumpkins placed carefully in a row of highchairs, complete with orange-colored bibs. They were all munching away. Who knew pumpkins loved animal crackers so much?
I love Halloween time! Photo memory books on this date remind me of the times the cast of Inside Out and Kingdom Hearts assembled in our home to haunt our neighborhood for candy. I recall Jessie the Cowgirl, Elsa and Anna, Rapunzel, a Stormtrooper with Princess Leia, Wendy, Sora and so many more. I love Halloween! Yes, part of it is the ridiculous amount of candy we buy and consume. But that isn’t all. I love the fun and fanciful moments as kids get to dream into their favorite characters, embrace the identity of their heroes, and wrap themselves up in a wonderful world of imagination and make-believe. It’s magic. They get to be anyone they want to be, and they are celebrated.
You can be anything you want to be. It occurs to me that we could use a bit more Halloween-inspired imagination throughout the year. During Halloween we celebrate and applaud the creative adventures of our kids and each other. We should do that all the time. We should encourage and rejoice in our abilities to dream, to create, to wrap ourselves in fanciful “what ifs” and precocious “why nots” until we see the world through the magical lens of “what can be’s”.
Trick or treat! Tonight, join me in celebrating what we can be. Dream into the future and the possibilities that await us. Those characters, those adventures, those fanciful explorations can unlock a storehouse of potential that will propel our human story forward.
“People don’t care how much you know until they know how much you care.” – Theodore Roosevelt
Mark Schwartz had just taken on his role as CIO of US Citizenship and Immigration Services (USCIS) when the team hand-delivered a huge book of rules to his desk. He had just asked the team why it took months, not days or weeks, to build and deliver a simple single-page website. “Here is why,” his team said, pointing to the tome of regulations and rules. They carefully explained the large number of procedures and approvals required to do something that seemed so simple.
Mark was determined to simplify the process and improve the speed. As he read through the enormous volume of rules, he discovered many legacy controls that no longer applied, yet still demanded action from the team. Like organizational scar tissue, the rules had been crafted in response to some incident or fear and then spread to cover a vast domain, even where it was out of scope. They were outdated, inflexible, irrelevant and, at best, ineffective.
“This is bureaucracy!” It needed to be eliminated. He immediately started to tune the processes and eliminate the needless rules. Now, you would expect a cheer from the team impacted by this, right? Surprisingly, that’s not what happened. Instead, it resulted in an uprising! The authors of the rules began to appear in his office to protest. His own team resisted the change. Those rules, despite their pain, had become a comfortable crutch for the team. They depended on them to know they were doing the right thing. Even the team that was harmed by them was defending them. But why?
As he talked to them, he came to realize that they were not trying to block progress; they were trying to protect the country, the organization and the individuals applying for citizenship. Their intentions were good! He listened to them. He asked more questions. They explained their concerns and their motives. He told them he understood and appreciated them and their efforts. He asked if they could work together to change the rules to be more efficient and relevant, but still address their key concerns and motives. The geometry of the energy in the room changed dramatically. Suddenly they were all behind his efforts to improve the rules. Over the next several months they slimmed down the bureaucratic book of rules to a manageable size and, more importantly, unlocked the speed and potential of teams trying to deliver features and new websites for the organization.
We often make assumptions. We have a tendency to cast into place a reality that we invent by ourselves. We can even be guilty of assuming malintent in others when that is not the case. A superpower that awaits every leader who chooses to wield it is the power of listening. As Mark discovered, the bureaucracy created by individuals at USCIS was not intended for evil, but for good. By reaching out to those who crafted the rule book to understand their intent, he unlocked the door to improvement. By listening and understanding, he forged a new reality that profoundly changed the dynamics from resistance to revolution. The authors of the previous reality willingly enlisted to write the new one. The results were significant.
Do you want to change the world? If so, reach out to others. Ask questions. Listen. Like Mark, we may discover a good story that will completely rewrite our understanding and forever change the trajectory of our progress. Go on! Listen.
Oh, no! We were several hours into a major system outage and there was still no clue as to what was broken. The webservers were running at full load and the applications were pumping a constant stream of error logs to disk. Systems and application engineers were frantically looking through the dizzying logs for clues as to the cause. Of course, looking at the logs, you would assume everything was broken, and it was. But even when the application worked, the logs were full of indecipherable errors. Everyone knew that most of the “errors” in the logs weren’t really errors, but untidy notices that developers had created long ago as part of a debugging exercise. As one engineer observed in some degree of frustration, “It’s like the log file that cried wolf!” After a while, nobody notices the errors.
The teams restarted services, rebooted systems, stopped and restarted load balancers. Nothing helped. Network engineers dug into the configuration of the routers and switches to make sure nothing was amiss. Except for the occasional keyboard typing sounds, dogs barking or children crying in the background, the intense investigation had produced an uncanny silence on the call. Operation center specialists were quickly crafting their communication updates and discussing with the incident commander how to update the many clients impacted by this outage. Company leaders and members of the board of directors were calling in to get updates. Stress was high. Would we ever find the cause or should we just shut down the company now and start over? Fatigue was setting in. Tempers were starting to show. Discussion ensued on the conference call to explore all mitigation options and next steps.
“I found it!” The discussion on the call stopped. Everyone perked up, anxious to hear the discovery. “What did you find?” the commander asked hopefully. The giddy engineer took center stage on the call, eager to tell the news. “It’s the inventory service! The server at the fulfillment center seems to be intermittently timing out. Transactions are getting stuck in the queue.” The engineer paused, clearly typing away at some commands on his computer. “I think we have a routing problem. I tried to trace it, but it seems to bounce around and disappear. Sometimes it works, but to complete the transaction, multiple calls are required and too many of them are failing. I’m chatting with the fulfillment center and they report the inventory system is running.”
The engineer sent the traceroute to the network engineer, who started investigating and then asked, “Can you send me the list of all the addresses used by the inventory system?” After some back and forth, the answer came: “I found the problem! There are two paths to the fulfillment center, one of which goes through another datacenter. That datacenter link looks up, but it is clearly not passing traffic.” After more typing came the verdict: “Ah, it seems the telco made a routing change. I’m getting them to reverse it now.” Soon the change was reversed and transactions were flowing again. The dashboards cleared and “green” lights came back on. Everyone on the bridge celebrated, some quietly and some not so quietly, and felt an incredible emotional relief. Sure, there would be more questions, incident review and learning, but solving the problem was exhilarating.
How many of you can relate to a story like that? How many of you have been on that call?
A friend of mine, Dr. Steven Spear at MIT, often reminds us that the key to solving a problem is seeing the problem. You can’t solve what you cannot see. A big part of reliability engineering and systems dynamics is understanding how we gain visibility into problems and surface them so they can be addressed. Ideally, we find those weaknesses before they cause real business impact. That is often the attraction of chaos engineering: poking at fault domains to expose fractures that could become outages. But sometimes the issue is so complex that we just need a clear line of sight into the problem. In the story above, connectivity and those dependent links were not clearly visible. If there had been some way to measure the foundational connectivity between the dependent locations, our operational heroes could have quickly seen it, fixed it, and gone back to sleep. Getting that visibility in advance is the right thing to do for our business, our customers and our teams.
You can throw a GridBug onto any instance, in any datacenter, and it will go to work monitoring connectivity. I didn’t have time to test any serverless options, but it should work there as well. I set up 5 nodes in 3 locations for a test, with some forced failures to see how it would detect conditions on the grid. The graph data converges over time so that every node can render the same graph. If you want to see it, here is my test and project code: https://github.com/jasonacox/gridbug
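What might measuring that foundational connectivity look like? Here is a minimal sketch in Python. It assumes that completing a TCP handshake to a dependency's address is a good enough proxy for "this link is passing traffic." The function names `probe` and `check_links` are hypothetical, and the approach is not taken from any particular tool. It simply times a connection attempt and treats any error as a failed link.

```python
import socket
import time
from typing import Optional


def probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Attempt a TCP connection; return connect latency in seconds, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # OSError covers refused connections, timeouts and unreachable networks
        return None


def check_links(
    targets: dict[str, tuple[str, int]], timeout: float = 2.0
) -> dict[str, Optional[float]]:
    """Probe every named dependency (hypothetical name -> (host, port) map).

    A value of None means the link could not be reached within the timeout.
    """
    return {name: probe(host, port, timeout) for name, (host, port) in targets.items()}
```

Run `check_links` on a schedule from each location, pointed at the addresses a service depends on. A link that keeps coming back `None`, like the silent datacenter path in the story, then stands out right away instead of hiding behind noisy application logs.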
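The idea that every node ends up rendering the same graph can be illustrated with a simple gossip-style merge. This is a sketch under my own assumptions, not GridBug's actual code. Each hypothetical `NodeView` stores the latest timestamped observation for every link it has heard about. Merging two views keeps the newer observation for each link (last-writer-wins). Merging in any order gives the same result, and repeating a merge changes nothing. As a result, the views converge as long as every node eventually hears from the others.

```python
from dataclasses import dataclass, field

# A link is identified by its (source, target) node names.
Edge = tuple[str, str]


@dataclass
class NodeView:
    """One node's local view of the connectivity graph (hypothetical structure)."""
    name: str
    # edge -> (timestamp of the observation, True if the link was reachable)
    edges: dict[Edge, tuple[float, bool]] = field(default_factory=dict)

    def observe(self, target: str, reachable: bool, ts: float) -> None:
        """Record this node's own probe result for the link self -> target."""
        self.edges[(self.name, target)] = (ts, reachable)

    def merge(self, other: "NodeView") -> None:
        """Last-writer-wins merge: keep whichever observation of each edge is newer."""
        for edge, obs in other.edges.items():
            mine = self.edges.get(edge)
            # Comparing (timestamp, reachable) tuples breaks ties identically on every node
            if mine is None or obs > mine:
                self.edges[edge] = obs


def gossip_round(nodes: list[NodeView]) -> None:
    """Every node merges every other node's view (a full exchange, for simplicity)."""
    for a in nodes:
        for b in nodes:
            if a is not b:
                a.merge(b)
```

For example, suppose node `a` records at time 4.0 that its link to `b` is down. After one round of exchanges, nodes `b` and `c` hold that same observation. A real deployment would more likely gossip with a few random peers per round rather than everyone. The merge rule still converges in that case; it just takes more rounds.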
I have no expectations on this project. It is clearly just a work of fun I wanted to share with all of you, but it occurs to me that there is still a lesson here. Pain or necessity is a mighty force in terms of inspiration. What bugs you? Like this outage example, is there some pain point that you would love to see addressed? What’s keeping you from trying to fix it? Come up with a project and go to work on it. You are going to learn something! Look, let’s be real, my project here is elementary and buggy at best (sorry, couldn’t resist the pun), but I got a chance to learn something new and see a fun result. That’s what makes projects like this so rewarding. The journey is the point, and frankly, you might even end up with something that brings some value to the rest of our human family. Go create something new this week!