Episode 10: The one with Andy Fleener

Notes:

In this episode, we are joined by Andy Fleener, Platform Operations Manager at SportsEngine and a contributing author to Chaos Engineering: System Resiliency in Practice.

We discuss his chapter on humanistic chaos and how we can apply Chaos Engineering to human systems. This leads to other subjects, such as Andy’s affinity for humans, how humans approach problems, and his preference for salty nut rolls over peanut brittle.

Episode 9: The one with Liz Fong-Jones

Notes:

In this episode, we are joined by Liz Fong-Jones, developer advocate, labor and ethics organizer, and Site Reliability Engineer. She joins the conversation to discuss observability versus monitoring and why observability is critical for Chaos Engineering. Along the way, we discuss unionizing software workers, how internet companies handle traffic, and more.

Episode 8: The one with Mikolaj Pawlikowski

Notes:

In this episode, we are joined by Mikolaj Pawlikowski, creator of PowerfulSeal and author of the Chaos Engineering book from Manning. This conversation runs the gamut from open source software to seatbelts to the much-beloved Toyota Yaris. Mikolaj also offers practical advice on Chaos Engineering and on making Kubernetes more reliable through Chaos Engineering practices.

Episode 7: The one with Olga Hall

Notes:

In this episode, we sit down with Olga Hall, Global Head of Amazon Video Availability and Scale, to discuss Chaos Engineering, experimentation, and how to do it at scale. You won’t want to miss this episode as we learn from Olga about availability fairytales and how failure really is inevitable. We also give some inside scoop on what is happening at Chaos Community Day London.

Episode 6: The one with Peter Alvaro

Notes:

In this episode, we sit down with Dr. Peter Alvaro of Disorderly Labs to discuss chaos, failure, academia, and computer science. The topics range from data provenance to fault injection but hinge on how Peter’s LDFI research is impacting the field of chaos engineering. Also, Casey tries to talk Peter into using his research to pull off a bank heist, and of course, James tries to be the voice of reason.

Episode 5: The one with Crystal Hirschorn

Notes:

In this episode, we experience a royal reception with Crystal Hirschorn, Director of Engineering at Snyk, to talk about chaos engineering and how to put its practices to work in organizations of any size. Crystal shares advice on how to get VPs and other leaders on board with chaos engineering, which looks quite different from bringing engineers along on the journey.

Throughout the conversation, you will hear how chaos engineering has matured since the early days of Netflix’s Chaos Monkey and how it is practiced at some of the largest organizations across the globe. Crystal discusses how resilience engineering can move out of academia and into practice in real organizations, and how it helps us understand socio-technical systems and the role they play when we write software.

Episode 4: The one with John Allspaw

Notes:

In this episode, we are joined by John Allspaw, Co-Founder of Adaptive Capacity Labs. John instructs us on how to deal with the bad apples in our organizations and debates whether we can ever find root cause or human error, whether the person is embodied or not. If you are interested in accident investigations, uncovering truth in complex systems, or using heuristics and correlation to find meaning, then this is the episode for you. Special appearances include Dr. Richard Cook and John’s sister, Sue Allspaw, who asks hard-hitting questions that leave John speechless.

Episode 3: The one with Julie Gunderson

Notes:

In this episode, Julie explains how communication fits into your DevOps (and Chaos Engineering) efforts, and she dissects the misuse of ITIL and root cause thinking as challenges to building a learning organization. After hearing her talk “The Psychology of Chaos Engineering” at DevOpsDays Raleigh, we knew she had to come on the show. Julie also covers how to approach digital and cloud transformations and delivers some hilarious one-liners, making this a show to remember.

Episode 2: The one with Russ Miles

Notes:

Russ Miles is a prolific writer and speaker on Chaos Engineering, especially well known as an evangelist of the discipline in his hometown of London. James and Casey discuss his chapter in Chaos Engineering: System Resiliency in Practice, titled “Open Minds, Open Science, and Open Chaos.” As if that weren’t a mouthful, we also discuss his previously published works and his Chaos Engineering-themed tour of the United States.

Episode 1: The one with Nora Jones

Notes:

Our special guest Nora Jones and our host Casey Rosenthal are the two authors of O’Reilly’s Chaos Engineering: System Resiliency in Practice, the definitive book on the subject. We ask Nora about learning from incidents, how to facilitate a healthy Chaos Engineering program, the path that led to her writing this book with Casey, how her thinking has changed over the years, and her startup Jeli.