
Kubicast #168 - Kubernetes Troubleshooting with Natália Granato

What do you do besides cry when your cluster catches fire? Wipe away the tears and come listen to Natália help you in this episode!

mansplainer

João Brito

In episode #168 of Kubicast, we welcome Natália Granato, a platform engineering specialist, for a sharp conversation about real DevOps, focusing on troubleshooting, production incidents, and best practices born from chaos — and not from a whitepaper.

Natália shared stories that every infrastructure team will recognize: that phantom bug that only appears in production, the flood of useless alerts that hide the real problem, and the lessons that only emerge after going through the fire. Literally, sometimes.

Beyond the tools, the conversation dives into the culture behind a good incident-resolution process, where communication and team trust matter as much as any script or dashboard.

Between one joke and another, we discussed:

  • How to build an honest post-mortem culture without witch hunts

  • The impact of a poorly calibrated observability stack (spoiler: noise is the enemy)

  • When it's not a DNS problem, but you still think it is

  • Tools that help (and those that get in the way)

  • The pressure of keeping critical environments running without losing your sanity

If you've ever gone through a production incident and thought "there's no way this only happens to me", this episode is for you. And if you haven't yet, listen to be better prepared — because it will happen.



🎧 Also listen to Kubicast on Spotify, and share it with that colleague who keeps saying "push to production and we'll see" — maybe they really will see.

Getup Newsletter.

Monthly updates on Kubernetes and Software Supply Chain Security.

Operating Kubernetes in production for more than 13 years. With Quor, this experience extends to software supply chain security as well.