
KUBICAST 174 - ObservIAbility with Luccas Quadros

ObservIAbility is not just a pun: it is the future of monitoring and incident response, stitching together the best of Observability and AI.

mansplainer

João Brito

Episode 174 of Kubicast, titled ObservIAbility, is a humorous and technical look at the convergence of Observability and Artificial Intelligence (AI). Our guest is Luccas Quadros, a software developer on the AI and Machine Learning team at Grafana, and together we explore how the two areas intertwine to take monitoring to a new level.

Logs, NLP, and the art of taming data

Right off the bat, we discuss how logs, traditionally messy and written for humans, benefit from Natural Language Processing (NLP). Luccas shared insights from his own journey, moving from legal applications to formatting and extracting meaning from trillions of textual events. We learned that even structured logs need careful pre-processing before an LLM can make sense of timestamps, IP addresses, and error messages, turning chaos into context.
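To make the pre-processing idea concrete, here is a minimal sketch that splits a raw log line into the fields mentioned above before it is handed to a model. The log layout, the field names, and the `preprocess` function are assumptions made for illustration, not something described in the episode.

```python
import re

# Hypothetical log layout: "<ISO timestamp> <LEVEL> <message>".
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def preprocess(line: str) -> dict:
    """Split a raw log line into timestamp, level, message, and any IPv4 addresses."""
    text = line.strip()
    match = LINE_RE.match(text)
    if match is None:
        # Keep unparseable lines instead of dropping them silently.
        return {"timestamp": None, "level": "UNKNOWN", "message": text, "ips": []}
    fields = match.groupdict()
    fields["ips"] = IPV4_RE.findall(fields["message"])
    return fields


record = preprocess("2024-05-01T12:00:00Z ERROR connection refused from 10.0.0.7")
print(record["level"], record["ips"])
```

Passing explicit fields rather than raw text means the model does not have to guess where the timestamp ends and the message begins.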

Evolution from anomaly detection to generative AI

We revisited statistical anomaly detection in time series, which was so widespread two years ago, and then moved on to applications of generative AI. We discussed how language models can generate dynamic dashboards, suggest smart thresholds for alerts, and even propose SLO definitions. The shift from traditional machine learning to agents that can step back and review large volumes of metrics changed what we consider "hype" and brought real usability to SRE work.
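As a reminder of what the statistical baseline looks like, the sketch below flags points that sit far from the mean of a trailing window (a rolling z-score). This is one common technique chosen for illustration; the episode does not name a specific method, and the window size and threshold are arbitrary assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates from the trailing window by more than `threshold` sigmas."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


latencies_ms = [10, 11, 10, 12, 11, 10, 50, 11]
print(zscore_anomalies(latencies_ms))
```

Note that the spike itself inflates the window that follows it, which is one reason fixed thresholds need tuning per metric.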

Intelligent agents and the MCP protocol

The main highlight was observability agents: agents that carry context about the environment and have read-only access to repositories, runbooks, and dashboards. This is where MCP (Model Context Protocol) comes in: an open protocol, with a growing set of integrations, that connects LLMs to external systems such as Grafana, Datadog, Elastic, and GitHub, so that our agent can not only read logs but also collect metrics and trace spans in real time.
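MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds such a request and enforces the read-only idea with a client-side allowlist. The tool names, the `expr` argument, and the allowlist itself are hypothetical; a real server advertises its own tools, and nothing here is taken from a specific Grafana or Datadog integration.

```python
import json
from itertools import count

# Hypothetical tool names; a real MCP server lists its own tools via "tools/list".
READ_ONLY_TOOLS = {"query_metrics", "search_logs", "get_dashboard"}
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 "tools/call" request, refusing tools outside the read-only allowlist."""
    if name not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool {name!r} is not on the read-only allowlist")
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


print(build_tool_call("query_metrics", {"expr": "up"}))
```

An allowlist on the client is only a second line of defense; the credentials the server itself holds should be read-only as well.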

Security and privacy challenges

Our chat also covered the concerns that come with sending confidential data to LLM APIs. Luccas highlighted prompt injection attacks, leakage risks, and the need to balance efficiency with compliance. We debated the trade-offs between running models on-premises and leveraging cloud scale, and discussed governance best practices.
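One way to reduce leakage risk is to redact obviously sensitive values before any text leaves your environment. The following is a minimal sketch under that assumption; the patterns and placeholder names are illustrative only and would miss many real secrets.

```python
import re

# Illustrative patterns only; a real deployment needs a reviewed, broader list.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"(?i)\b(password|token|api_key)=\S+"), r"\1=<SECRET>"),
]


def redact(text: str) -> str:
    """Replace e-mail addresses, IPv4 addresses, and key=value secrets with placeholders."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


print(redact("login failed for ana@example.com from 192.168.1.20 token=abc123"))
```

Redaction does not address prompt injection, which needs separate controls such as treating log content as untrusted input.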

Perspectives and next steps

We closed with bold predictions: we will soon enter an era in which alerts arrive already "pre-digested," anticipating failures such as disk exhaustion or subtle performance degradation. We discussed the critical role of the engineer in designing self-healing protocols and the career acceleration opportunities for those who master ObservIAbility.
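Anticipating disk exhaustion does not require a language model; a simple version fits a straight line to recent usage and extrapolates to capacity. The sketch below uses ordinary least squares on `(timestamp_seconds, used_bytes)` samples. The function name and input shape are assumptions for illustration, and linear growth is itself a simplifying assumption.

```python
def seconds_until_full(samples, capacity_bytes):
    """Fit a least-squares line to (timestamp, used_bytes) samples and extrapolate to capacity.

    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    full_at = (capacity_bytes - intercept) / slope
    # Time remaining is measured from the most recent sample.
    return max(0.0, full_at - samples[-1][0])


usage = [(0, 50), (60, 60), (120, 70)]
print(seconds_until_full(usage, capacity_bytes=100))
```

A "pre-digested" alert could then carry this estimate alongside the raw metric, leaving the explanation and suggested remediation to the agent.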

For those who want to start today:

  • Connect your runbook database and documentation to an LLM as an assistant.

  • Experiment with open-source MCP projects to read Prometheus metrics and Kubernetes logs.

  • Participate in events like KCD Rio de Janeiro to exchange experiences.

  • Access the awesome-mcp repos on GitHub and expand your arsenal.
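The first suggestion in the list above can be prototyped without any model at all: pick the runbooks most relevant to an alert and assemble them into a prompt. The sketch below ranks runbooks by simple word overlap, which is a deliberately naive stand-in for real retrieval. The function names, the runbook texts, and the prompt wording are all hypothetical, and no LLM API is called.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_runbooks(alert: str, runbooks: dict, k: int = 2) -> list:
    """Rank runbook titles by how many words they share with the alert text."""
    alert_words = tokenize(alert)
    scored = [
        (len(alert_words & tokenize(title + " " + body)), title)
        for title, body in runbooks.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:k] if score > 0]


def build_prompt(alert: str, runbooks: dict) -> str:
    """Assemble the text that would be sent to an LLM; no model is called here."""
    sections = [f"## {title}\n{runbooks[title]}" for title in top_runbooks(alert, runbooks)]
    context = "\n\n".join(sections) or "(no matching runbook)"
    return f"Alert: {alert}\n\nRelevant runbooks:\n{context}\n\nSuggest next steps."


docs = {
    "Disk full": "Check disk usage with df and clean old logs.",
    "High latency": "Inspect p99 latency dashboards and recent deploys.",
}
print(build_prompt("disk usage above 90 percent on node-3", docs))
```

Swapping the overlap score for embeddings or a search index changes only `top_runbooks`; the prompt assembly stays the same.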

ObservIAbility is not just a pun: it is the future of monitoring and incident response, stitching together the best of Observability and AI.





🎧 Also listen to Kubicast on Spotify, and share it with the DevOps crowd you love, the ones who love to break a little something here and there!
