
Metrics in Splunk, and Observability

Histogram chart of the percent chance that each of these bars will reach the top of this graph
  • I’ve got some thoughts about Splunk and metrics for observability…
  • The event-first Splunk can now store metrics efficiently. That has potential: 1 dashboard, a single glass of pain.
  • I’m excited to see annotations and mcatalog; I’m hoping it allows resolution of a nasty problem with multi-source metric comparison.
  • Metrics are quantitative. “Your volume has N bytes free”. Good? Bad? Quantitative metrics are almost entirely useless for decisions.
  • (I actually think they are useless. A triggered metric like “DISK FULL, 10 periods” is an event, not a metric. Splitting hairs.)
  • Decisions from metrics need qualitative context. “Allocate more space now or later?” “How much more?” “What about budget & schedule?”
  • Quantitative data, qualitative context, quantitative decision. If the context is only in humans, then humans need training to use it.
  • “Fellow human, I teach you tool’s contextual framework. It emits X metric, Y units, Z interval. Normal = A-S today. If X>N, runbook!”
  • Encode that into a KPI? Hasn’t improved anything. Still breaks when change means normal is wrong. Human has to know context to fix.
  • Compare many KPIs? Not even feasible without qualitative metrics. “Q: Need more storage?” Looks at 4-tier hybrid hierarchy, “A: ???”
  • In Metrics Store’s catalog, seems that unit size is unknown, but there’s periodicity & granularity? If the source gathered them?
  • Why don’t sources just send context? Tools should compute useful values & compare metrics qualitatively. “Tier 3 is 95% full.”
  • Contextual decisions could be automated. “Usage will exceed capacity during your vacation, I think we should buy more space now.”
  • Data system problems could be seen. “Dashboard expects 15 metrics/period, now getting 3 from 1/6 of probes, & 1 OutOfCheese Error.”
  • Answer to “Why don’t you just” questions is “Why should I”. Splunk can answer that. Where’s CIM for Metrics? Real attributes and KPIs?
  • Determining importance of a metric needs context. “Disk full” is pitifully primitive. A service provider or vendor knows better KPIs.
  • Sure would be nice to have vendor-specific tools for detailed analysis and role-specific tools with Splunk-aware metrics.
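
To make the "send context with the metric" idea concrete, here is a minimal Python sketch. It is not a Splunk API and not how the Metrics Store works. `MetricContext`, `describe`, and `days_until_full` are hypothetical names, the sample numbers are invented, and the forecast is a naive straight-line extrapolation chosen only for illustration. The point is that once a source ships capacity and interval alongside the raw value, a tool can say "Tier 3 is 95% full" and estimate when space runs out without a human holding that context.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MetricContext:
    """Hypothetical context a source could send alongside its raw values."""
    name: str             # what is being measured, e.g. "Tier 3"
    unit: str             # unit of the raw values, e.g. "bytes"
    capacity: float       # total size, in the same unit as the samples
    interval_days: float  # time between consecutive samples, in days


def percent_full(used: float, ctx: MetricContext) -> float:
    """Turn a raw 'used' value into a percentage of capacity."""
    return 100.0 * used / ctx.capacity


def describe(used: float, ctx: MetricContext) -> str:
    """Render the qualitative statement a human actually wants to read."""
    return f"{ctx.name} is {percent_full(used, ctx):.0f}% full."


def days_until_full(samples: Sequence[float], ctx: MetricContext) -> Optional[float]:
    """Naive linear forecast: average growth from first to last sample.

    Returns None when there are fewer than two samples or usage is not growing.
    """
    if len(samples) < 2:
        return None
    growth_per_interval = (samples[-1] - samples[0]) / (len(samples) - 1)
    if growth_per_interval <= 0:
        return None
    remaining = max(ctx.capacity - samples[-1], 0.0)
    return remaining / growth_per_interval * ctx.interval_days


# Invented example data: one sample per day on a 1000-unit volume.
tier3 = MetricContext(name="Tier 3", unit="bytes", capacity=1000.0, interval_days=1.0)
usage = [800.0, 850.0, 900.0, 950.0]

print(describe(usage[-1], tier3))
print(days_until_full(usage, tier3))
```

Compare the forecast against a calendar (vacation dates, budget cycle) and you get the "buy more space now" style of recommendation. A real tool would want a better model than a straight line, but even this needs the capacity and interval that most sources never send.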