Tag: Observability
-

Observing System Failures
The system is down, and half a dozen teams join a Zoom call that stretches over a couple of days and a Slack channel that lasts for weeks. Vendors and integration partners are bridged in and out, executives pop by to see what’s going on, and the hours pass with…
-

Book Review: Learning OpenTelemetry
Slim volume packed with good stuff; I enjoyed this book. Notably, Observability is defined more practically as the practice of knowing what’s happening, instead of the black box outputs definition. There’s a good strong callout against MELT-style thinking. This is the idea that there are three (or four, or N)…
-

Why is Getting Data In hard?
Maybe first we should ask why people say it’s hard. After all, this shiny modern world is full of one-liners to install agents, hook in libraries, listen to your provider’s pub-sub, or just post stuff at an endpoint. It’s never been easier to get data, and it’s not like writing…
-

AI for Monitoring
Cognitive computing approaches to the monitoring problem haven’t worked in the past and still don’t work now. The future might still make it work, but it’s unlikely to be because of a change in AI technology unless that change is in the per-process execution economics. For AI to be a…
-

Tell, or Do?
A product finds a thing, a situation, a problem. The queue is full! There are a lot of high-severity vulnerabilities! The message handler stopped handling messages! Now the product developers have a choice to make: should they do something, or tell someone? Unfortunately, not everyone wants the same decision at…
-

Getting a Boost from Cognitive Computing Tools
A while back in the Before Times, I was working on product market fit for a small cognitive computing company. Plans didn’t work out, but here are some free PDF materials I’d posted to the public site. First, here’s a talk about how Cognitive Computing differs from the Machine Learning baseline…
-

Shewhart Control Charts
As a monitor writer, I want to alert when a value has changed a lot, quickly, in one direction or another, but I don’t want to set hard-coded thresholds because the value’s range is expected to slowly evolve. My goal is to get useful alerts and avoid false alarms. Examples:…
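A minimal sketch of the goal described in this excerpt, not the post's own implementation: control limits are recomputed from a rolling window, so they follow a slowly drifting range while a sudden jump still falls outside them. The function name `control_chart_alerts`, the `window` size, and the `sigmas` multiplier are illustrative assumptions. Using the rolling sample standard deviation is a simplification; classic Shewhart charts usually fix their limits from a baseline period.

```python
from collections import deque
from statistics import mean, stdev


def control_chart_alerts(values, window=20, sigmas=3.0):
    """Return (index, value, lower, upper) for points outside rolling limits.

    Assumption: limits are mean +/- sigmas * sample stdev of the previous
    `window` points. The current point is checked before it joins the window.
    """
    history = deque(maxlen=window)
    alerts = []
    for i, v in enumerate(values):
        # Only evaluate once a full window of history is available.
        if len(history) == window:
            center = mean(history)
            spread = stdev(history)
            lower = center - sigmas * spread
            upper = center + sigmas * spread
            if v < lower or v > upper:
                alerts.append((i, v, lower, upper))
        history.append(v)
    return alerts


if __name__ == "__main__":
    # Hypothetical series: slow upward drift with small noise, one sudden spike.
    series = [100 + 0.1 * i + (1 if i % 2 else -1) for i in range(60)]
    series[45] = 200
    for index, value, lower, upper in control_chart_alerts(series):
        print(f"alert at {index}: {value} outside [{lower:.1f}, {upper:.1f}]")
```

One design consequence worth noting: an outlier that enters the window inflates the limits until it ages out, so a second spike shortly after the first may go unflagged. Excluding flagged points from the history is one possible variation.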
-

Heisenberg’s World of Uncertainty
Security analysts can’t ever be certain of what they’re seeing and not seeing. See something, do something. My entire career has been in some form of “see what’s important, then do something about it.” It’s Heisenberg’s world though. Collecting and moving data has impact and cost, which can be hard…
-

Metrics and Observability
I wrote this as a Twitter thread in March of 2018, but the character constraints of Twitter at that time made it extremely cryptic. Also, it’s staged as a response to Splunk’s introduction of the metrics index… and to be honest, that’s no longer interesting to me. This is an…
-

Security Products, Rules, and Complexity
Security products need to detect known knowns, so they build up a corpus of rule content. This corpus grows faster than it shrinks, if it’s maintained at all: new known bad is found at a rapid clip, while software is retired from use very slowly. There are two constraints on…
