Tools and the Analyst

Let’s say I’m responsible for a complex system. I might have a lot of titles, but for a big part of my job I’m an analyst of that system. I need tools to help me see into it and change its behavior. As an analyst with a tool, I have some generic use cases the tool needs to meet.

  • Tell me how things are right now
    • What is the state?
    • Is it changing?
  • Tell me how things have been over time
    • What is the state?
    • Is there any change in progress?
    • Is the state normal?
    • Is the state good/bad/indifferent/unknown?
  • Tell me what I’m supposed to know
    • What is important?
    • What should I mitigate?
    • What can I ignore?
  • Alert me when something needs me
    • What is the problem?
    • What is the impact?
    • Are there any suggested actions?
  • How much can I trust this tool?
    • Do I see outside context changes reflected in it?
    • How does the information it gives me compare with what I see in other tools?
  • How much can I share this tool?
    • Do I understand it well enough to teach it?
    • Can I defend it?

As a generic set of use cases, this is equivalent to the old sysadmin joke, “go away or I will replace you with a small shell script”. A tool that can provide that level of judgement is also capable of doing the analyst’s job. So a lot of tools stop well short of that lofty goal and let the user fill in a great deal when an alert occurs.

  • Alert me when a condition is met
  • Tell me how things are right now
  • Tell me how things have been over time
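The stripped-down tool above can be sketched in a few lines. This is a minimal illustration, not any real product's API; the class name, threshold, and history size are all made up for the example:

```python
from collections import deque

class SimpleMonitor:
    """Minimal sketch of the stripped-down tool: current state,
    recent history, and a fixed-threshold alert. The judgement
    about what an alert *means* is left to the analyst."""

    def __init__(self, threshold, history_size=60):
        self.threshold = threshold
        self.history = deque(maxlen=history_size)  # bounded history

    def record(self, value):
        self.history.append(value)

    def current(self):
        # "Tell me how things are right now"
        return self.history[-1] if self.history else None

    def over_time(self):
        # "Tell me how things have been over time"
        return list(self.history)

    def alert(self):
        # "Alert me when a condition is met" -- no impact analysis,
        # no suggested actions, just a threshold crossing
        return self.current() is not None and self.current() > self.threshold

m = SimpleMonitor(threshold=90)
for v in [42, 55, 95]:
    m.record(v)
print(m.current())  # 95
print(m.alert())    # True
```

Everything past the threshold check, from "is this normal?" to "what should I do?", is exactly the work the tool leaves in the analyst's head.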

Maybe the analyst can tie the tool’s output to something else that tries to fill in more meaningful answers, or maybe they just do all of that in their own heads. This is fine at the early adopter end of Geoffrey Moore’s chasm, and many vendors will stare blankly at you if you ask for more.

After all, their customers are using it now! And besides, how could they add intelligence when they don’t know how you want to use their tool? They don’t know your system or your needs, and they haven’t needed to, because early adopters are willing to do the work. But that situation can’t be depended on, because the relationships between customers, vendors, tools, analysts, and systems are not stable.

The system will change, the people involved will change, the customer’s goals will change, and the analyst’s opinion of what a tool should do and what makes it good won’t stay the same. Even if everything else stays stable, experienced analysts move on to new problems and are replaced by new folks who need to learn.

The result is that tools mature and their user communities shift, growing into mainstream adopters and becoming the norm instead of an outlier. By the time your tool is being introduced to late adopters, it needs to be able to teach a green analyst how to do the job at hand.

How’s that going to work? Here are a few ideas:

0: ignore the problem. There’s always a cost-benefit analysis to doing something, and nature abhors a vacuum. If a vendor does nothing, perhaps the customer will find it cost-effective to solve the problem instead.
Look at open source software packages aimed at narrow user communities, such as email transfer. Learning to use the tools is a rite of passage to doing the job. Ignoring the problem only works in that case because of email-hosting services, though. Because email is generally handled by a third party today, the typical organization doesn’t need to learn Postfix, Sendmail, or Qmail. The pool of organizations looking at open source mail transfer agents is self-selected down to shops that don’t want to work with a commercial vendor and can take the time to learn the tools.

1: ship with best practices. So as a rule, if the product is aimed at a larger user community, ignoring the problem won’t work well. Another approach is to build in content and features that establish expected norms, like the spelling and grammar checkers in modern office suites. An advanced user will chafe and may turn these features off, but their built-in, automated nature has the potential to improve outcomes across the board. That potential is not always realized, though, as users can still ignore the tool’s advice. In theory that should be less common in higher-stakes use cases: an embarrassing typo is one thing, a service outage is another. Still, there’s a risk that the user will find the advice too extreme, too rare, or otherwise imperfect. Since there is a risk of displeasing the customer, vendors are incentivized to provide harmlessly anodyne advice (“you might want to look at this system sometime soon” instead of “pave and rebuild this system now”) and false-positive-prone warnings (we don’t know if this matters, but CYA!), which analysts rapidly learn to ignore.

2: invest in a services community and partner ecosystem. No one can teach as well as a person who learned by experience first. Some very successful organizations build passionate communities of educators, developers, and deployment engineers. Organizations with armies of partners have huge reach compared with more narrowly scoped organizations. However, an army marches on its stomach, and all these people have to be paid. The overall cost and complexity for a customer goes up in line with ecosystem size, and after a while it’s impossible to get anything done within a week or two of professional services engagement. Regular tune-ups, scheduled health checks, special projects to expand features, and keeping up with the vendor’s upgrades carry a price tag that will require board approval.

3: invest in machine intelligence. If the data has few outside-context problems, a machine intelligence approach can help the analyst answer qualitative questions about the data they’re seeing from the system. Normal versus abnormal: no problem! That’s easy for a machine to determine, as long as you’ve trained it properly on what’s normal. Good versus bad: maybe not so much. Anomalous is not bad. It takes human effort and knowledge to determine whether an event or a metric reading is “bad” for the organization at the moment it happens. Adding machine intelligence does not eliminate the risk of inaccurate advice, so it’s best to think of this as a hybrid of the best-practice and services approaches: consultants or employees with contextual awareness need to help with the implementation at any given customer, and the result is a best practice that needs regular re-tuning.
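The normal-versus-abnormal distinction can be made concrete with the simplest possible anomaly detector, a z-score over a trained baseline. This is an illustrative sketch only (real tools use far richer models), and the baseline numbers are invented for the example, but the limitation is the same at any sophistication level: the machine can say "abnormal", not "bad".

```python
import statistics

def is_anomalous(baseline, reading, z_threshold=3.0):
    """Flag a reading that sits far from the trained 'normal'.

    baseline: readings collected during known-normal operation
    reading: the new value to judge
    z_threshold: how many standard deviations counts as abnormal
    """
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all is abnormal
        return reading != mean
    return abs(reading - mean) / stdev > z_threshold

# Train on "normal" and judge new readings
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(is_anomalous(baseline, 100))  # False: within normal
print(is_anomalous(baseline, 250))  # True: anomalous -- but only a human
                                    # with context knows whether a spike is
                                    # a launch success or an attack
```

Note what the function cannot tell you: whether the 250 is good or bad for the organization. That qualitative call is exactly the part that still needs the consultant or employee with contextual awareness.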

Perhaps we are seeing a reason why so many technology vendors don’t last very long as independent entities.
