
Top 40 Most Interesting

Photo of an apartment building on a steep hill, with the camera tilted so that the building seems tilted and the street seems flat

Say there’s a large data set, something like the lake under a SIEM. How do you find the interesting elements in it? There are a few different tricks, such as running it through a series of algorithms. One fun trick is to use a generic dashboard pattern. One I’ve played around with a few times is the Top 40 Most Interesting (T40MI).

For slicing through large data sets, the T40MI is a handy tool that applies easily across a lot of single-variable domains. It is four lists of ten items each:

  1. Top 10 most common items, with raw counts
  2. 10 biggest increases in $time_period, as percentages
  3. 10 biggest decreases in $time_period, as percentages
  4. Bottom 10 least common items, with raw counts
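The four lists above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `top_40_most_interesting`, the dictionary keys, and the sample user agent values are all hypothetical, and the input is assumed to be two plain lists of one attribute's values, one per time period. Items that did not appear in the previous period have no defined percentage change, so this sketch leaves them out of the increases list; they still show up in the counts.

```python
from collections import Counter


def top_40_most_interesting(previous, current, n=10):
    """Build the four T40MI lists from two periods of a single attribute.

    previous, current: iterables of hashable values (for example, user agents).
    Returns a dict of four lists of (item, value) pairs.
    """
    prev_counts = Counter(previous)
    curr_counts = Counter(current)

    # Lists 1 and 4: raw counts in the current period, ties broken by name.
    most_common = sorted(curr_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    least_common = sorted(curr_counts.items(), key=lambda kv: (kv[1], kv[0]))[:n]

    # Percentage change per item. Assumption: only items seen in the previous
    # period are compared, since a change from zero has no defined percentage.
    changes = {
        item: 100.0 * (curr_counts[item] - prev_counts[item]) / prev_counts[item]
        for item in prev_counts
    }

    # Lists 2 and 3: biggest increases and biggest decreases.
    increases = sorted(
        ((item, pct) for item, pct in changes.items() if pct > 0),
        key=lambda kv: (-kv[1], kv[0]),
    )[:n]
    decreases = sorted(
        ((item, pct) for item, pct in changes.items() if pct < 0),
        key=lambda kv: (kv[1], kv[0]),
    )[:n]

    return {
        "most_common": most_common,
        "increases": increases,
        "decreases": decreases,
        "least_common": least_common,
    }


if __name__ == "__main__":
    # Hypothetical sample data: one value per observed event.
    last_week = ["agent-a"] * 10 + ["agent-b"] * 4 + ["agent-c"] * 2
    this_week = ["agent-a"] * 5 + ["agent-b"] * 6 + ["agent-c"] * 2 + ["agent-d"]
    for name, rows in top_40_most_interesting(last_week, this_week).items():
        print(name, rows)
```

On a real data lake the counting would more likely happen in the query layer, with only the ranking done in code like this.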

Pro: it surfaces interesting things from all sorts of data sets, and it doesn’t matter whether the data is normal or not. It shows interesting transients, gives a quick read on what matters, and reveals context quickly. I like dashboards that show time-relevant changes across the recent past along with current status.

Con: it takes some deep understanding of the data set. You’ve got to transform every new data set to pick out the attribute that matters. Pick the wrong one and this is a meaningless dashboard.

I’ve used it effectively with browser user agent strings, top level domain names, and certificate authorities. It’s really effective for spotting changes in user behavior. I’ve seen it flop with network ranges and more detailed DNS names, where the level of randomness is just too high.
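As one illustration of picking out the attribute that matters, full DNS names can be reduced to their top level domain before counting, which collapses a very random field into a small set of values. This is a rough sketch: the helper name `top_level_domain` and the sample hostnames are made up for the example, and it simply takes the last DNS label rather than consulting any list of registered suffixes.

```python
from collections import Counter


def top_level_domain(hostname):
    """Return the last DNS label, lowercased. A trailing root dot is ignored."""
    return hostname.strip().rstrip(".").rsplit(".", 1)[-1].lower()


# Hypothetical sample of observed DNS names.
hostnames = [
    "www.example.com",
    "mail.example.com.",
    "cdn.Example.ORG",
    "api.example.net",
]

# Count the reduced attribute instead of the raw hostnames.
tld_counts = Counter(top_level_domain(h) for h in hostnames)
print(tld_counts.most_common(10))
```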

Here’s an example of a useful outcome. During a software update, you can watch to see if the installed versions of an application are updating as intended. 

  • Starting: the old version is in the top 10 list, and your new version appears in the bottom 10 list.
  • In progress: the new version is in the 10 biggest increases list, while the old version is present on the 10 biggest decreases list. 
  • Complete: when the new version enters the top 10 list and the old version enters the bottom 10 list, you can declare victory. You should also see the new and old versions drop out of the biggest increases and decreases lists.
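The three stages above can be read off the lists mechanically. Here is a minimal sketch, assuming the dashboard data is available as a dictionary of four lists of (item, value) pairs. The function name `rollout_stage`, the dictionary keys, and the version strings are hypothetical. One caveat: with fewer than about twenty distinct versions, the same item can land in both the top and bottom lists, so the checks below would need tightening for small data sets.

```python
def rollout_stage(lists, old_version, new_version):
    """Classify a software rollout from the four T40MI lists.

    lists: dict with keys "most_common", "least_common", "increases",
    "decreases", each a list of (item, value) pairs. This shape is an
    assumption made for the sketch.
    """
    def names(key):
        # Only membership matters here, so drop the counts and percentages.
        return {item for item, _ in lists.get(key, [])}

    top, bottom = names("most_common"), names("least_common")
    rising, falling = names("increases"), names("decreases")

    # Check the end state first: the versions have swapped places.
    if new_version in top and old_version in bottom:
        return "complete"
    if new_version in rising and old_version in falling:
        return "in progress"
    if old_version in top and new_version in bottom:
        return "starting"
    return "unknown"


if __name__ == "__main__":
    snapshot = {
        "most_common": [("1.0", 60), ("2.0", 40)],
        "least_common": [("0.9", 1)],
        "increases": [("2.0", 300.0)],
        "decreases": [("1.0", -30.0)],
    }
    print(rollout_stage(snapshot, old_version="1.0", new_version="2.0"))
```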
