Say there’s a large data set, something like the lake under a SIEM. How do you find the interesting elements in it? There are a few different tricks, such as running it through a series of algorithms. One fun trick is to use a generic dashboard pattern. One I’ve played around with a few times is the top 40 most interesting (T40MI).
For slicing through large data sets, the T40MI is a handy tool that applies easily across a lot of single-variable domains. It is made up of four lists of ten:
- Top 10 most common items, with raw counts
- 10 biggest increases in $time_period, as percentages
- 10 biggest decreases in $time_period, as percentages
- Bottom 10 least common items, with raw counts
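As a rough sketch, the four lists can be built from two sets of counts, one per time period. The function name `t40mi` and its input shape (one value per event) are assumptions made for illustration, not part of any particular SIEM. Items that did not exist in the previous period have no defined percentage change, so this sketch leaves them out of the two change lists and lets them surface in the raw-count lists instead.

```python
from collections import Counter


def t40mi(previous, current, n=10):
    """Build the four lists from two iterables of observed values.

    `previous` and `current` each hold one value per event (for example,
    one user agent string per request) for two adjacent time periods.
    """
    prev_counts = Counter(previous)
    curr_counts = Counter(current)

    # Raw-count lists come from the current period only.
    top = curr_counts.most_common(n)
    bottom = sorted(curr_counts.items(), key=lambda kv: kv[1])[:n]

    # Assumption: percentage change is only computed for items seen in the
    # previous period, which avoids dividing by zero for brand-new items.
    changes = {
        item: 100.0 * (curr_counts[item] - prev_counts[item]) / prev_counts[item]
        for item in prev_counts
    }
    increases = sorted(
        ((item, pct) for item, pct in changes.items() if pct > 0),
        key=lambda kv: kv[1],
        reverse=True,
    )[:n]
    decreases = sorted(
        ((item, pct) for item, pct in changes.items() if pct < 0),
        key=lambda kv: kv[1],
    )[:n]

    return {
        "top": top,
        "increases": increases,
        "decreases": decreases,
        "bottom": bottom,
    }
```

Calling `t40mi(last_week_values, this_week_values)` returns a dictionary with one entry per list, each holding up to ten `(item, value)` pairs.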
Pro: it covers all sorts of interesting things from all sorts of data sets! It doesn’t matter whether the data is normally distributed or not. It shows interesting transients and gives a quick sense of what matters. It also reveals context quickly. I like dashboards that show you time-relevant changes across the recent past along with current status.
Con: it takes some deep understanding of the data set. You’ve got to transform every new data set to pick out the attribute that matters. Pick the wrong one and this is a meaningless dashboard.
I’ve used it effectively with browser user agent strings, top level domain names, and certificate authorities. It’s really effective for spotting changes in user behavior. I’ve seen it flop with network ranges and more detailed DNS names, where the level of randomness is just too high.
Here’s an example of a useful outcome. During a software update, you can watch to see if the installed versions of an application are updating as intended.
- Starting: the old version is in the top 10 list, and your new version appears in the bottom 10 list.
- In progress: the new version is in the 10 biggest increases list, while the old version is present on the 10 biggest decreases list.
- Complete: when the new version enters the top 10 list and the old version enters the bottom 10 list, you can declare victory. You should also expect to see both the new and old versions drop out of the biggest increases and decreases lists.
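If you wanted to automate that check, the three stages map onto simple membership tests. This is only a sketch: `rollout_stage` is a made-up name, and it assumes each of the four lists has already been reduced to a collection of version strings.

```python
def rollout_stage(old, new, top, increases, decreases, bottom):
    """Map list membership to the rollout stages described above.

    `top`, `increases`, `decreases`, and `bottom` are collections of the
    version strings currently shown in each of the four lists.
    """
    # Check the latest stage first, since lists can overlap mid-rollout.
    if new in top and old in bottom:
        return "complete"
    if new in increases and old in decreases:
        return "in progress"
    if old in top and new in bottom:
        return "starting"
    # Anything else does not match the expected pattern and deserves a look.
    return "unknown"
```

An `"unknown"` result is worth a look in its own right, since it means the update is not following the expected pattern.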