How Do I Drive Remediation SLAs?

The Star Trek Original Series Enterprise circles a planet, boldly. The closed caption reads "We've found nothing."

Question: I want to get my organization to patch things in a timely fashion. How? Can I just set an SLA (Service Level Agreement) of “patch the criticals in 30 days” and track that?

Speaking as a vendor who’s worked with patching systems for everything from big banks and government to small businesses and startups… it depends! How to Measure Anything in Cybersecurity Risk is a useful framework to start with, but outside of a bank, everything is harder to measure, harder to quantify in dollar terms, and harder to assess risk for. If it were all frictionless spheres in a vacuum it’d be easy, but as an information security person pushing SLAs, you’re fighting against competing priorities and zero-based budgets.

So, step one is to find out whether leadership and middle management agree that this is a worthwhile endeavor. If either group thinks that your organization isn’t a target, that patching is too risky, or that CVE severity scores are overblown… then you’re dead in the water trying to get all the critical vulns patched within 30 days. While everyone generally wants to do the right thing, they’ve also got full plates, and the first way to get rid of a new problem is to avoid accepting it. Expect questions like these: How many vulnerabilities are there, and how do they line up with patches or service packs? Which severities matter, and which CVSS version or vendor scoring system are you going to use? What about mitigations? Can’t you just accept the risk?
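One concrete way to answer the scoring question is to pin the SLA to a published scale. The sketch below assumes CVSS v3.x base scores and maps them onto the v3.x qualitative ratings (None, Low, Medium, High, Critical). The SLA_DAYS windows and the function names are hypothetical placeholders for whatever your organization agrees to, not a recommendation.

```python
from datetime import date, timedelta
from typing import Optional

# Qualitative severity rating scale from the CVSS v3.x specification.
def cvss3_severity(score: float) -> str:
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1-3.9
    if score < 7.0:
        return "Medium"    # 4.0-6.9
    if score < 9.0:
        return "High"      # 7.0-8.9
    return "Critical"      # 9.0-10.0

# Hypothetical SLA windows in days; substitute whatever your organization agrees to.
SLA_DAYS = {"Critical": 30, "High": 60, "Medium": 90, "Low": 180}

def due_date(score: float, discovered: date) -> Optional[date]:
    """Remediation deadline for a finding, or None if its severity has no SLA."""
    days = SLA_DAYS.get(cvss3_severity(score))
    if days is None:
        return None
    return discovered + timedelta(days=days)
```

For example, a 9.8 discovered on 2024-01-01 rates Critical and comes due on 2024-01-31. Swapping in a different CVSS version or a vendor’s own scale only changes cvss3_severity; the SLA table stays the thing you negotiate.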

Poking holes in the accuracy of risk or value assessment is easy and profitable. One of my least favorite but most effective tricks is to question how many devices or systems are even in the organization… since an organization of any size will have some level of daily churn via sickness, PTO, equipment replacement, and service change, it’s easy for a skeptic to find a tiny discrepancy to latch onto and claim that the entire inventory can’t be trusted. You have to push back on the belief that your goal isn’t important, change some minds, and get a commitment to try it your way. And if your pushback fails? Might as well buy a guitar and spend your time more profitably learning the cowboy chords.
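It helps to put a number on that churn before a skeptic does. Here’s a minimal sketch, assuming you can export a set of asset IDs from your inventory system and a last-scan timestamp per asset from your scanner; the function name, the dictionary keys, and the 7-day freshness threshold are all hypothetical.

```python
from datetime import datetime, timedelta

def reconcile(inventory: set[str], last_scanned: dict[str, datetime],
              now: datetime, max_age: timedelta = timedelta(days=7)) -> dict:
    """Compare the asset inventory with the scanner's view of the world."""
    scanned = set(last_scanned)
    never_scanned = inventory - scanned        # in inventory, never scanned
    unknown = scanned - inventory              # scanned, but missing from inventory
    stale = {a for a in inventory & scanned    # scanned, but not recently
             if now - last_scanned[a] > max_age}
    fresh = len(inventory) - len(never_scanned) - len(stale)
    return {
        "coverage": fresh / len(inventory) if inventory else 0.0,
        "never_scanned": never_scanned,
        "stale": stale,
        "unknown": unknown,
    }
```

A few laptops out with their owners on PTO show up as a small coverage gap and a short stale list, which is a very different conversation than “the inventory can’t be trusted.”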

If you’ve got agreement in principle, then you’re in a good place to talk details.

  • Inventory and Evaluation: What systems, images, and build pipelines do you have? How long do containers or serverless functions run between refreshes? How certain are you that everything running in production has been scanned? How frequently do you scan? Do the scans complete? Is the scanner’s vulnerability database updated frequently enough to know about the most recent issues?
  • Data Review: What’s most critical in the environment? What’s most severe in the discovered vulnerabilities? Where do those two sets overlap? The list of discovered vulnerabilities is often surprisingly large, so being able to sort by the most critical systems and knock out the mitigated ones helps you decide which subsets to work on first.
  • Process Design, Servers as Pets population: How will you notify the people dependent on these systems? How will you test the patches? When are the maintenance windows? How will you roll back or restore if something goes wrong? How do you reschedule around an emergency maintenance so you still hit the SLA? How will you verify that the vulnerability is resolved and the system still works?
  • Process Design, Servers as Cattle population: How will you notify the people dependent on these systems? How will you test the updated images? Are you really doing anytime CI/CD, or are there maintenance windows? How will you switch back if something goes wrong? How do you reschedule around an emergency maintenance so you still hit the SLA? How will you verify that the vulnerability is resolved and the system still works?
  • Process Design, Wild West endpoints population: You can’t force knowledge workers and field employees all around the world into maintenance windows and testing regimes, so you set policies and enforce them instead. How will you communicate those policies? How will you report on adherence? What enforcement or notification options do you have?
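For the Data Review step, here’s a sketch of ordering findings so the overlap between critical systems and severe vulnerabilities comes first, with mitigated findings knocked out. The Finding fields and the work_queue helper are hypothetical; a real scanner export will have its own schema, and the mitigated flag stands in for whatever compensating-control tracking you actually have.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    asset: str
    cve: str
    cvss: float
    mitigated: bool = False

def work_queue(findings: list[Finding], critical_assets: set[str]) -> list[Finding]:
    """Drop mitigated findings, then put critical assets first, highest CVSS first."""
    open_findings = [f for f in findings if not f.mitigated]
    # False sorts before True, so findings on critical assets lead the queue;
    # negating the score sorts higher CVSS values first within each group.
    return sorted(open_findings, key=lambda f: (f.asset not in critical_assets, -f.cvss))
```

Note that a 10.0 on a non-critical box lands behind a 7.5 on a critical one in this ordering; whether that’s the right trade is exactly the kind of thing to settle with stakeholders before the SLA goes live.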

With those answers, you’ve got a plan you can document and work through, which leads to a proposal and a regular metric review. Keep your reporting simple and clear, and you should be good. Last step: when you’re hitting your SLA consistently, look to ratchet it tighter with shorter time frames, more vulnerabilities in scope, or both.
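For the metric review, the SLA number itself can stay simple. A minimal sketch, assuming each finding is a (discovered, remediated) pair of dates, with remediated set to None while the finding is still open; both function names and the 95% target, 5-day step, and 7-day floor in the ratchet are hypothetical. Findings that are still open but not yet due are left out, since they haven’t had the chance to pass or fail.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

def sla_compliance(findings: Iterable[Tuple[date, Optional[date]]],
                   sla_days: int, as_of: date) -> Optional[float]:
    """Fraction of measurable findings remediated within the SLA window."""
    met = breached = 0
    for discovered, remediated in findings:
        deadline = discovered + timedelta(days=sla_days)
        if remediated is not None:
            if remediated <= deadline:
                met += 1
            else:
                breached += 1
        elif as_of > deadline:
            breached += 1  # still open and already overdue
        # still open and not yet due: not counted either way
    total = met + breached
    return met / total if total else None  # None: nothing measurable yet

def next_sla(current_days: int, recent_compliance: list[float],
             target: float = 0.95, step_days: int = 5, floor_days: int = 7) -> int:
    """Tighten the window only if every recent review period met the target."""
    if recent_compliance and all(c >= target for c in recent_compliance):
        return max(floor_days, current_days - step_days)
    return current_days
```

This only covers the “shorter time frames” ratchet. The “more vulnerabilities” ratchet happens upstream: feed sla_compliance findings from a wider severity band while keeping sla_days where it is.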

