DevOps: there is no QA, there is no infra, testing and support are everyone’s job. This works okay for unit test level work, but end to end functionality involving multiple teams breaks all the time. You can ask DevOps to take that on too, but they’ll just laugh. You can ship without doing any, and see how that goes… nah, probably not. So, who’s going to take responsibility for integration testing? The field engineers, the product managers, partners if you’ve got them, and eventually the customers catch what’s left.
Field engineers are an important part of any enterprise vendor. The titles change by vertical, the responsibilities vary across several areas, but complex products always have people dedicated to explaining and supporting them. Sales Engineers (SEs), Consulting Engineers (CEs), Network Analysts (NAs), Technical Account Managers (TAMs), Customer Support Managers (CSMs), and Professional Service Managers (PSMs) are just a few of the titles I’ve seen and held. The responsibility ranges from doing demos to triaging cases, from carrying quota to being on-call. In some cases these engineers are designing and implementing glue code or content to solve customer use cases, but they are not developers. Even if you add product testing to their MBOs (Management By Objective, a way to judge individual success by the outcomes delivered), these engineers will remain unfamiliar with the tools and processes of development. I have known some field engineers who were better at testing than professional QA engineers, but they still correctly prioritized their day job.
So, let’s play it out: what problems arise when end to end integration testing is done by field engineers and product managers “in their spare time” or “on a 10% allocation”?
Problem one: labs. Maintaining a scenario lab is a huge time sink with lots of failure modes. Virtual machines were a huge advance in this problem area, and Infrastructure as a Service has been even better, but it still costs real dollars and real time to build and maintain a lab. Most field engineers and a lot of product managers buy and operate their own labs. Maybe that is with some financial support from the company, but it’s rarely totally paid for by the company. These labs, whether running in the cloud or running in the garage, end up eating hours per week in maintenance, troubleshooting, and upgrades. Identifying and reproducing a problem might be a half-day job, purely because of lab-related image creation, snapshotting, and restoration. Recovering from a power outage or migrating cloud hosts might take days.
Problem two: test coverage. Testing for bugs is a never-ending and thankless task that can’t be measured to anyone’s satisfaction. But something must be done, so what testing is actually critical to complete? It’s tempting to say “well, everyone just try it until we all feel comfortable”… but this is a product produced by a business, not a hobby project. Time and resources are not unlimited. The most effective and comprehensive answer I’ve seen went like this: write down the top three things this product is for. Put them into the demo, the documentation, the first pitch deck, and the functional test requirements. The development cycle isn’t done and the product can’t be shipped until someone has checked that these three things can be done with the new version.
A nifty side benefit of that approach is that your field engineers can also use those tests to score customer maturity, by whether these things are done by the customer in production. For example, if you’re selling Retro Encabulators and your customer doesn’t even hook up the dingle arm, much less polish the logarithmic casing… the product is probably at risk of churn.
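Here is a minimal sketch of that idea, assuming the three use cases can each be reduced to a pass/fail check. Every name in it (`UseCase`, `release_gate`, `can_ship`, `maturity_score`) is hypothetical, and the lambdas stand in for whatever manual or automated verification your team actually does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UseCase:
    """One of the top three things the product is for (hypothetical type)."""
    name: str
    check: Callable[[], bool]  # True when the use case works on this build


def release_gate(use_cases: list[UseCase]) -> dict[str, bool]:
    """Run every check and record the result by use case name."""
    return {uc.name: bool(uc.check()) for uc in use_cases}


def can_ship(results: dict[str, bool]) -> bool:
    """Done means every use case was checked and passed; no checks means no ship."""
    return bool(results) and all(results.values())


def maturity_score(use_cases: list[UseCase], used_in_production: set[str]) -> float:
    """Fraction of the core use cases a customer actually runs in production."""
    if not use_cases:
        return 0.0
    used = sum(1 for uc in use_cases if uc.name in used_in_production)
    return used / len(use_cases)


# Placeholder checks; in practice each one is a person or a script verifying the build.
TOP_THREE = [
    UseCase("hook up the dingle arm", lambda: True),
    UseCase("polish the logarithmic casing", lambda: True),
    UseCase("retro-encabulate", lambda: False),
]

results = release_gate(TOP_THREE)
shippable = can_ship(results)
churn_signal = maturity_score(TOP_THREE, {"hook up the dingle arm"})
```

The same list feeds both the release decision and the customer score, which is the point: one short list of use cases, reused everywhere.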
This approach is not perfect though. First, the critical use cases will grow in number, as rules will do. Second, because teams don’t grow as fast as rules and team members churn, a drive to simplify will cause each test to degenerate into a series of screenshots that become inaccurate with the next UX refresh. Which brings us to the third issue with this approach: it is not a one-and-done, it must be maintained. Lose the leader who owns and cares for quality, and you lose the drive to maintain the use case driven system.
Back to performing testing outside of R&D, and problem three: spare time, there is none. Field engineers and product managers correctly prioritize customers over internal work, and therefore do not perform tests on time, if they perform them at all. Actual field or PM testing most likely happens on evenings and weekends, several days after the build to be tested is produced. This makes a weekly release cadence rather difficult to hit, much less Continuous Integration. If the definition of done is that someone tested it, but that someone can’t get around to testing it until three or four working days have passed, then you’ve got a lot of wait time baked into your cycle.
Problem four: communication. Field engineers typically do not closely follow development, and so it’s easy for them to accidentally test the wrong build and rediscover already known issues. That’s compounded by the fact that they’re probably doing that testing well after the developers have signed off for the day. A common story: developers introduce a feature on Tuesday, find something and fix it on Wednesday, and cut a new testing build on Thursday… a field engineer starts testing on Friday afternoon with the Tuesday build, hits the known issue, and stops any further testing until Monday. Teams with strong management skills will establish communication and processes to avoid that scenario. SaaS offerings feel this pain less, but can exchange it for long-running tests getting interrupted by continual rebuilds. Much of the time, teams that are able to get testing done more or less on time do it because of individual heroes that sacrifice their evenings and weekends. Other teams might fudge their definitions of done and rely more on hope and crossed fingers.
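One cheap process fix for the Tuesday-build story is a preflight check that tells the tester whether a newer build exists before they start. The sketch below is an assumption about how that might look, not a description of any real build system: the `Build` record, its `fixed_issues` field, and the `preflight` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """A test build as the tester sees it (hypothetical record)."""
    number: int
    label: str
    fixed_issues: tuple[str, ...] = ()


def preflight(under_test: Build, candidates: list[Build]) -> list[str]:
    """Return one warning per newer build; an empty list means go ahead."""
    newer = sorted(
        (b for b in candidates if b.number > under_test.number),
        key=lambda b: b.number,
    )
    warnings = []
    for b in newer:
        fixes = ", ".join(b.fixed_issues) or "no listed fixes"
        warnings.append(f"{b.label} supersedes {under_test.label} (fixes: {fixes})")
    return warnings


tuesday = Build(101, "tuesday-build")
thursday = Build(103, "thursday-build", fixed_issues=("dingle arm misaligned",))

for line in preflight(tuesday, [tuesday, thursday]):
    print(line)
```

It does nothing for the timing problem, but it does keep a Friday afternoon from being spent rediscovering Wednesday’s bug.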
Problem five: scale. No field engineer or product manager lab will ever approach the volume, variability, or velocity of a customer’s production environment. It stinks to spend careful weeks in testing, finally release, and then suffer an immediate faceplant.
At this point, you might be thinking “Heck, we should just go back to the old ways of dedicated quality assurance engineering teams.” I’m not strongly for or against the concept, but I will note it’s not a fix-all. On the one hand, if you really want to have a thing be done, you hire a person to make it happen. On the other hand, if the thing is a never-ending and thankless task, that person will try to redefine it so that they can show success. This is the pair of long-standing problems that prompted Silicon Valley companies to switch to DevOps in the first place. In my experience, QA teams will gradually stop performing baseline functionality testing and start to focus on other things. Maybe it’s automation frameworks, maybe it’s performance baselining, maybe it’s exploratory testing… It’s always useful stuff, but who ensures that the product still retro-encabulates? Once again, it’s the field engineers and the product managers.
In a purely on-prem world, I’m afraid this dismal picture doesn’t get better. The best driver of success that I’ve seen is to go to the cloud (but please don’t do it badly). A ring-based, feature-flagged cloudy product delivery model that exposes the product to the real world much sooner makes it much easier to perform quality testing. For instance:
- Ring 0 might be the engineering ring, built from main and offering no guarantees.
- Ring 1 could be the company-wide sandbox, where the first integration tests are done.
- Ring 2 might be your internal production instances where you first dogfood with real data and real users.
- Further rings are then left to your imagination. Got customers who want the new shiny sooner, or customers who want to defer your updates until their change controlled maintenance windows open? Great, rings and flags let you help those customers get what they want.
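To make the ring idea concrete, here is a rough sketch of a build moving outward one ring at a time. Rings 0 through 2 follow the list above; the outer three rings, the rule that a build must pass in its current ring before it is promoted, and all of the names (`Ring`, `Rollout`, `customer_sees_build`) are my own assumptions for illustration.

```python
from __future__ import annotations

from enum import IntEnum


class Ring(IntEnum):
    ENGINEERING = 0     # built from main, no guarantees
    SANDBOX = 1         # company-wide sandbox, first integration tests
    DOGFOOD = 2         # internal production, real data and real users
    EARLY_ADOPTERS = 3  # assumption: customers who want the new shiny sooner
    GENERAL = 4         # assumption: everyone else
    DEFERRED = 5        # assumption: customers waiting on maintenance windows


def next_ring(ring: Ring) -> Ring | None:
    """The next ring outward, or None if the build is already everywhere."""
    return Ring(ring + 1) if ring < max(Ring) else None


class Rollout:
    """Tracks how far a single build has been promoted."""

    def __init__(self, build: str) -> None:
        self.build = build
        self.ring = Ring.ENGINEERING
        self._passed: set[Ring] = set()

    def record_pass(self, ring: Ring) -> None:
        """Note that testing in this ring came back clean."""
        self._passed.add(ring)

    def promote(self) -> Ring:
        """Move one ring outward, but only if the current ring has passed."""
        if self.ring not in self._passed:
            raise RuntimeError(f"{self.build} has not passed in {self.ring.name}")
        nxt = next_ring(self.ring)
        if nxt is not None:
            self.ring = nxt
        return self.ring


def customer_sees_build(rollout: Rollout, customer_ring: Ring) -> bool:
    """A customer gets the build once the rollout has reached their ring."""
    return rollout.ring >= customer_ring
```

The useful property is that a customer’s ring is just data, so moving someone earlier or later is a configuration change rather than a separate release.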
From a testing perspective, the great thing about cloud-based rings is that your lab maintenance effort is reduced. Where all the home labs used to be fully isolated and local, now everyone’s testing environments are easily shared, potentially even in shared instances. This means that functional testing can start to happen with greater scale, and certainly a lot more visibility. Anyone in the company can try out any part of the product and be relatively sure there is relevant data to demonstrate the functionality with. This infrastructure improvement doesn’t change the calculus of Who was doing What testing: it will still be true that 10% of the company does 90% of the QA and the total QA done will be less than half of what “should” be done. What shared rings change is the cost-effectiveness of that testing. The work of maintaining your own lab is reduced significantly, possibly to nothing.
The second biggest improvement of the rings and flags model is the reduced impact scale. Being able to push into test and dogfood without hitting paying customers means your first production impacting move will impact relatively friendly coworkers. Feature flags mean you have a big red button to turn the new change off on a per-customer basis. Both of these technologies can still be used incorrectly of course, and development teams continue to suffer from overload and make mistakes. The benefit of rings and flags is that certain failure modes aren’t irrevocably baked into the process of developing software any more.
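The big red button is simple enough to sketch. This is a toy in-memory version, assuming a flag has a global default plus optional per-customer overrides; the `FeatureFlags` class and its methods are hypothetical and do not represent any particular flag service’s API.

```python
class FeatureFlags:
    """In-memory flag store: a global default plus per-customer overrides."""

    def __init__(self) -> None:
        self._defaults: dict[str, bool] = {}
        self._overrides: dict[tuple[str, str], bool] = {}

    def set_default(self, flag: str, enabled: bool) -> None:
        """Turn a flag on or off for everyone without an override."""
        self._defaults[flag] = enabled

    def override(self, flag: str, customer: str, enabled: bool) -> None:
        """Pin a flag for one customer, regardless of the default."""
        self._overrides[(flag, customer)] = enabled

    def kill(self, flag: str, customer: str) -> None:
        """The big red button: turn the change off for one customer."""
        self.override(flag, customer, False)

    def is_enabled(self, flag: str, customer: str) -> bool:
        """Overrides win; unknown flags are off."""
        key = (flag, customer)
        if key in self._overrides:
            return self._overrides[key]
        return self._defaults.get(flag, False)


flags = FeatureFlags()
flags.set_default("new-encabulator-ui", True)
flags.kill("new-encabulator-ui", "acme")  # one customer hit a problem
```

Defaulting unknown flags to off is a deliberate choice here: a typo in a flag name hides a feature instead of exposing one.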