Planning API load testing can be a rabbit hole into which you fall… forever. The bigger the API is, the bigger the hole. Thinking about testing use cases can go on endlessly. Just when you think you’ve covered every test possibility, you find yourself waking up in the middle of the night with a new testing scenario to implement.
My colleague, Raymond Leung, a respected Quality Assurance Manager in Santa Monica, CA, does not have this problem. He sleeps well. Why? Because Raymond learned an approach to API testing that reduces guesswork and increases testing effectiveness. Raymond learned to start with a Service Level Agreement (SLA). To quote Raymond,
“The tests are goal oriented. A test result needs to be better than SLA.”
Raymond learned the SLA First technique during his time as Q/A Architect at NFL.com. Here’s how it works: the product group defines a product according to a business scenario. Then the business scenario gets turned over to the Engineering Architecture Group, where the SLA is created. Business use case descriptions get turned into measurable performance metrics.
For example, the business scenario might have a description that says, “10 million people will come to the site in an hour, with 2 million people signing up in the peak hour.” The Engineering Architecture Group will translate that description into an SLA that describes behavior in terms of API calls per second.
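To make the arithmetic concrete, here is a minimal sketch of that translation in Python. The per-signup call count and burst multiplier are assumptions for illustration, not figures from any real SLA:

```python
# Rough translation of a business scenario into an SLA throughput target.
# The per-signup call count and peak multiplier are illustrative assumptions.

PEAK_SIGNUPS_PER_HOUR = 2_000_000   # "2 million people signing up in the peak hour"
API_CALLS_PER_SIGNUP = 4            # assumed: validate, create account, send email, confirm
PEAK_BURST_MULTIPLIER = 2.0         # assumed headroom for uneven arrival within the hour

average_signups_per_second = PEAK_SIGNUPS_PER_HOUR / 3600
target_calls_per_second = average_signups_per_second * API_CALLS_PER_SIGNUP * PEAK_BURST_MULTIPLIER

print(f"Average signups/sec: {average_signups_per_second:.0f}")     # ~556
print(f"SLA target, API calls/sec: {target_calls_per_second:.0f}")  # ~4444
```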
Raymond points out that a good deal of time is spent coming to a common understanding between Product Speak and Engineering Speak. Both parties have a particular way of talking about business requirements, but the SLA needs to be defined from an engineer’s point of view so that all members of the tech team can work without ambiguity.
Once the Engineering Architecture Group creates the SLA, the first thing to do is make sure that the physical infrastructure can support the SLA requirements. For example, can the network support the amount of traffic anticipated? Does the data architecture provide adequate DB access to support the SLA? Does the virtual machine configuration specify the amount of RAM and CPU power needed to support the application performance dictated by the SLA?
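A rough, back-of-the-envelope version of that infrastructure check might look like the sketch below. All of the capacity figures are placeholders standing in for the real numbers from your network, database, and VM specifications:

```python
# A back-of-the-envelope infrastructure check against the SLA target.
# Every capacity figure here is a placeholder, not a recommendation.

SLA_CALLS_PER_SECOND = 4_500
AVG_RESPONSE_BYTES = 8_192          # assumed average payload size
DB_QUERIES_PER_CALL = 2             # assumed data-access pattern

network_gbps_needed = SLA_CALLS_PER_SECOND * AVG_RESPONSE_BYTES * 8 / 1e9
db_queries_per_second_needed = SLA_CALLS_PER_SECOND * DB_QUERIES_PER_CALL

checks = {
    "network (Gbps)":       (network_gbps_needed, 10.0),       # available capacity (placeholder)
    "database (queries/s)": (db_queries_per_second_needed, 20_000),
    "app tier (calls/s)":   (SLA_CALLS_PER_SECOND, 50 * 120),  # 50 VMs x 120 calls/s each (placeholder)
}

for name, (needed, available) in checks.items():
    status = "OK" if needed <= available else "UNDER-PROVISIONED"
    print(f"{name:22} need {needed:,.1f} / have {available:,.1f} -> {status}")
```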
After working through the physical analysis, the group inspects the code to try to determine impact points such as thread allocation and I/O blocking. They want to make sure that the code is not blatantly out of whack with the SLA in play.
Then, testing begins. Q/A creates the automated test scripts required. The Q/A team analyzes the SLA to see whether they need to use 3rd party services to run the scripts or whether they can execute the scripts in-house. From Raymond’s point of view, if the SLA calls for supporting more than 10,000 API calls per second, he’ll use a 3rd party service. If the SLA is supposed to achieve performance metrics on a worldwide basis, he’ll also use a 3rd party. According to Raymond, it doesn’t make financial sense to implement in-house support for test origination from points in Europe, Asia, and the US; 3rd parties can do what needs to be done at lower cost.
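For smaller SLAs that can be exercised in-house, a load script can be as simple as the standard-library Python sketch below. The endpoint, request count, and concurrency are hypothetical, and anything approaching 10,000 calls per second would call for a dedicated tool or a 3rd party service rather than a script like this:

```python
# A minimal in-house load-script sketch using only the Python standard library.
# The endpoint URL, request count, and concurrency are illustrative.

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://api.example.com/v1/signup/health"  # hypothetical endpoint
REQUESTS = 200
CONCURRENCY = 20

def timed_call(_):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            resp.read()
            ok = 200 <= resp.status < 300
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(timed_call, range(REQUESTS)))

latencies = sorted(elapsed for _, elapsed in results)
errors = sum(1 for ok, _ in results if not ok)
p95 = latencies[int(len(latencies) * 0.95) - 1]
print(f"errors: {errors}/{REQUESTS}, p95 latency: {p95 * 1000:.0f} ms")
```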
After the tests are run, the Q/A group gathers the results and analyzes them to make sure the requirements defined in the SLA are met. If not, all groups gather in a War Room, roll up their sleeves, and try to figure out what’s going on. Some analysis is automated; some is manual. The War Room session might take a day. It could take weeks. How long it will take to solve the mystery of a large-scale test failure is always unknown. The important factor is that everything is analyzed in terms of the SLA.
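The automated part of that analysis often boils down to a pass/fail comparison against the SLA thresholds, along the lines of this sketch. The thresholds and measured numbers are placeholders; in practice they come from the SLA document and the load-test tooling:

```python
# A sketch of the pass/fail comparison against the SLA. All figures are placeholders.

sla = {"p95_latency_ms": 250, "error_rate_pct": 0.1, "throughput_cps": 4_500}
measured = {"p95_latency_ms": 310, "error_rate_pct": 0.04, "throughput_cps": 4_620}

failures = []
if measured["p95_latency_ms"] > sla["p95_latency_ms"]:
    failures.append("p95 latency")
if measured["error_rate_pct"] > sla["error_rate_pct"]:
    failures.append("error rate")
if measured["throughput_cps"] < sla["throughput_cps"]:
    failures.append("throughput")

print("PASS" if not failures else f"FAIL: {', '.join(failures)} -> War Room")
```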
Equally important is making sure that all parties test according to the same metrics, using the same testing techniques. To quote Raymond,
“Unlike functional test[ing], performance test[ing] is very abstract; everything is based on numeric measurement.”
Raymond points out that one of the things he has noticed in his years of enterprise-level testing is that, without common planning, it’s typical to get different test results from Dev and Q/A. Raymond attributes the disparity to the different measurement techniques used by each group. For example, one group might take a data-driven approach, using predefined values to exercise an API’s hit tolerance, while another group might take a random-value approach. One group might execute the tests using Tool A, whereas the other uses Tool B.
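Here is a small sketch of why that matters: a data-driven run replays the same predefined values, which keeps caches warm and flatters latency, while a random-value run spreads load across many keys and exercises different code paths. The user IDs below are made up:

```python
# Two ways to pick test data for the same API. The difference alone can
# produce different latency numbers from Dev and Q/A. All IDs are made up.

import random

PREDEFINED_USER_IDS = [1001, 1002, 1003, 1004, 1005]  # data-driven: same IDs every run

def next_user_id_data_driven(i):
    return PREDEFINED_USER_IDS[i % len(PREDEFINED_USER_IDS)]

def next_user_id_random():
    return random.randint(1, 10_000_000)  # random: mostly cache misses, more DB reads

# A data-driven run hammers 5 hot keys; a random run hits millions of cold ones.
print([next_user_id_data_driven(i) for i in range(5)])
print([next_user_id_random() for _ in range(5)])
```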
In a large organization that might employ hundreds of developers and Q/A personnel, lack of communication is common. People are not willfully uncooperative. It’s just that the size of the organization makes collaboration hard. Management must foster a culture of cooperation. It’s not going to happen by magic. Cooperation must be planned and it must be valued.
Raymond advocates that Dev and Q/A work closely together to agree upon the same tools and techniques used in the process. There must be a firm understanding between groups about how pre-test environments will be provisioned and configured. Both groups should use the same tests, with the same testing tools. And both groups must test using the same protocols, HTTP vs. HTTPS, for example. (Of course, a well-developed SLA will define exactly which protocols are to be used and when.) If the tools and techniques are not standardized, reliable testing is unlikely.
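One practical way to enforce that standardization, assuming both teams work from version-controlled configuration, is a single shared test configuration along the lines of the sketch below. Every value here is illustrative, not a prescription:

```python
# A sketch of a single, shared load-test configuration that both Dev and Q/A
# check into version control so their runs are comparable. Values are illustrative.

SHARED_LOAD_TEST_CONFIG = {
    "tool": "locust",                            # assumed choice; the point is one tool for both teams
    "protocol": "https",                         # per the SLA, not per team preference
    "base_url": "https://staging.example.com",   # hypothetical pre-test environment
    "target_calls_per_second": 4_500,
    "ramp_up_minutes": 10,
    "data_strategy": "predefined",               # data-driven vs. random must match across teams
    "data_file": "test_users.csv",
    "regions": ["us-east", "eu-west", "ap-southeast"],
}
```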
Taking an SLA First approach to API testing will make life easier for all. You’ll have firm criteria to use for test planning, and you’ll have a common definition of system performance that all groups can use as a point of reference when collaborating. However, using SLA First is not going to happen overnight. Establishing a collaborative environment in which all groups use the same tools and techniques to execute performance testing is going to take time. As Raymond Leung says:
“The lesson I learned is to be patient with Developers and Ops.”
I couldn’t have said it better myself. All good things take time.