A few years back, the West Drayton control center experienced an outage during a planned upgrade of its air traffic system. Four major airports were affected, including Gatwick, Heathrow, Manchester, and Inverness. Flights were grounded, and by the time the system was restored, many flights had been delayed by over half a day.
Earlier this year, the Royal Bank of Scotland (RBS) indicated it may pursue legal action against a software vendor over a disruption caused during a software upgrade that impacted millions of its customers. The glitch is estimated to cost RBS around $156 million in staff overtime and compensation claims.
Bad deployments are never intended to happen, and they are nightmares that Independent Software Vendors (ISVs) have to deal with. The reasons for their occurrence are varied. Common technology-related causes include inaccurate or changing requirements or specifications (which in turn cause reduced testing or schedule pressure), differences between test and customer environments, application scalability or sizing gaps, insufficient staffing or technical skills, and ineffective project management or oversight. Business-related causes include lack of sponsor insight and support, weak business-value justification, and the economic climate.
While ineffective project management may seem like a major reason for projects to fail, the evolved project management methodologies and controls in use these days mean it is no longer a key contributor.
Also, when problems surface at the customer site, the pressure multiplies: saving face to show you are dependable, resolving blockers to show you have things under control, working to tight timelines to show you can deliver, and so forth. This goes to show that problems are best fixed before the product reaches the customer site.
Here is a step-by-step approach that ISVs could follow proactively to avoid failures at customer sites.
Developing a Solution-view versus Application-only View
The key point is that the customer is buying a solution to a business problem, not a collection of hardware and software. A simple solution may comprise individual components such as servers, storage, middleware, databases, applications, and processes; a complex solution may have multiples of these. For the solution to work, these individual components must be glued together optimally so that they can interoperate and integrate seamlessly.
Much of the technical complexity lies in this integration, and the dependencies between the multiple software and hardware vendors involved make the challenge tougher. ISVs should identify these interconnects and dependencies, from the hardware upwards, and proactively test the software and hardware stack in their R&D environments. This is a systematic approach to eliminating surprises later. It not only helps ISVs and infrastructure vendors come together to form a better view of the solution's scale and size, but also pre-empts the majority of integration problems in R&D, which is a far better place to fix issues.
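To make this concrete, here is a minimal sketch of how such interconnects could be checked automatically in R&D before anything ships. The component names and the certified-version matrix are hypothetical placeholders; in practice, the matrix would be populated from vendor certification data.

```python
# Minimal sketch: validating declared interconnects in a solution stack.
# The components and supported-version matrix below are hypothetical;
# a real matrix would come from vendor certification data.

SUPPORT_MATRIX = {
    # component: {dependency: set of certified versions}
    "app-server": {"jvm": {"11", "17"}, "os": {"rhel8", "rhel9"}},
    "database":   {"os": {"rhel8"}, "storage": {"san-fc", "nvme"}},
}

DEPLOYED = {"jvm": "17", "os": "rhel9", "storage": "nvme"}

def check_stack(matrix, deployed):
    """Return (component, dependency, version) triples outside the matrix."""
    gaps = []
    for component, deps in matrix.items():
        for dep, supported in deps.items():
            if deployed.get(dep) not in supported:
                gaps.append((component, dep, deployed.get(dep)))
    return gaps

if __name__ == "__main__":
    for component, dep, version in check_stack(SUPPORT_MATRIX, DEPLOYED):
        print(f"{component}: {dep}={version} is not in the certified matrix")
```

Run against the assumed manifest above, the check flags that the database is not certified on the deployed OS version, which is exactly the kind of gap better caught in R&D than at the customer site.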
Bottom-up Approach to Resolve Misconfigurations
It always helps a car driver to know what sits under the hood in order to manage eventualities. Similarly, software vendors ought to get under the hood to understand how their software uses the underlying hardware. Operating system (OS) vendors provide multiple tools to monitor resource utilization, along with interfaces to tune the OS for particular workloads.
This two-step approach of monitoring and tuning can help avoid misconfigurations at the hardware-resource level, while also giving software vendors an opportunity to publish the parameters that work best for their software. Some of the most common performance pitfalls to look for are memory and I/O configurations.
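As an illustration, the sketch below samples CPU, memory, and disk I/O using psutil, one cross-platform Python option; OS-native tools such as vmstat, iostat, or Windows Performance Monitor serve the same purpose. The alert threshold is an illustrative assumption, not a tuning recommendation.

```python
# Minimal monitoring sketch using psutil (one cross-platform option).
# The memory threshold is an illustrative assumption.
import time
import psutil

MEM_THRESHOLD_PCT = 90   # assumed alert level for memory pressure

def sample(interval_s=5):
    io_before = psutil.disk_io_counters()
    cpu_pct = psutil.cpu_percent(interval=interval_s)  # blocks for interval
    io_after = psutil.disk_io_counters()
    mem = psutil.virtual_memory()
    read_mb = (io_after.read_bytes - io_before.read_bytes) / 2**20
    write_mb = (io_after.write_bytes - io_before.write_bytes) / 2**20
    print(f"cpu={cpu_pct:.0f}% mem={mem.percent:.0f}% "
          f"read={read_mb:.1f}MB write={write_mb:.1f}MB over {interval_s}s")
    if mem.percent > MEM_THRESHOLD_PCT:
        print("memory pressure: consider tuning caches or sizing up RAM")

if __name__ == "__main__":
    while True:
        sample()
        time.sleep(1)
```

Samples like these, collected while the application runs representative workloads, tell the vendor which resource saturates first and therefore which OS parameters are worth publishing as recommended settings.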
Configuring and Testing Applications
Every ISV tests the software it produces. However, successful testing only conveys that the product is free of known bugs; it does not necessarily convey the performance quality and characterization of the product. This is where benchmarks play a critical role, helping to plot the torque curve (load versus performance) of the solution. In a well-thought-out benchmarking exercise, relevant workloads of realistic volume are applied that best represent the majority of customer requirements.
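A minimal sketch of such an exercise follows: it measures throughput at increasing concurrency levels to trace the torque curve. The do_transaction function here is a placeholder assumption, standing in for a representative unit of the real workload.

```python
# Minimal sketch of tracing a "torque curve": throughput measured at
# increasing concurrency levels. do_transaction is a placeholder for
# a representative unit of the real workload.
import time
from concurrent.futures import ThreadPoolExecutor

def do_transaction():
    time.sleep(0.01)  # stand-in for a real request against the system

def measure_tps(concurrency, requests=500):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in pool.map(lambda _: do_transaction(), range(requests)):
            pass
    elapsed = time.perf_counter() - start
    return requests / elapsed

if __name__ == "__main__":
    for c in (1, 2, 4, 8, 16, 32):
        print(f"concurrency={c:3d}  throughput={measure_tps(c):8.1f} tps")
```

The point where throughput flattens or dips as concurrency rises is the knee of the curve, and it is far cheaper to discover that knee in a benchmark than in production.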
Neither Undersize nor Oversize, but Right-size the Solution
A banking customer typically lays out its current and projected transaction and batch workload volumes as requirements, along with some anticipated non-functional requirements. Through a few quick calculations, the requirement emerges as a target Records per Second (RPS) or Transactions per Second (TPS) figure. Some ISVs may then extrapolate linearly from smaller runs they have achieved to arrive at the supporting infrastructure required for the target workload.
In reality, however, the application may exhibit a different scaling gradient at different workload levels, some of which have never been tested. In fact, beyond a certain scale, it may not scale linearly at all. How, then, does the ISV derive the right sizing recommendation, or know when to propose a scale-up versus a scale-out design?
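One way to approach this is sketched below: first derive the target TPS from the stated volumes, then compare naive linear extrapolation against a sub-linear model, here Gunther's Universal Scalability Law, one common choice rather than any prescribed method. All volumes and coefficients are illustrative assumptions; real values come from the customer's requirements and the ISV's own benchmark measurements.

```python
# Minimal sizing sketch. All volumes and coefficients are illustrative
# assumptions, not real customer or benchmark figures.

# Step 1: turn stated volumes into a target rate.
daily_txns = 8_000_000          # assumed daily transaction volume
peak_share = 0.4                # assumed fraction arriving in the peak window
peak_window_s = 2 * 3600        # assumed 2-hour peak window
target_tps = daily_txns * peak_share / peak_window_s
print(f"target: {target_tps:.0f} TPS")

# Step 2: compare naive linear extrapolation against a sub-linear model
# (Gunther's Universal Scalability Law, one common choice).
tps_per_node = 150              # assumed measured single-node throughput
sigma, kappa = 0.05, 0.001      # assumed contention/coherency coefficients

def usl_tps(nodes):
    return tps_per_node * nodes / (1 + sigma * (nodes - 1)
                                   + kappa * nodes * (nodes - 1))

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} nodes: linear={tps_per_node * n:6.0f} TPS, "
          f"USL={usl_tps(n):6.0f} TPS")
```

The widening gap between the linear and sub-linear columns is precisely the margin by which a naive extrapolation undersizes the infrastructure; fitting the model's coefficients to measured multi-node runs is what turns a guess into a defensible right-sizing recommendation.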
Participation in Beta Programs Can Lower Risk
Participation in vendor beta programs can yield better alignment between hardware and software vendors. It also provides an early window into what is coming and how new feature sets can be leveraged by your product. When hardware vendors announce new versions of an OS or platform, beta programs give software vendors an opportunity to port, test, and certify their product before General Availability (GA), thereby reducing time to market.