As enterprises move more and more workloads to the cloud, the first pain our customers feel is the sting of cost overruns. Why? The budgets were planned and some initial sizing was done, yet almost immediately, costs become the first source of headaches for IT managers. It is apparent that every company will soon need a structure for building cost-effective measures into its cloud operations. Here we address some of the usual pitfalls and lay out a possible comprehensive framework for managing the cost of cloud workloads.
When starting their cloud adoption journey, enterprises often neglect to put a cost management framework in place. This usually results in a situation commonly called “cloud sprawl”, which leads to (often substantial) cost overruns. The most common causes are outlined below.
To utilize the full benefits of the speed and agility that cloud provides, modern IT usually provides a common services framework, wherein the business teams are allowed to manage the cloud resources for their applications themselves. While this is the recommended practice, cost ownership often falls through the cracks. We’ve seen customer situations where IT creates accounts and projects for business teams to use, and then hands them over to the business teams, but still owns the costing and billing.
What results from this arrangement is that business teams get free rein to create resources, which they do, often well beyond their allocated budgets. They are also often neither aware of nor concerned about the mounting spend, since they are not the ones footing the bill.
This is usually made worse by the fact that IT lacks strong cost reporting mechanisms that show who is overrunning the budget and on what.
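Even a simple budget-versus-actual check per team goes a long way toward showing who is overrunning. The sketch below is a minimal illustration that assumes month-to-date spend has already been attributed to each team (for example, through tags). The team names, amounts and the 80%/100% alert thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TeamSpend:
    team: str
    budget: float   # monthly budget allocated to the team
    actual: float   # month-to-date spend attributed to the team

def budget_alerts(spends, thresholds=(0.8, 1.0)):
    """Return (team, highest threshold crossed, spend/budget ratio) for each
    team whose spend has reached at least one threshold."""
    alerts = []
    for s in spends:
        if s.budget <= 0:
            raise ValueError(f"budget for {s.team} must be positive")
        ratio = s.actual / s.budget
        crossed = [t for t in thresholds if ratio >= t]
        if crossed:
            alerts.append((s.team, max(crossed), round(ratio, 2)))
    return alerts

# Illustrative data only.
spends = [
    TeamSpend("payments", budget=10_000, actual=8_500),
    TeamSpend("analytics", budget=5_000, actual=6_200),
    TeamSpend("web", budget=4_000, actual=1_000),
]
for team, threshold, ratio in budget_alerts(spends):
    print(f"{team}: {ratio:.0%} of budget (crossed the {threshold:.0%} threshold)")
```

With this data, "payments" is flagged at 85% and "analytics" at 124%, while "web" stays quiet. Reporting the highest threshold crossed avoids sending a team two alerts for the same overrun.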
Budgets and TCO
Doing an initial cloud TCO (total cost of ownership) analysis is essential for arriving at a budget for your cloud landscape. Without it, stakeholders have no visibility into what their infrastructure is going to cost. Cost saving is often one of the main reasons for cloud adoption, yet skipping this exercise leads to bill shock and often slows adoption momentum.
Even when enterprises do a TCO exercise, they often do it only for the final production landscape. They fail to account for the migration plan, DevOps processes and go-live dates, and do not size for them. As a result, costs skyrocket even before the application is fully migrated, as Dev/Test environments bloat and eat into the overall budget.
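To see why sizing only for production understates cost, here is a minimal phased-TCO sketch. The environment run-rates, phase names and durations are illustrative assumptions, not benchmarks. The point is that Dev/Test and migration-period costs accrue before production exists and continue alongside it.

```python
# Illustrative monthly run-rates per environment (USD); substitute your own sizing.
ENV_MONTHLY_COST = {"prod": 20_000, "staging": 6_000, "dev": 4_000, "test": 3_000}

# Hypothetical migration plan: (phase, duration in months, environments running).
PHASES = [
    ("build and migrate", 4, ["dev", "test"]),
    ("parallel run", 2, ["dev", "test", "staging", "prod"]),
    ("steady state", 6, ["dev", "test", "staging", "prod"]),
]

def phased_tco(phases, env_cost):
    """Return cloud cost per phase: months x summed monthly cost of active environments."""
    return {
        name: months * sum(env_cost[env] for env in envs)
        for name, months, envs in phases
    }

breakdown = phased_tco(PHASES, ENV_MONTHLY_COST)
for phase, cost in breakdown.items():
    print(f"{phase:>18}: {cost:>9,}")
print(f"{'total':>18}: {sum(breakdown.values()):>9,}")
```

With these numbers the 12-month total is 292,000. Production accounts for only 160,000 of that (8 months at 20,000), so non-production environments make up roughly 45% of the spend a production-only estimate would miss.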
Even when enterprises have done initial sizing and defined cost ownership, day-to-day visibility into cost remains important. Because resources can be created in the cloud within minutes, waste becomes a concern: resources are often created for temporary use but never shut down.
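A periodic sweep for resources that look abandoned helps catch this kind of waste. This minimal sketch assumes you can export each resource's creation time and its average CPU utilization from your monitoring system. The 7-day minimum age and 5% CPU threshold are arbitrary assumptions to tune. Low CPU alone does not prove a resource is unused, so treat the output as candidates for review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Resource:
    resource_id: str
    created_at: datetime      # creation timestamp (timezone-aware)
    avg_cpu_percent: float    # average CPU over a recent window, from monitoring

def idle_candidates(resources, now, min_age=timedelta(days=7), cpu_threshold=5.0):
    """Return IDs of resources older than `min_age` whose average CPU is below
    `cpu_threshold` percent. These are review candidates, not proof of waste."""
    return [
        r.resource_id
        for r in resources
        if now - r.created_at >= min_age and r.avg_cpu_percent < cpu_threshold
    ]

# Illustrative inventory snapshot.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
fleet = [
    Resource("vm-poc-01", datetime(2024, 5, 2, tzinfo=timezone.utc), 1.2),   # old and idle
    Resource("vm-web-01", datetime(2024, 4, 1, tzinfo=timezone.utc), 45.0),  # busy
    Resource("vm-new-01", datetime(2024, 6, 29, tzinfo=timezone.utc), 0.5),  # too new to judge
]
print(idle_candidates(fleet, now))
```

The minimum-age filter keeps freshly created resources from being flagged before they have had a chance to take load.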
In some cases, hackers have obtained access to customers’ cloud accounts and created hundreds of servers. Lack of visibility typically stems from stakeholders not having granular visibility and actionable insights into their cloud landscape, and from continuous monitoring not being in place. The result is month-end bill shock and no projections of cloud utilization trends.
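Two simple checks address month-end surprises: a run-rate projection of the month's bill, and a spike detector for sudden jumps such as a compromised account launching servers. The sketch below is a minimal version of both. It assumes one cost figure per elapsed day, and the 7-day window and 2x spike factor are arbitrary assumptions.

```python
import calendar
from datetime import date
from statistics import mean

def project_month_end(daily_costs, today):
    """Naive run-rate forecast: month-to-date spend plus the average daily
    spend so far multiplied by the days remaining in the month.

    `daily_costs` must hold one entry per day from the 1st through `today`.
    """
    if len(daily_costs) != today.day:
        raise ValueError("expected one cost entry per elapsed day")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    remaining = days_in_month - today.day
    return sum(daily_costs) + mean(daily_costs) * remaining

def spike_days(daily_costs, window=7, factor=2.0):
    """Return 1-based day numbers whose cost exceeds `factor` times the
    average of the preceding `window` days."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if daily_costs[i] > factor * baseline:
            spikes.append(i + 1)
    return spikes

# Illustrative data: a steady 100/day, then a sudden jump on days 11 and 12.
costs = [100.0] * 10 + [450.0, 460.0]
print(round(project_month_end(costs, date(2024, 6, 12)), 2))
print(spike_days(costs))
```

Here the projection is 4,775, and days 11 and 12 are flagged. The run-rate average dilutes a recent spike, so if the new level persists the real bill will be much higher. This is why a projection should be paired with spike alerts rather than used on its own.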
Building the correct cost governance is a key pillar of the overall cloud governance framework. Problems occur when some of the following governance structures are not put into place:
- A tagging strategy is required for both automation and chargeback/showback. Without a comprehensive tagging strategy, it becomes very difficult to dig into billing data and identify which application each resource belongs to and who created it
- Enterprise-level access control and provisioning policies, when not clearly defined and enforced for the cloud, allow unauthorized actors to create resources. Controlling who can create which resources is essential to managing cloud sprawl
- Cloud governance requires behavioral change across the organization. Enterprises that try to retrofit processes that work on-premises to the cloud lose the advantages that come with it. On the other hand, moving to the cloud without training stakeholders on the new governance models also results in lapses and a corresponding loss of visibility and tracking
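A tagging strategy is only useful if compliance can be measured. The following minimal sketch reports which required tags each resource is missing. The required tag keys and the inventory format (resource ID mapped to a tag dictionary) are hypothetical, and you would substitute your own standard and inventory export.

```python
# Hypothetical tagging standard; adapt the keys to your own policy.
REQUIRED_TAGS = ("application", "owner", "cost-center", "environment")

def missing_tags(inventory, required=REQUIRED_TAGS):
    """Map each non-compliant resource ID to the required tag keys that are
    absent or empty, in the order they appear in `required`."""
    report = {}
    for resource_id, tags in inventory.items():
        missing = [key for key in required if not tags.get(key)]
        if missing:
            report[resource_id] = missing
    return report

# Illustrative inventory export.
inventory = {
    "vm-001": {"application": "billing", "owner": "alice",
               "cost-center": "cc-42", "environment": "prod"},
    "vm-002": {"application": "billing", "owner": ""},
    "bucket-7": {},
}
for resource_id, keys in missing_tags(inventory).items():
    print(f"{resource_id}: missing {', '.join(keys)}")
```

The check treats empty values as missing, since a blank `owner` tag is as useless for chargeback as no tag at all.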
Even when governance models are defined, enforcing them manually across a large landscape comes close to not enforcing them at all (imagine tagging more than a thousand VMs by hand). When tools and automation are not applied across the entire cloud landscape, IT teams are always playing catch-up and doing a lot of manual work to keep the landscape in shape.
Similarly, when cost management and remediation tools are not used, manual compliance, cost reporting and optimization become simply untenable, and are often abandoned.
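Automated remediation can close much of this gap. One common pattern is to fill missing tags from defaults defined for the owning account or project. The sketch below plans and applies such changes in memory only. The account names, default tags and data layout are hypothetical, and a real tool would call the cloud provider's tagging API where `apply_changes` updates the dictionaries.

```python
def plan_tag_remediation(resources, account_defaults):
    """Plan tags to add from each resource's account defaults.

    Returns (changes, unresolved): `changes` maps resource ID -> tags to add;
    `unresolved` lists resource IDs whose account has no defaults defined.
    """
    changes, unresolved = {}, []
    for rid, res in resources.items():
        defaults = account_defaults.get(res["account"])
        if defaults is None:
            unresolved.append(rid)
            continue
        # Only fill tags that are absent or empty; never overwrite existing values.
        additions = {k: v for k, v in defaults.items() if not res["tags"].get(k)}
        if additions:
            changes[rid] = additions
    return changes, unresolved

def apply_changes(resources, changes):
    """Apply planned tags in memory (placeholder for a provider tagging API call)."""
    for rid, additions in changes.items():
        resources[rid]["tags"].update(additions)

# Illustrative data.
account_defaults = {
    "acct-payments": {"owner": "payments-team", "cost-center": "cc-100"},
}
resources = {
    "vm-1": {"account": "acct-payments", "tags": {"owner": "bob"}},
    "vm-2": {"account": "acct-payments", "tags": {}},
    "vm-3": {"account": "acct-sandbox", "tags": {}},
}
changes, unresolved = plan_tag_remediation(resources, account_defaults)
print("planned:", changes)
print("needs manual review:", unresolved)
apply_changes(resources, changes)
```

Separating the plan from the apply step makes it easy to run in a dry-run mode first. It also routes resources with no known owner to a person instead of guessing.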
Public clouds are evolving fast. They already provide innovative features, such as auto-scaling, that are not readily available in on-premises environments. They also offer innovative pricing models and multiple discount options.
Finally, they keep introducing new managed services that not only let customers pay only for what they use but also remove the overhead of managing those services. Enterprises miss out on these benefits when:
- Apps don’t use cloud features such as autoscaling to optimize cost
- Enterprises don’t use cloud platform discounts such as AWS Reserved Instances
- Enterprises don’t periodically review and validate their evolving application architectures
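Whether a reservation pays off depends on how much of the term the instance actually runs. A reservation is paid for the whole term regardless of usage, so it breaks even when on-demand spend for the hours used would equal the reservation's total cost. The minimal sketch below computes that break-even utilization. The prices are hypothetical, and actual offerings vary by provider, instance type, region and payment option.

```python
HOURS_PER_YEAR = 8760

def reservation_break_even(on_demand_hourly, upfront, reserved_hourly=0.0,
                           term_hours=HOURS_PER_YEAR):
    """Fraction of the term an instance must run for a reservation to cost
    less than on-demand. The reservation is paid for the full term, so
    break-even hours = reserved total cost / on-demand hourly rate."""
    reserved_total = upfront + reserved_hourly * term_hours
    break_even_hours = reserved_total / on_demand_hourly
    return break_even_hours / term_hours

# Illustrative prices only; look up current rates for your instance type and region.
utilization_needed = reservation_break_even(on_demand_hourly=0.10, upfront=500.0)
print(f"Reservation pays off above {utilization_needed:.0%} utilization")
```

Under these assumptions, an instance that runs more than about 57% of the year is cheaper reserved. A Dev/Test instance that is stopped nights and weekends may not be.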
Cost Management Approach and Recommendations
Based on our experiences with customer landscapes and cloud best practices, the following key recommendations can help enterprises control and optimize costs effectively.
At a strategic level, this requires that enterprises start by defining and implementing a clear cloud governance model. Once a model is defined, they can implement solutions that provide deeper visibility and actionable insights for cost management.
It is highly recommended to enforce governance via automation, as this provides predictability and reduces overhead. It is important to define and implement access control and ownership of cloud resources, so that responsibilities are clear and unambiguous.
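In practice, provisioning guardrails are usually enforced with the provider's own policy services (for example, AWS Organizations service control policies or Azure Policy) or in the provisioning pipeline. The sketch below only illustrates the shape of such a policy-as-code check in plain Python. The roles, allowed instance types and required tags are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ProvisionRequest:
    requester_role: str
    instance_type: str
    tags: dict = field(default_factory=dict)

# Hypothetical policy: which instance types each role may launch.
ALLOWED_TYPES = {
    "developer": {"t3.micro", "t3.small", "t3.medium"},
    "platform-admin": {"t3.micro", "t3.small", "t3.medium", "m5.xlarge", "m5.2xlarge"},
}
REQUIRED_TAGS = ("owner", "cost-center")

def evaluate(req):
    """Return a list of policy violations; an empty list means the request is allowed."""
    violations = []
    allowed = ALLOWED_TYPES.get(req.requester_role)
    if allowed is None:
        violations.append(f"role '{req.requester_role}' may not provision resources")
    elif req.instance_type not in allowed:
        violations.append(
            f"instance type '{req.instance_type}' not allowed for role '{req.requester_role}'"
        )
    # Ownership must be declared up front so cost can be attributed.
    for key in REQUIRED_TAGS:
        if not req.tags.get(key):
            violations.append(f"missing required tag '{key}'")
    return violations

print(evaluate(ProvisionRequest("developer", "m5.2xlarge", {"owner": "alice"})))
```

Returning every violation rather than stopping at the first one lets requesters fix a rejected request in a single pass.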
Over time, these fundamentals together increase awareness and enable the behavioral change and discipline needed for successful cloud management.
At a tactical level, enterprises can drive cost visibility by implementing and enforcing mechanisms such as mandatory resource tagging, a lightweight cloud inventory management system, and reporting and recommendations on current and projected cost, utilization, and non-conformance.
Finally, for active cost control, enterprises can use these metrics to identify and automatically clean up unused and non-conformant resources, implement resource scheduling, take advantage of cloud-provided discounts and reservations, and run non-critical workloads, especially Dev/Test, on discounted excess cloud capacity in the form of spot instances.
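Resource scheduling is often the quickest win for Dev/Test. The minimal sketch below decides whether a scheduled resource should be running and estimates the weekly saving compared with running 24x7. The 08:00 to 20:00 weekday window and the hourly rate are assumptions, and an actual scheduler would act on the result by calling the provider's start and stop APIs.

```python
from datetime import datetime

WORKDAYS = frozenset({0, 1, 2, 3, 4})  # Monday=0 ... Friday=4 (datetime.weekday convention)

def should_run(now, start_hour=8, end_hour=20, workdays=WORKDAYS):
    """True if a scheduled Dev/Test resource should be up at `now`."""
    return now.weekday() in workdays and start_hour <= now.hour < end_hour

def weekly_savings(hourly_rate, start_hour=8, end_hour=20, workdays=WORKDAYS):
    """Return (cost saved per week, fraction saved) versus running 24x7."""
    hours_on = (end_hour - start_hour) * len(workdays)
    hours_always_on = 24 * 7
    saved = hourly_rate * (hours_always_on - hours_on)
    return saved, 1 - hours_on / hours_always_on

saved, fraction = weekly_savings(hourly_rate=0.20)
print(f"Saves {saved:.2f} per instance per week ({fraction:.0%})")
print(should_run(datetime(2024, 6, 3, 9)))   # a Monday morning
print(should_run(datetime(2024, 6, 8, 10)))  # a Saturday
```

A 12-hour weekday schedule keeps an instance up for 60 of 168 weekly hours, a saving of about 64% regardless of the hourly rate. For the same reason, scheduled instances are usually poor candidates for reservations.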