
Ways To Use Cloud Elasticity and Gain Cloud Savings
05 Mar 2019
But what’s next?
The Challenge
It may be safe to say that the industry agrees that automation for creating cloud resources is best practice. We in the cloud resource and event management tools ecosystem (that’s where GorillaStack is) think automated resource creation pipelines and CICD are vastly important.
We would like to draw your attention to how these initiatives are the first essential step in managing run state resources and events for what we like to call ARE (availability, retention, and elasticity).
Even with an aggressive build and teardown pipeline, resources spend most of their time in the run state. How efficiently you manage their life cycle will therefore be the biggest factor in whether you see cloud savings or end up with resource sprawl and bill shock.
The good news is that there are simple things to do when planning your projects and pipeline to resolve these concerns before they even become issues.
A Cautionary Tale
Let’s examine this a bit more closely by starting with a cautionary tale…
A cloud user at a major energy company that spends more than $8M USD a year on cloud remarked to us the other day that the CICD Pipeline project they completed was the prime contributor to a 3x growth in cloud costs in less than 3 months.
That got our attention. We usually associate automation with savings, so I asked whether the growth in spend was due to migrating more of the organization onto the cloud.
He said that in fact, no new teams or applications had been moved but that they had inadvertently made it too easy for individuals and teams to spin up new infrastructure and had forgotten to build in the necessary guardrails for cost governance and management.
This triggered memories of watching huge cloud users like Expedia and Netflix presenting about their cloud journeys and how steady growth in spend seemed to magically explode 3x-5x in a very short amount of time when cloud “took off”.
Solution
Here at GorillaStack we live and breathe best practices in cloud resource management and our customers benefit from 36% net savings on their cloud accounts on average. So it’s fair to say that we have some expertise in this area and specifically in automation as a service for operations and operations as code.
In conversations with organizations on this topic we often hear that their first priority is CICD pipeline automation, so they can automate resource creation and build controls around it.
The problem with this approach is that these organizations are walking right into the situation described by the cloud user at the major energy company above.
It will be really easy to create resources but not so easy to manage them and govern or control costs. You will likely turn to your cloud analytics solution to find out what the contributing factors are to your cloud sprawl and bill shock.
These solutions will point out underutilized and unutilized resources but unless you have a resource management remediation and automation plan as well, you’re going to be having some tough discussions with Finance.
Note: If you’ve tagged resources holistically and consistently then it will become pretty evident where the waste lies. If you haven’t, have no fear. It’s pretty straightforward to refine your tagging policies for ARE and bring some visibility into the equation (topic of a companion blog to this one).
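As a rough illustration of what "bringing visibility" can mean, here is a minimal sketch that audits an inventory for missing ARE tags. The inventory format (plain dictionaries with `id` and `tags`), the `find_untagged` helper and the choice of required tag keys are all assumptions made for this example; a real inventory would come from your cloud provider's API or your analytics tool.

```python
from typing import Dict, List

# Assumed tag keys for this sketch, based on the ARE idea in this post.
REQUIRED_TAG_KEYS = ("Availability", "Retention")


def find_untagged(resources: List[Dict]) -> Dict[str, List[str]]:
    """Map each resource id to the required tag keys it is missing."""
    gaps: Dict[str, List[str]] = {}
    for resource in resources:
        tags = resource.get("tags", {})
        # A key counts as missing if it is absent or has an empty value.
        missing = [key for key in REQUIRED_TAG_KEYS if not tags.get(key)]
        if missing:
            gaps[resource["id"]] = missing
    return gaps


# Hypothetical inventory, for illustration only.
inventory = [
    {"id": "i-1", "tags": {"Availability": "24x7",
                           "Retention": "StandardSnapshotRetention"}},
    {"id": "i-2", "tags": {"Availability": "BusinessHours"}},
    {"id": "vol-3"},
]
print(find_untagged(inventory))
```

Resources that show up in the result are the ones your automation can't reason about yet, so they are a sensible place to start the tagging clean-up.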
The good news is that with a modest change in mindset and a bit of planning during your CICD project, you can set yourself up for maximum resource efficiency and cost savings simultaneously.
Don’t worry if you didn’t think of this and have already deployed your CICD pipeline. It’s pretty easy to adopt these recommendations after the fact.
Availability, Retention & Elasticity
The first step is to start thinking about cloud resources in terms of availability, retention and elasticity (maybe we’ll make an acronym out of this ARE:-). ARE you ready? If you’re a larger organization you’ve probably got Project Management support that can assist with ensuring that ARE is part of the planning for any new workload.
Start talking to the resource owners about when they are actually running their workloads and need their machines online. Keep track of the opportunities you uncover to tune up the availability of these resources with automation.
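To get a feel for the size of the opportunity, a quick back-of-the-envelope calculation helps. The sketch below compares a schedule against running 24x7. The 08:00 to 18:00, five-days-a-week schedule is only an assumed example, and the result is a reduction in running hours, which won't map one-to-one onto your bill (storage, for instance, is typically still charged while an instance is stopped).

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_hours(start_hour: int, end_hour: int, days_per_week: int) -> int:
    """Hours per week a resource runs on a simple daily schedule."""
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("expected 0 <= start_hour < end_hour <= 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("expected 0 <= days_per_week <= 7")
    return (end_hour - start_hour) * days_per_week


def savings_fraction(start_hour: int, end_hour: int, days_per_week: int) -> float:
    """Fraction of running hours avoided compared with running 24x7."""
    return 1 - weekly_hours(start_hour, end_hour, days_per_week) / HOURS_PER_WEEK


# Assumed schedule: 08:00 to 18:00, Monday to Friday = 50 of 168 hours.
print(f"{savings_fraction(8, 18, 5):.0%} fewer running hours")
```

With that assumed schedule a machine runs 50 of the 168 hours in a week, so roughly 70% of its running hours disappear.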
Tagging Automation
The second step is to devise a tagging scheme for resources based on ARE. That way, as new resources are spun up automatically with your CICD automation, they’ll be tagged and easily identifiable for resource management automation.
Here are some real examples from users of GorillaStack’s SaaS platform:
[Availability : 24x7]
tag resources that can’t be turned off (these are good candidates for reserved instances)
[Availability : BusinessHours]
tag resources that can be shut down out of business hours
[Retention : StandardSnapshotRetention]
added as a tag when snapshot backups are taken
[ElasticityUp : StartOfBusiness]
another great use case is adjusting elastic resources like ASGs or DynamoDB to scale to 0 when they aren’t needed (leaving no pesky minimum instances running) and reverting to the original min, max and desired settings when needed.
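The scale-to-zero idea in that last example can be sketched in a few lines. This is not how GorillaStack implements it; `ScalingSettings`, `scale_to_zero` and `restore` are hypothetical names, and the `saved` dictionary stands in for whatever durable store you would really use. Actually applying the returned settings would go through your cloud provider's API, which is left out here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ScalingSettings:
    min_size: int
    max_size: int
    desired: int


ZERO = ScalingSettings(0, 0, 0)


def scale_to_zero(group: str, current: ScalingSettings,
                  saved: Dict[str, ScalingSettings]) -> ScalingSettings:
    """Remember the group's current settings and return the zeroed ones."""
    # Guard: if the group is already at zero (e.g. the job ran twice),
    # keep the previously saved settings instead of overwriting them.
    if current != ZERO:
        saved[group] = current
    return ZERO


def restore(group: str,
            saved: Dict[str, ScalingSettings]) -> Optional[ScalingSettings]:
    """Return the settings to re-apply, or None if nothing was saved."""
    return saved.pop(group, None)


saved: Dict[str, ScalingSettings] = {}
scale_to_zero("web", ScalingSettings(2, 10, 4), saved)  # end of business
print(restore("web", saved))                            # start of business
```

The guard in `scale_to_zero` matters more than it looks: without it, a scale-down job that runs twice would save the zeroed settings and the group would never come back at its original size.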
If you take these two steps, then you’re well positioned to manage resource utilization and ultimately buy precisely the right amount of reserved instances.
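To make the tag-driven approach concrete, here is a minimal sketch of how automation might read the Availability tag and decide whether a resource should be running right now. The `should_be_running` helper and the 08:00 to 18:00 weekday window are assumptions for this example; your own definition of business hours (and time zones) will differ.

```python
from datetime import datetime
from typing import Dict

# Assumed business hours for this sketch: 08:00-18:00, Monday to Friday.
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 18


def should_be_running(tags: Dict[str, str], now: datetime) -> bool:
    """Decide from the Availability tag whether a resource should be up."""
    # Untagged or unrecognized values are treated like 24x7, so the
    # automation never switches off something it doesn't understand.
    availability = tags.get("Availability", "24x7")
    if availability == "BusinessHours":
        is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
        in_hours = BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR
        return is_weekday and in_hours
    return True


tags = {"Availability": "BusinessHours"}
print(should_be_running(tags, datetime(2019, 3, 5, 10)))  # a Tuesday morning
print(should_be_running(tags, datetime(2019, 3, 9, 10)))  # a Saturday morning
```

Defaulting to "leave it running" is a deliberate choice in this sketch: a missing tag costs you some money, while a wrong shutdown costs you an outage.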
Cloud Cost Optimization Through Automation
When it comes to automated management and remediation like hunting down detached volumes based on retention policy, you’ll have three choices:
- roll your own code,
- use native cloud tooling or
- leverage a SaaS tool like GorillaStack.
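Whichever option you pick, the underlying logic for something like the detached-volume hunt looks roughly like the sketch below. The volume records are plain dictionaries invented for this example, `volumes_to_clean_up` is a hypothetical helper, and the 30-day figure attached to `StandardSnapshotRetention` is purely an assumption, not a GorillaStack default.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed mapping from Retention tag value to days; 30 is illustrative.
RETENTION_DAYS = {"StandardSnapshotRetention": 30}


def volumes_to_clean_up(volumes: List[Dict], now: datetime) -> List[str]:
    """Return ids of detached volumes that have outlived their retention."""
    expired = []
    for volume in volumes:
        if volume["attached"]:
            continue
        policy = volume.get("tags", {}).get("Retention")
        days = RETENTION_DAYS.get(policy)
        if days is None:
            # No known retention policy: skip rather than risk deleting data.
            continue
        if now - volume["detached_at"] > timedelta(days=days):
            expired.append(volume["id"])
    return expired


# Hypothetical records, for illustration only.
volumes = [
    {"id": "vol-a", "attached": False, "detached_at": datetime(2019, 1, 1),
     "tags": {"Retention": "StandardSnapshotRetention"}},
    {"id": "vol-b", "attached": False, "detached_at": datetime(2019, 3, 1),
     "tags": {"Retention": "StandardSnapshotRetention"}},
    {"id": "vol-c", "attached": False, "detached_at": datetime(2018, 1, 1),
     "tags": {}},
]
print(volumes_to_clean_up(volumes, datetime(2019, 3, 5)))
```

Note that the sketch only reports candidates. Skipping anything without a recognized policy, and keeping the delete step separate, is one way to avoid the runaway-script stories that come up with home-grown tooling.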
We hope you’ll give thorough consideration to pairing a SaaS resource management platform like GorillaStack with your CICD automation. We know we’ll save you time and put more money back in your budget than the other two options.
You could write your own code to do the basics, but you’ll find the complexity can quickly get out of hand as you grow and that your users get frustrated if things go wrong. We’ve heard tales of zombie scripts destroying all of a company’s backups or turning infra off and impacting users.
You could try out the native cloud capabilities for scheduling and lifecycle, or even open source solutions, but unfortunately these tend to be rudimentary and the cost to manage and maintain them is typically underestimated. Additionally, they don’t take full advantage of tags or act across accounts, so setup and maintenance can be a pain.
We’ve had our users try all of these approaches, sometimes discontinuing their GorillaStack account and then inevitably returning and realizing that “to code is good but to tool is better”!