A few weeks ago Peter Shi, the main man when it comes to Cost Optimization for AWS in APAC, ran an awesome workshop where participants from some of the largest organizations in Australia got to take part in discussing and recommending their most treasured DevOps tools when it comes to cost optimization.
For our synopsis, we split the tools into 6 main buckets – Discovery & Migration, Cost Visibility & Resource Recommendations, Tagging & Allocation, Logs & Events, Automation, and Spot Instance Management. If you have experience with any of these tools and want to give feedback, or if you think we’ve missed anything, we’d love to hear from you on our Slack Channel or you can reach out to Peter!
Organizations that are planning a migration use AWS Application Discovery Service to identify and map applications that are running on-prem.
Armed with that information, organizations can then plan what their new environment will look like once it moves into the cloud using other migration tools.
Migrate databases to AWS using AWS Database Migration Service. One major upside is that the database remains fully functional during migration so there’s minimal downtime.
Database Migration Service allows you to migrate data from most major databases including Oracle, Microsoft SQL, IBM and many more.
Use AWS Import / Export Calculator to figure out the cost for transferring data into and out of S3 buckets. A simple tool and highly recommended in advance of a large data migration to plan out your costs.
AWS Migration Hub allows you to track migrations across multiple AWS and partner solutions. Use Migration Hub to choose the most appropriate tools and plan in advance of any migration work.
Track progress and performance of your migrations in one single spot.
AWS Server Migration Service (SMS) is an agentless service which makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations.
Arguably more complicated than the Import / Export calculator, the Simple Monthly calculator allows you to estimate charges over a range of AWS services. Be prepared with all of your projected service usage before using it.
Once you have a good idea of your use cases you’ll find the SMC incredibly powerful for estimating your charges going forward.
Snowball is a hardware device that organizations can load massive amounts of data onto and ship to Amazon. It’s available with 80TB of storage in all regions and also with 50TB in the US regions.
Encrypted data is loaded into the Snowball before the hardware is sent off to Amazon for transfer into the cloud. The predominant advantage of Snowball is that you’re not transferring data over the internet, so there are significant time and money savings.
Less granular than the AWS Simple Monthly Calculator, use the TCO calculator for a high level look and comparison for AWS Total Cost of Ownership vs On-Prem and other hosting options.
Vendor: Samba [Open Source]
Organizations use this open source tool along with 3rd party file system tools to copy data directly into S3 buckets.
AppDynamics APM automatically discovers, maps, and visualizes your critical customer journeys through each application service and infrastructure dependency. Teams have a single source of truth to focus on end-to-end performance in the context of the customer experience, instead of monitoring individual services.
Deloitte ATASphere provides service delivery partners and enterprise customers a way to map, manage, migrate & protect an any-to-any combination of applications across data centers, hypervisors, private clouds and public clouds.
Use Cloudamize to project TCO, break down your costs, map your workloads and forecast your expenses, all in advance of any migration.
A simple but powerful tool for visually designing and mapping your AWS environment and infrastructure.
Manages the process of migration to AWS with automation including block-level replication, automated machine conversion, and application stack orchestration.
Discovers and maps infrastructure for classification and analysis in anticipation of migration into the cloud.
Before a cloud migration, use Device42 to identify and track the logical components, relationships and interdependencies in your environment.
Using a read-only policy, you can plug Hava into your environment to get a diagram of your network infrastructure back immediately.
Vendor: New Relic
Combine New Relic APM and Infrastructure for a clear idea of your infrastructure, enabling ops and dev teams to get on the same page.
Post migration you can track changes to your environment and the effects of those changes.
Vendor: TSO Logic
TSO Logic plugs into your environment to create an extremely granular view of what it might cost to migrate the environment to the cloud. It uses AI and statistical models to examine the compute, rightsizing and OS requirements before migration.
Once the analysis is complete, you can use the output as a roadmap and how-to for setting up a new, rightsized and appropriately provisioned environment in the cloud, in line with best practice.
Use AWS Budgets to set custom alerts for costs and usage according to budgets. You can slice and dice those alerts by AWS service, tag & linked account.
Receive alerts via email or SNS if you want to trigger environmental changes. You can even track RI utilization to get a high level overview on how your RI purchases are going and whether or not you need to adjust your buying strategy.
The AWS Cost and Usage Report (CUR) lists the cost of every line item in AWS by account and user. The line items can be broken down hourly, daily or monthly. Further, the report can be filtered by tags.
The CUR can be consumed as a CSV file that is written to an S3 bucket, but also programmatically via API.
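As a rough illustration of consuming the CUR as a CSV, the sketch below sums unblended cost per service. It assumes the report has already been downloaded from S3 and that it carries the `lineItem/ProductCode` and `lineItem/UnblendedCost` columns. The sample rows and the function name are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a CUR CSV downloaded from S3.
SAMPLE_CUR = """lineItem/ProductCode,lineItem/UnblendedCost
AmazonEC2,1.25
AmazonS3,0.10
AmazonEC2,2.75
"""


def cost_by_service(cur_csv_text):
    """Sum unblended cost per product code from CUR CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(cur_csv_text)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)


print(cost_by_service(SAMPLE_CUR))
```

A real report has many more columns and is usually gzip-compressed, so in practice you would stream it rather than hold it in a string.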
AWS Cost Explorer provides straightforward graphical data about your AWS spend across all services and accounts. Of particular use is the breakdown of spend over time so you can see how you track across a month and identify cycles. Spend can also be filtered and grouped by dimensions such as product and tag.
There are a couple of reports that are particularly useful, such as further breakdown of EC2 spend, Reserved Instance (RI) recommendations, RI Utilization, and RI savings.
Not so much a tool as a CloudFormation template, you can deploy the EC2 Rightsizing solution to analyze the last 2 weeks of utilization data and identify candidate instances for rightsizing.
With such a range of instance size and types available, this is a good first pass at figuring out any obvious choices before delving into some of the more granular focused paid tools.
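The kind of first-pass check described above can be sketched as follows. This is not the logic of the AWS solution itself: the 40% peak-CPU threshold, the function name and the sample data are all assumptions chosen for illustration.

```python
def rightsizing_candidates(cpu_by_instance, max_cpu_threshold=40.0):
    """Return instance IDs whose peak CPU stayed below the threshold.

    cpu_by_instance maps instance ID -> list of CPU utilization
    percentages sampled over the analysis window (e.g. two weeks).
    """
    candidates = []
    for instance_id, samples in cpu_by_instance.items():
        # Instances with no samples are skipped rather than guessed at.
        if samples and max(samples) < max_cpu_threshold:
            candidates.append(instance_id)
    return sorted(candidates)


usage = {
    "i-aaa": [5.0, 12.5, 30.0],   # never busy: candidate
    "i-bbb": [20.0, 85.0, 40.0],  # has a real peak: leave alone
    "i-ccc": [],                  # no data
}
print(rightsizing_candidates(usage))
```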
To get access to the best parts of Trusted Advisor, you’ll need to pay for premium support. Trusted Advisor provides key recommendations around rightsizing and underutilized resources as well as making recommendations around RI purchases.
Like many of the AWS products, a good first step before getting into some of the deeper toolsets to solve the cost optimization challenge.
More of a projection forward than a report of historical usage, the Simple Monthly calculator allows you to estimate charges if you’re looking to make changes over a range of AWS services.
If you figure out your projected use cases you’ll find the SMC incredibly powerful for estimating your charges going forward.
AWS Systems Manager gives DevOps engineers a unified view of resource groups.
You can view detailed system configurations, operating system patch levels, software installations, application configurations, and other details about your environment through the Systems Manager dashboard.
Vendor: Atlassian [Open Source]
Squeegee is the brainchild of Atlassian and is Open Source. The code enriches and stores CUR data in Parquet files in S3 that can be queried using Amazon Athena and visualized using BI tools.
Vendor: Beeva [Open Source]
Manage your Trusted Advisor Alarms, AWS Health notifications and AWS Support cases all in one place with Open Source Code.
Vendor: News Corp [Open Source]
Simple but effective – Consigliere is Open Source Code from News Corp that allows you to aggregate your Trusted Advisor data into one account.
Vendor: Teevity [Open Source]
Ice is Open Source Code that was originally conceived by Netflix but is now maintained by Teevity.
Ice gives organizations a high level view of their spend (including Reserved Instances) and allows them to drill down at a very granular level for point in time analysis as well as trend usage patterns.
Apptio Cloud Cost Management is a tool that allows users to understand, map and allocate their cloud costs and usage.
It can also identify underutilized and idle instances, which users can then act on to make cost savings.
Cloudability gives the financial departments of organizations full transparency around cost allocation to departments and resources. The software makes clear recommendations around optimization to ensure infrastructure is running at the best possible price.
They have a Reserved Instance planner and recommendations around rightsizing for underutilized instances.
CloudCheckr gives you visibility around the security and cost usage of your cloud. Targeted at DevOps, SecOps and FinOps teams, CloudCheckr gives you configurable views to optimize spend and eliminate waste in your environments.
Additionally, CloudCheckr empowers organizations to undertake cost allocation across internal groups and has a small degree of self-healing automation.
Vendor: Cloud Conformity
Cloud Conformity focuses primarily on security but provides broad and useful insights for cost visibility. They have an interesting and compelling model in that they take AWS best practices and map your environment against those key recommendations.
Their product simplifies bill visibility and analyzes usage trends to provide insights, recommendations and spend projections that can be filtered by regions, accounts, tags and projects.
CloudHealth is a heavy duty tool for cloud visibility, cloud cost management, resource utilization visibility and governance.
Like many of the other tools CloudHealth allows users to break down the cloud spend by a number of variables so organizations can manage their spend by tag, region, account or project.
CloudSqueeze is a lightweight tool that analyzes an environment’s utilization and returns graphs for feedback to the user. Additionally CloudSqueeze can make some high level recommendations around utilization and rightsizing.
Cloudwiry is another lightweight tool that allows users to monitor their AWS account spend in customizable dashboards and graphs.
Cloudwiry also makes recommendations for changes that users can make to their environments for cost optimization.
Densify uses artificial intelligence & machine learning to analyze cloud usage patterns and recommend performance optimizations.
Their Cloud Optimization Engine establishes predictive demand patterns to create models and optimize supply to the demands of your cloud environment.
Primarily a monitoring tool, Metricly analyzes billing and performance data to give users cost visibility with filtering, sorting and recommendations.
The tool can also detect usage pattern changes to alert you if anything unusual is anticipated in the bill before it becomes a problem.
ActOnCloud by ActOnMagic has a mixture of features. Along with a spend analyzer, it has a “Trusted Fixer”, empowering users to fix resource leakages and to rightsize.
The software provides billing alerts and auto scaling, as well as deeper monitoring with server alerts.
Botmetric, acquired by Nutanix, is now known as “Beam”. Beam gives organizations analysis and visibility of their cloud consumption as well as the ability to execute recommended changes with one click.
Organizations can set budgets then monitor and control usage according to those budgets – as well as being able to break down spend by team, tag, account & region.
Vendor: Nuvola Analytics
Nuvola is a platform that gives you visibility over your cloud consumption. Interestingly they set KPIs for how they think your cloud should be performing and measure you against them.
The platform provides rightsizing recommendations as well as the ability to predict how your costs will track and evolve over time. You can set budgets and get alerts when things are shaping up to get out of hand.
RightScale Optima unifies visibility across all your clouds and then allows you to filter by account, team, app or tags. Organizations can use this information for chargeback and showback.
Businesses use the product to identify underutilized resources and optimize further by predicting costs and provisioning accordingly.
Scalr provides cost visibility and actionable steps that you can take to create meaningful cost optimization. By breaking costs down to an application layer, users can truly optimize with all the facts at hand.
With Scalr you can set policies around budgets and resource types to create a safe environment where developers can operate without fear of overspending or stepping outside of the guardrails.
PyraCloud empowers organizations to define business units and establish budgets to track and optimize spend throughout the organization across all cloud resources.
While chiefly a monitoring tool, Splunk can also give users visibility over their resources and how they relate to each other.
Splunk also has dashboards for tracking costs of your AWS usage over time whilst giving users insight into unused resources and allowing them to track against budget.
Stax empowers organizations to allocate cost by business unit, tag, application and instance. Organizations can track their usage across the month and how it compares to budget.
What differentiates Stax from a number of others in this category is their wastage reports, which show, across a number of services, exactly where there is wastage and how to solve identified issues.
Use Wavefront’s AWS monitoring analytics to create dashboards that bring into one place CloudWatch, CloudTrail and native AWS cost reports. Once the dashboards are in place, you can identify over-provisioned resources and optimize cost.
Not only useful as a migration planning tool, Cloudamize allows you to identify which resources are ripe for rightsizing as well as how to get optimal performance at optimal cost.
One cool thing you can use the product for is to plan and forecast with multiple scenarios whereby you can model different regions, pricing plans and instance types.
Receive actionable notifications from FittedCloud which can be executed to optimize cost and improve infrastructure efficiency.
Yotascale uses machine learning and AI to continuously monitor your environment and make actionable downsizing recommendations.
AWS Tag Editor should be your go-to for tagging and editing the tags of resources on an ad-hoc basis. It’s available in the Console but doesn’t provide automation like some of its paid and OSS competitors.
Vendor: Answers 4 AWS [Open Source]
Graffiti Monkey is a handy piece of open source code that looks at EC2 instances and copies their tags to any attached EBS volumes and subsequently to any snapshots of those EBS volumes.
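To make the tag propagation idea concrete, here is a minimal sketch. It is not Graffiti Monkey's own code: plain dictionaries stand in for the tag sets an EC2 API call would return, and the rule that instance tags win on conflict is an assumption made for this example.

```python
def propagate_tags(instance_tags, volume_tags):
    """Return the volume's tags with the instance's tags copied on top.

    Volume-only tags are kept; on a key conflict the instance tag wins
    (an assumption for this sketch). Neither input is modified.
    """
    merged = dict(volume_tags)
    merged.update(instance_tags)
    return merged


instance = {"Name": "web-1", "CostCenter": "1234"}
volume = {"Name": "old-name", "Backup": "daily"}
print(propagate_tags(instance, volume))
```

The same function could be applied a second time to push the volume's merged tags down to its snapshots.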
Vendor: Capital One [Open Source]
Cloud Custodian by Capital One is another piece of Open Source Code. Out of the box you can use Cloud Custodian to stop any resource in 4 days if it does not meet tagging compliance policies, and you can of course customize it to your needs.
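The "stop after 4 days if untagged" behaviour can be sketched in a few lines. Cloud Custodian itself is driven by policy files rather than code like this, so treat the following purely as an illustration of the rule. The required tag keys and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical tagging policy for this sketch.
REQUIRED_TAGS = {"Owner", "CostCenter"}
GRACE_PERIOD = timedelta(days=4)


def missing_tags(tags):
    """Return the required tag keys that a resource lacks."""
    return REQUIRED_TAGS - set(tags)


def should_stop(tags, first_flagged, now):
    """True once a non-compliant resource has outlived the grace period."""
    return bool(missing_tags(tags)) and now - first_flagged >= GRACE_PERIOD


flagged = datetime(2024, 1, 1)
print(should_stop({"Owner": "sam"}, flagged, datetime(2024, 1, 3)))
print(should_stop({"Owner": "sam"}, flagged, datetime(2024, 1, 5)))
```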
Vendor: GorillaStack [Open Source]
Use GorillaStack AutoTag to automatically tag any resource with the user that created it. This piece of open source code means your users will always be responsible for the resources they provision.
Among its other strengths, Cloudability allows organizations to heavily customize how they want to allocate costs. The tool is very much designed for FinOps teams to visualize by resource, application and tag.
As with Cloudability, CloudCheckr users can allocate cost center and break down spend within an environment based on tags. With most of these allocation tools it’s recommended that you have a good tagging policy in place initially or use one of the open source tools to apply tagging in a consistent fashion.
CloudHealth provides 2 distinct ways to unravel and visualize cost, allocating either by resource or by tag. The resource allocation methodology is incredibly granular but involves custom work, in particular with the Detailed Billing Record.
CloudHealth also allows users to view cost center by tag, which is the recommended place to start before investing in a deeper dive.
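The general idea of tagging a resource with its creator can be sketched from a CloudTrail record. This is not AutoTag's implementation: the `Creator` tag key and the function name are assumptions, and only the `RunInstances` event is handled here.

```python
def creator_tags(cloudtrail_event):
    """Map each instance launched by a RunInstances event to a creator tag."""
    if cloudtrail_event.get("eventName") != "RunInstances":
        return {}
    creator = cloudtrail_event["userIdentity"]["arn"]
    # responseElements can be null when the API call failed.
    response = cloudtrail_event.get("responseElements") or {}
    items = response.get("instancesSet", {}).get("items", [])
    return {item["instanceId"]: {"Creator": creator} for item in items}


# Trimmed-down, made-up CloudTrail record for illustration.
event = {
    "eventName": "RunInstances",
    "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
    "responseElements": {
        "instancesSet": {"items": [{"instanceId": "i-0abc"}, {"instanceId": "i-0def"}]}
    },
}
print(creator_tags(event))
```

The returned mapping would then be handed to whatever applies the tags, for example an EC2 tagging call made from a Lambda function.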
PyraCloud Custom Group Manager allows customers to organize cloud workloads and environments into business units and cost centers. Organizations can define the hierarchy based on internal reporting needs.
Once defined, the resources can be grouped to business units and cost centers. Organizations can effectively govern, budget, track spend and charge back cloud costs to the various business units.
AWS CloudTrail is an audit log of almost all the events that take place inside your AWS environment. You can consume the events in a number of different fashions but the most popular is to push them to an S3 bucket for inspection.
With CloudTrail organizations can look at user & resource activity to allow for deeper analysis and troubleshooting, particularly when encountering issues with an AWS environment.
AWS CloudWatch can be used to monitor a number of services and applications that run inside an organization’s AWS environment. CloudWatch gives organizations oversight of the whole cloud environment with feedback around resource utilization, application performance and operational health.
CloudWatch allows for the monitoring of EC2, DynamoDB, EBS, RDS and many more services. You can also monitor custom metrics for applications via API. One of the most popular use cases is for metrics to trigger alarms which can then be used to kick off other automated actions.
AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. Config continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations.
With Config, you can review changes in configurations and relationships between AWS resources, dive into detailed resource configuration histories, and determine your overall compliance against the configurations specified in your internal guidelines. This enables you to simplify compliance auditing, security analysis, change management, and operational troubleshooting.
As a logging tool, Trusted Advisor works best when combined with CloudWatch. CloudWatch can be used to report on changes in Trusted Advisor.
Then you can use these CloudWatch alarms to send alarms and notifications or even to trigger changes that remediate your environment.
Vendor: Beeva [Open Source]
Manage your Trusted Advisor Alarms, AWS Health notifications and AWS Support cases all in one place with Open Source Code.
Datadog is an enterprise grade monitoring and analytics solution that ingests events from across the full stack of an organization. From there, users can log, analyze and display their events to ascertain meaningful insights about their application performance.
Datadog can also be used to parse and filter critical events to create alerts that notify and / or trigger automation processes for remedy or similar.
Thousands of organizations use GorillaStack’s free CloudTrail Listener for Slack to filter CloudTrail and receive notifications around specific events that are useful to the end user.
Valuable metadata is attached to each event and notifications can be allocated on a user by user basis to make sure critical events only go to the applicable Slack user.
ScienceLogic monitors multiple platforms and provides tools to create alerts for specific combinations of events that are of importance. Organizations can escalate and distribute alerts based on their requirements.
Splunk collects data to provide application and performance analytics. Splunk gives businesses the tools to improve uptime, identify technical issues and detect security breaches.
Splunk’s graphical tools enable straightforward consumption and visualization of data logs in a single pane.
Vendor: Sumo Logic
Sumo Logic is a cloud-native, machine data analytics service for log management and time series metrics. A SaaS product, it enables IT teams to monitor and analyze logs across AWS, which can be used to promote security & compliance as well as best practice incident management & troubleshooting.
Until recently available only for EC2 instances, Auto Scaling can now be used to automatically adjust capacity to meet the demands put on your services.
You can confidently maintain performance by setting minimum, maximum and desired volume of services to meet spikes and troughs in usage.
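How the minimum, maximum and desired settings interact can be shown with a small sketch. It only models the bounds check; the function name is hypothetical and no AWS API is called.

```python
def clamp_desired_capacity(requested, minimum, maximum):
    """Keep a requested capacity within the configured min/max bounds."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(requested, maximum))


# A spike asks for 12 units, but the group is capped at 10.
print(clamp_desired_capacity(12, minimum=2, maximum=10))
# A quiet period asks for 0, but the group never drops below 2.
print(clamp_desired_capacity(0, minimum=2, maximum=10))
```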
You can use AWS Budgets to publish to an SNS topic when a cost threshold is exceeded in your AWS account. From there, the SNS topic can be leveraged to trigger automation flows that remedy the issue (e.g. a flow that turns off or scales down a service).
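A flow like this often ends in a Lambda function subscribed to the SNS topic. The sketch below shows only the dispatch step, under the assumption that the function receives the standard SNS event shape (`Records[].Sns.Message`). The `remediate` callback, the handler name and the sample event contents are hypothetical. What the callback does, such as scaling a service down, is left to you.

```python
def handle_budget_alert(event, remediate):
    """Call remediate(message) for each SNS record in a Lambda event.

    Returns the number of records handled.
    """
    handled = 0
    for record in event.get("Records", []):
        if record.get("EventSource") != "aws:sns":
            continue  # ignore anything that did not come from SNS
        remediate(record["Sns"]["Message"])
        handled += 1
    return handled


# Made-up event in the shape SNS delivers to Lambda.
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "Subject": "Budget alert (hypothetical)",
                "Message": "Budget threshold exceeded",
            },
        }
    ]
}

actions = []
print(handle_budget_alert(sample_event, actions.append), actions)
```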
AWS Config allows businesses to trigger SNS topics when resource configurations deviate from the policies set out by an organization. Again, these SNS topics can be used to send alerts or trigger flows that remediate any such deviation.
Vendor: Instance Scheduler
Instance Scheduler is a CloudFormation template that can be used to schedule instances to turn on and off. Users have to set up a stack and the scheduler requires a fair bit of configuration and maintenance to get working and manage ongoing.
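At its core, any on/off scheduler answers one question: should this instance be running right now? The sketch below is not taken from Instance Scheduler. The office-hours defaults and the function name are assumptions, and time zones are ignored for brevity.

```python
from datetime import datetime, time


def should_be_running(now, start=time(8, 0), stop=time(18, 0), weekdays_only=True):
    """Return True if `now` falls inside the running window."""
    if weekdays_only and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= now.time() < stop


print(should_be_running(datetime(2024, 1, 8, 9, 0)))   # Monday morning
print(should_be_running(datetime(2024, 1, 8, 22, 0)))  # Monday night
print(should_be_running(datetime(2024, 1, 6, 9, 0)))   # Saturday morning
```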
AWS Systems Manager is a tool for automating tasks on EC2 instances such as patches, updates and configuration changes. It has some built-in presets but can be heavily customized by experienced end-users.
Vendor: Answers 4 AWS [Open Source]
Backup Monkey is a simple Open Source Python script that takes backups of EBS volumes.
Vendor: Capital One [Open Source]
Cloud Custodian is an Open Source project from Capital One. Users can create policies around encryption, access and tagging. Most importantly, Cloud Custodian automates cost optimization by deleting unused resources and power cycling.
BidElastic uses machine learning to predict spikes in workload and scale your compute instances out before CPU spikes cause service disruption. Interestingly it also automates bidding and setting up of instances to meet your needs resulting in time savings and eliminating over provisioning.
Cloudability is a tool primarily focused towards the needs of the finance team. That being said they offer some light automation to resolve some of the recommendations made by their cost analysis tool.
As with other vendors in the visibility space, CloudHealth offers a handful of ways to react to the data that the product aggregates. You can automate RI purchases and cycle instances based on CloudHealth’s recommendations.
Cloudwiry provides a library of Lambda scripts that you can use to automate AWS cost savings. As is common in this space, those scripts can be scheduled from a calendar type interface.
Using machine learning and artificial intelligence, Densify identifies the exact provisioning and optimal AWS resource to run your environment in the most cost effective way. Densify then deploys those resources on your behalf – monitoring & adjusting in reaction to any changes in the AWS environment.
DivvyCloud lets organizations set policies to put guardrails around AWS usage to cover security, compliance and cost governance.
Though CloudRanger was designed as a tool for backup automation, they evolved to include cost optimization in their repertoire. CloudRanger can be used to delete storage that has expired as well as to cycle instances on and off when they’re not in use. As an extra bonus, CloudRanger can be used for Disaster Recovery – bunkering snapshots into extra regions and then launching them.
FittedCloud leverages cloud usage data to identify underutilized resources and optimize them in real-time using machine learning. Users can also set policies to govern FittedCloud’s dynamic resource optimization for extra oversight and peace of mind.
GorillaStack is a highly flexible and customizable rules engine that organizations can use to define inputs and trigger outputs that result in cost optimization among other things. For example, input a schedule or an SNS topic to then turn off or scale down instances based on the triggering action.
Out of the box automation configurations include automated backups and snapshots, turning EC2 and RDS instances on and off, triggering Lambdas from SNS topics, multi-region backup for disaster recovery and much more. All of this is integrated with ChatOps and comes with granular permissioning so that big teams can be enabled to manage their own infrastructure.
ActOnMagic can be used to automate the fixing of resource leakages, to rightsize and to power cycle instances. They have an internal Auto Scaling product whereby a user can set performance thresholds to govern provisioning and optimize performance.
Beam analyzes and reports on underutilized and unused services and provides automation that empowers DevOps managers to fix them to realize immediate cost benefits.
ParkMyCloud does exactly what it sounds like. Set a schedule to park your cloud instances when they’re not in use, or use their SmartParking to cycle instances on and off according to historical usage patterns.
Use RightScale’s products to turn off or remove unused or idle resources. Optimize RI purchases and cycle instances on and off when they’re not in use by scheduling down times.
You can use Skeddly to schedule and trigger all sorts of changes inside your cloud. Create and delete backups as well as power cycling EC2 and RDS instances.
Turbonomic is a platform that uses Machine Learning to investigate an organization’s cloud environment and generate cost optimization benefits by making scaling and cycling adjustments. Turbonomic also works to maximize Reserved Instance consumption in combination with rightsizing.
Amazon EC2 Fleet allows AWS customers to launch a fleet of instances spanning On-Demand, Reserved and Spot Instances using 1 API call. The user can specify the capacity and instance type with confidence that AWS will manage the launch, maintenance and monitoring post launch. This automation is perfect for elasticity and scaling.
Similar to EC2 Fleet, Spot Fleet will launch Spot Instances and can also launch On-Demand Instances. Spot Instances are launched when the bid price exceeds the market price and maintained even in a fluid market for as long as the bid price exceeds the market price.
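The bid-versus-market rule described above can be illustrated with a toy model. The prices are invented and the function name is hypothetical. Real Spot behaviour also depends on available capacity, which this sketch ignores.

```python
def running_hours(bid_price, hourly_market_prices):
    """Return, per hour, whether the bid exceeds the market price."""
    return [bid_price > price for price in hourly_market_prices]


# Invented hourly market prices in USD.
prices = [0.031, 0.035, 0.052, 0.040]
print(running_hours(0.045, prices))
```

In this toy run the instance would be interrupted in the third hour, when the market price rises above the bid, and could be relaunched in the fourth.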
Spot Instance Advisor is a feedback tool that gives you recommendations about which types of instances to bid for in which regions for optimal savings but also, and rather usefully, with minimal downtime.
BidElastic and BidServer use advanced Machine Learning to define the best achievable price in a fluid Spot Instance market and then to automate the purchasing and maintenance of Spot Instances.
Spotinst was one of the first solutions to automate the bidding, purchasing and maintenance of Spot Instances while guaranteeing zero downtime. In anticipation of an interruption, the product rebalances capacity and temporarily moves your compute to Reserved Instances or On-Demand until Spot capacity is available again.
Information in this blog reflects the opinions of individual authors and not the views of any organization or employer. Whilst every endeavor has been made to ensure that the information in this blog is current, GorillaStack and AWS do not warrant the accuracy or completeness of information in this blog and any person using or relying upon such information does so on the basis that GorillaStack and AWS shall bear no responsibility or liability whatsoever for any errors, faults, defects or omissions.