Ever see untagged AWS resources and think to yourself, "Who created this? It looks like I could delete it, but I don't want to affect anyone who is relying on it"? Unfortunately, these untagged resources can build up like plaque in an artery, inflating your bill, using up headroom in your service limits and priming you for a cost heart attack.
While our Auto Tag repository deals with this problem from the time of its installation onwards, there was a gap in functionality for all resources created prior to its installation.
The good news is that the data you need in this scenario was logged by AWS CloudTrail into your CloudTrail S3 bucket. The bad news is that finding the specific events pertaining to each resource's creation in this bucket of gigabytes to petabytes worth of compressed JSON is not so straightforward.
One of our community members, Ray Janoka, made inroads on solving this problem by implementing a utility that uses AWS Athena to query the CloudTrail logs in S3, find out which user created each resource and tag the resource with that user's ARN. It's awesome to see the community get involved (even if they do choose to use Ruby :P). Out of Ray’s work was born Retro Tag.
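To make the idea concrete, here is a minimal Python sketch of the kind of Athena query such a lookup could run. It is not Retro Tag's actual query. The table name `cloudtrail_logs`, the helper name `build_creation_query` and the example instance ID are all hypothetical, and the column names assume a table created with the commonly documented CloudTrail schema for Athena.

```python
def build_creation_query(table, event_name, resource_id):
    """Return Athena SQL that looks for the event that created a resource.

    Assumes the table exposes eventtime, eventname, useridentity and
    responseelements columns (an assumption, adjust to your own schema).
    """
    # Escape single quotes so the values are safe inside SQL string literals.
    event = event_name.replace("'", "''")
    rid = resource_id.replace("'", "''")
    return (
        "SELECT eventtime, eventname, useridentity.arn AS creator_arn\n"
        f"FROM {table}\n"
        f"WHERE eventname = '{event}'\n"
        # The new resource's ID usually appears in the API response payload.
        f"  AND responseelements LIKE '%{rid}%'\n"
        "ORDER BY eventtime ASC\n"
        "LIMIT 1"
    )


# Hypothetical example: who launched this EC2 instance?
print(build_creation_query("cloudtrail_logs", "RunInstances", "i-0123456789abcdef0"))
```

The ARN returned in `creator_arn` is what would then be written back to the resource as a tag.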
Query CloudTrail using AWS Athena
What is really elegant here is the process of using AWS Athena to query the structured data in S3. However, if you have lots of CloudTrail data, you will likely reach limits with this approach, because of the sheer magnitude of the data set being loaded in Athena.
This is where table partitioning plays a crucial role, allowing us to segment by the account, region, year, month and day keys that are available in the path of the CloudTrail data in S3. Partitioning has positive implications on query times and costs.
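As a rough illustration of how those path keys map to partitions, the Python sketch below builds the S3 location for one day of CloudTrail logs and the matching Athena `ALTER TABLE ... ADD PARTITION` statement. The function names, table name, bucket name and account ID are hypothetical, and it assumes the bucket uses the default CloudTrail layout with no extra key prefix.

```python
from datetime import date


def partition_location(bucket, account, region, day):
    """S3 prefix holding one account/region/day of CloudTrail logs.

    Assumes the default layout: AWSLogs/<account>/CloudTrail/<region>/YYYY/MM/DD/
    """
    return f"s3://{bucket}/AWSLogs/{account}/CloudTrail/{region}/{day:%Y/%m/%d}/"


def add_partition_ddl(table, bucket, account, region, day):
    """Athena DDL that registers that prefix as a single partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS\n"
        f"PARTITION (account='{account}', region='{region}', "
        f"year='{day:%Y}', month='{day:%m}', day='{day:%d}')\n"
        f"LOCATION '{partition_location(bucket, account, region, day)}'"
    )


# Hypothetical values, for illustration only.
print(add_partition_ddl("cloudtrail_logs", "my-trail-bucket",
                        "123456789012", "us-east-1", date(2018, 3, 7)))
```

A query that filters on these partition keys only scans the matching prefixes, which is where the savings in time and cost come from.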
This is a topic of its own, and one we'll be covering in a series of upcoming blog posts. Stay tuned, and in the meantime check out Retro Tag!