Query your CloudTrail like a pro with Athena

Elliott Spira | Mon, 19 Aug 2019

Blog Feature Graphic

Have you ever wondered how many failed ConsoleLogin attempts you had last month and which regions or IP addresses they originated from? Do you want to know who has been creating all those AWS Accounts in your Organization, or who created a specific EC2 instance that is completely untagged? (See Auto Tag and Retro Tag).

The good news is that there is a magical data source with answers to all these questions and more - CloudTrail. The bad news is that it is huge and tricky to search without the right guidance and tools.

In this blog post and a series of companion posts we will explore how to search your CloudTrail data quickly and efficiently, and what types of insights you can glean from this data source.

What is Athena?

Athena is a fully managed, query service that doesn't require you to configure any servers. It allows you to search your unstructured data in S3 using SQL and pay per query. We specify our CloudTrail S3 bucket and, as you will see below, our different partition keys and we can start to search our CloudTrail data efficiently and inexpensively.

Challenges with Querying your CloudTrail data

Running queries over S3 data and paying per query sounds great, but if the data set is large and queries require the loading of most/all data, your queries can take a fail with timeouts and your Athena bill can start to inflate.

Looking at the contents of your CloudTrail buckets, you will see that the paths of each log file contain the following:

  1. account
  2. region
  3. year
  4. month
  5. day

If we're trying to query our CloudTrail data for data from the last 90 days for a subset of accounts or just a particular region, it doesn't make sense to load all CloudTrail data or all years, accounts and regions.

That's why partitioning your database is important and why using partition keys in your queries will help you keep costs low and queries fast.

Typically, to start adding partitions to your table, you would need to run queries like this across all the permutations of partition keys in your environment:

ALTER TABLE cloudtrail\_logs 
ADD PARTITION (account='<account id>', region='<region>', year='<year>', month='<month>', day='<day>') 
LOCATION 's3://<cloudtrail s3 bucket name>/AWSLogs/<org id>/<account id>/CloudTrail/<region>/<year>/<month>/<day>/';

As you can imagine, the number of accounts, regions, years, months and days within an organization's CloudTrail bucket can make this process cumbersome and time consuming. As time continues, you will likely want to add even more partitions for other accounts, regions, years, months and days.

This sounds like a perfect fit for an AWS Glue Crawler. Glue Crawlers explore the data available in S3, modify the schema based on the shape of the data discovered and add partitions based on partition keys discovered. However, after trialling this approach, we found that it was much slower (and therefore more expensive) than we expected. Furthermore, a bug in the Glue Crawlers modified the schema, breaking some of the column name constraints that are required by Hive.

What we needed was essentially a quick and easy mechanism for periodic partition creation.

The solution

Because of all of these challenges, we created the open-source athena-cloudtrail-partitioner to help you:

  1. Create the table with the right schema, metadata and configuration options
  2. Periodically create partitions based on the account/region/year/month/day keys available in the CloudTrail S3 bucket
  3. Maintain state on which partitions have been created. This means that you can spend less time manually creating thousands of partitions and start querying your CloudTrail data, which is what you're really after anyway

Follow the instructions in the README to get started.

Running some queries

We're just going to touch on a few questions in this post. We'll explore a range of other interesting questions in a series of later posts.

Which identities are doing the most AssumeRole requests. You might be surprised by the data.

Which IAM identities are the top successful users of IAM:AssumeRole in the last 30 days?

SELECT useridentity.sessioncontext.sessionissuer.arn, count(useridentity.sessioncontext.sessionissuer.arn) as count
FROM cloudtrail\_logs
WHERE year = '<YEAR>'
        AND account = '<ACCOUNT>'
        AND eventname = 'AssumeRole'
        AND errorcode is null
        AND from\_iso8601\_timestamp(eventtime) > date\_add('day', -30, now())
GROUP BY useridentity.sessioncontext.sessionissuer.arn;

Which AssumeRole attempts are failing?

Who is attempting to assume the role and how many times has it failed? This could reveal broken automation, broken privileges or something much worse.

How many AssumeRole attempts have failed by User Identity in the last fortnight?

SELECT useridentity.sessioncontext.sessionissuer.arn, count(useridentity.sessioncontext.sessionissuer.arn) as count 
FROM cloudtrail\_logs 
WHERE year = '<YEAR>' 
     AND account = '<ACCOUNT>' -- want this to run quicker? Just specify the month(s) too (as this is part of the partition) 
     AND eventname = 'AssumeRole' 
     AND errorcode = 'AccessDenied' 
     AND from\_iso8601\_timestamp(eventtime) > date\_add('day', -14, now()) 
GROUP BY useridentity.sessioncontext.sessionissuer.arn;

Has the root user been used in the last 90 days?

It is best practise not to use the Root user on your AWS Account. It is a good idea to periodically query as to whether or not anyone has used this user in the last 90 days. If there are results, go have a conversation with your teams to get an understanding of why it happened and whether or not it can be prevented in the future.

SELECT DISTINCT COUNT(\*) FROM cloudtrail\_logs WHERE year = '<YEAR>' 
     AND account = '<ACCOUNT>' 
     AND useridentity.type = 'Root' 
     AND useridentity.invokedby != 'support.amazonaws.com' 
     AND from\_iso8601\_timestamp(eventtime) > date\_add('day', -90, now());

There are a number of interesting insights that you can garner from your CloudTrail logs. We'll be posting a number of blog posts highlighting different queries and use cases for your Athena queries for CloudTrail.

Reach out to us on our public slack channel to let us know what you think

Tags AthenaCloudTrailGlueReal Time EventsBack To All Posts
Real-Time-Alerts

REAL TIME EVENTS

Observe. Report. Act.
Cost-Optimization

COST OPTIMIZATION

Immediate cost reduction
Backup-Disaster-Recovery

BACKUP & DISASTER RECOVERY

Script-free peace of mind