Backing up EBS Volumes

Create and delete EBS snapshots with Lambda and CloudWatch

EBS volumes

If you’ve been on the clouds lately, especially AWS, you certainly know that attaching EBS (Elastic Block Store) volumes to EC2 instances is part of almost any AWS setup. In fact, EBS volumes are so foundational that some AWS services, such as RDS (Relational Database Service), rely on them for storage.1 Creating and attaching those volumes is just half of the story, though; to guarantee a peaceful, undisturbed night of sleep, you should also back them up periodically. In this article, I’ll show you how to do it with the help of AWS Lambda and, if you want to run it on a schedule, CloudWatch. Everything will be written in AWS CloudFormation, but it shouldn’t be too hard to translate to Terraform.

When you should use this solution

You should use this solution in any of the following scenarios:

  • you want to easily back up EBS volumes manually (push-button style);
  • your backup process needs to follow strict conditions and should be customizable;
  • you want to automate snapshot deletions as well.

If all you need is a dumb backup scheduler, then the one described in the AWS documentation will do just fine.

Project structure

.
├── cloudwatch.yaml
├── iam.yaml
└── lambda.yaml
  • iam.yaml: defines an IAM (Identity and Access Management) role and its inline IAM policy, so that the Lambda functions can access other AWS services (EC2 resources, CloudWatch logs etc.);
  • lambda.yaml: defines the Lambda functions responsible for creating and deleting EBS snapshots;
  • cloudwatch.yaml: defines the CloudWatch event rules and the Lambda permissions that let those rules invoke the functions, so that the Lambda functions can be triggered on a schedule. This file is not required if you want to trigger the Lambda functions manually.

How it works

The Lambda function responsible for creating the snapshots will loop through all the AWS regions specified in the lambda.yaml template and check if there are volumes available for backup (flagged with the tag Backup and value Yes). If volumes matching those criteria are found, a new snapshot is created for each of them.

Another Lambda function, responsible for the deletion part, will also loop through the regions to find what it’s looking for—EBS snapshots older than a certain number of days, specified in the same template. When those are found, they get permanently deleted. Yep, I said permanently. You can choose not to use that Lambda function if you prefer deleting the backups yourself.

Those two Lambda functions can be automatically triggered on a schedule by CloudWatch events.

“Shut up and take me to the code!”

Before we start, note that, because the CloudFormation stacks depend on each other, it’s important that you keep their names consistent. Every template file will come with its corresponding stack name in a comment at the top of the file.

IAM template

Let’s start with the iam.yaml template:

---
# Stack name: ebs-snapshots-iam
AWSTemplateFormatVersion: "2010-09-09"
Description: >
  Configure IAM role and role policy so that the Lambda service can
  create and delete EBS snapshots.
Resources:
  LambdaRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "lambda.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "ebs-snapshots-lambda-role-policy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action:
                  - "ec2:CreateTags"
                  - "ec2:CreateSnapshot"
                  - "ec2:DeleteSnapshot"
                  - "ec2:Describe*"
                  - "logs:*"
                  - "ec2:ModifySnapshotAttribute"
                  - "ec2:ResetSnapshotAttribute"
                  - "xray:PutTelemetryRecords"
                  - "xray:PutTraceSegments"
                Resource: "*"

Outputs:
  LambdaRoleArn:
    Description: >
      ARN of IAM role that allows the Lambda service to create
      and delete EBS snapshots.
    Value: !GetAtt "LambdaRole.Arn"
    Export:
      Name: !Sub "ebs-snapshots-lambda-role-arn"

When executed, this template creates an IAM role with an inline policy (ebs-snapshots-lambda-role-policy). The role can later be assumed by one or more Lambda functions, which end up inheriting the permissions defined in the inline policy. If it weren’t for this IAM role, the Lambda functions wouldn’t be able to perform actions such as listing existing EBS snapshots, creating a new EBS snapshot or even generating log messages.

To create a CloudFormation stack from that template, cd into the directory where you saved it and use the following command line:

aws cloudformation create-stack \
--stack-name ebs-snapshots-iam \
--template-body file://iam.yaml \
--capabilities CAPABILITY_IAM

Pay attention to the stack name (ebs-snapshots-iam); it will be important later.2 If you query the IAM API now, you should see a new role whose trust policy (AssumeRolePolicyDocument) allows the Lambda service to assume it:3

$ aws iam list-roles \
--query 'Roles[?starts_with(RoleName, `ebs-snapshots-iam`) == `true`]'
[
    {
        "CreateDate": "2018-09-18T17:45:26Z",
        "Arn": "arn:aws:iam::056792237289:role/ebs-snapshots-iam-LambdaRole-8I1K53CZI1PJB",
        "RoleId": "BIRAQAWHJE7O4VBCTTBAZ",
        "RoleName": "ebs-snapshots-iam-LambdaRole-8I1K53CZI1PJB",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "sts:AssumeRole",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "lambda.amazonaws.com"
                    }
                }
            ]
        },
        "MaxSessionDuration": 3600,
        "Path": "/"
    }
]

Lambda template

AWS lets us define Lambda functions directly in the CloudFormation templates. I find this approach quite handy compared to uploading my scripts to S3 and referencing them from the CloudFormation template, because the latter would also require setting up the appropriate IAM permissions on the S3 bucket. The raw scripts are explained below, and you can still package them as stand-alone files in an S3 bucket if you choose to go the harder way.
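For reference, going the S3 route would mean replacing the inline ZipFile property with a bucket reference along these lines (the bucket and key names are hypothetical):

      Code:
        S3Bucket: "my-lambda-scripts"
        S3Key: "ebs-snapshots/create.zip"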

Create snapshot

This is the Python 3 code I’m using to create the EBS snapshots. We’ll see where this script goes in a bit; for now, just try to understand what it does.

import os
import boto3

def handler(event, context):
    ec2_client = boto3.client("ec2")

    # AWS regions where EBS volumes are located
    env_regions = os.getenv("REGIONS", None)
    if env_regions:
        regions = ec2_client.describe_regions(RegionNames=env_regions.split(","))
    else:
        regions = ec2_client.describe_regions()

    total_created = 0
    for region in regions.get("Regions", []):
        print("Checking for EBS volumes in region %s" % (region["RegionName"]))

        # Get all volumes in this region, according to status and tag(s)
        ec2_client = boto3.client("ec2", region_name=region["RegionName"])
        result = ec2_client.describe_volumes(Filters=[
            {"Name": "status", "Values": ["in-use"]},
            {"Name": "tag:Backup", "Values": ["Yes"]}
        ])

        # describe_volumes() always returns a dict; check the actual list
        if not result["Volumes"]:
            print("No EBS volumes in this region")
            continue

        for volume in result["Volumes"]:
            # Get volume's tags
            volume_name = volume["VolumeId"]
            volume_description = "Created by ebs-snapshots-lambda-create()"
            if "Tags" in volume:
                for tag in volume["Tags"]:
                    if tag["Key"] == "Name":
                        volume_name = tag["Value"]
                    elif tag["Key"] == "Description":
                        volume_description = tag["Value"]

            print("Creating snapshot for EBS volume")
            print("ID = %s. Name = %s." % (volume["VolumeId"], volume_name))

            result = ec2_client.create_snapshot(
                VolumeId=volume['VolumeId'],
                Description=volume_description
            )

            # Tag snapshot
            ec2_resource = boto3.resource('ec2', region_name=region["RegionName"])
            snapshot_resource = ec2_resource.Snapshot(result['SnapshotId'])
            snapshot_resource.create_tags(Tags=[
                {'Key': 'Name', 'Value': volume_name},
                {'Key': 'Description', 'Value': volume_description}
            ])

            total_created += 1

    return total_created

Before you say my code is disgusting and spit on your screen, bear in mind that there are space constraints I need to take into account when adding code snippets to a blog article. Also, all the print statements are there for a reason: they get logged in CloudWatch. When creating backups through a script, especially in an automated fashion, you want to be extra careful and get as much feedback as you can from your code while it’s being executed.

The first chunk of code deals with the AWS regions where the EBS volumes are supposed to be located. If the REGIONS environment variable is set, then describe_regions() will only return AWS regions matching that variable. Note that REGIONS should contain a comma-separated list of region names. Otherwise, if REGIONS is not defined, the script will loop through all possible regions, which is most probably a waste of time—I’ve never seen a use case that required EBS volumes to be stored on every single AWS region.
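For instance, to restrict the scripts to two specific regions, the variable would hold something like this (the values are illustrative):

REGIONS=eu-west-1,us-east-1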

While looping through the regions, the script will get all EBS volumes inside each of them, but only if the volumes comply with two conditions: that they’re currently in use (status = in-use) and that they’re marked as eligible for a backup (with the Backup tag). The first one requires no effort on your side—as long as the volume is being referenced by, for example, an EC2 instance, AWS will mark it as in-use. The second one will require manual intervention, though, because the Backup tag is a custom tag. So, for the script to take a certain volume into account, that volume needs to contain the value Yes for the custom Backup tag. You can do that manually on each of your EBS volumes, but nothing stops you from creating a separate script that tags all of them with the value Yes. (I won’t cover that here, though.)
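Tagging a volume by hand boils down to a single CLI call; here it is for one of the volume IDs that shows up later in this article:

aws ec2 create-tags \
--resources vol-02b8a95986fc70546 \
--tags Key=Backup,Value=Yes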

You might as well just remove those conditions altogether and back up all your freaking EBS volumes, goddamnit:

# This will work too
result = ec2_client.describe_volumes()

It’s really up to you; my script is just one of multiple possible implementations.

Either way, for each of the matched EBS volumes, the script will first get the volume’s name and description. Those will be used when tagging the new snapshot, making it easier to tell which volume a given backup came from. Because we want those two variables (volume_name and volume_description) to always be populated, we initialize them with default values. For example, if you have an EBS volume with a tag Name containing the value secondary-partition and a tag Description containing the value HTTP directory, then those tags (and respective values) will be used on the equivalent snapshot. Otherwise, if either of those tags is not defined, the corresponding default value is used.

After that, create_snapshot() is called. Note that I’m purposefully avoiding try except blocks all along the script—if anything goes wrong, I want it to implode in the ugliest way possible.

At its final step, the script uses the volume_name and volume_description variables to tag the snapshot that was just created. It also returns the number of backups successfully created, which lets us confirm that it worked without having to peek into the logs.

Delete snapshot

If you think that using a script to delete stale snapshots is crazy talk and would rather do it manually, feel free to skip this part. Otherwise, here it goes.

import os
from datetime import datetime, timedelta
import boto3

def handler(event, context):
    ec2_client = boto3.client("ec2")

    # AWS accounts
    env_owner_ids = os.getenv("ACCOUNT_NUMBERS", None)
    owner_ids = env_owner_ids.split(",") if env_owner_ids else None
    if not owner_ids:
        raise Exception("At least one AWS account number needs to be specified")

    # Only snapshots older than RETENTION_DAYS will be deleted
    env_retention_days = int(os.getenv("RETENTION_DAYS", 7))
    cutoff_time = datetime.utcnow() - timedelta(days=env_retention_days)

    # AWS regions where EBS snapshots are located
    env_regions = os.getenv("REGIONS", None)
    if env_regions:
        regions = ec2_client.describe_regions(RegionNames=env_regions.split(","))
    else:
        regions = ec2_client.describe_regions()

    total_deleted = 0

    for region in regions.get("Regions", []):
        print("Checking for EBS snapshots in region %s" % (region["RegionName"]))

        # Get all snapshots in this region, for these AWS accounts
        ec2_client = boto3.client("ec2", region_name=region["RegionName"])
        result = ec2_client.describe_snapshots(OwnerIds=owner_ids)

        # describe_snapshots() always returns a dict; check the actual list
        if not result["Snapshots"]:
            print("No EBS snapshots in this region")
            continue

        for snapshot in result["Snapshots"]:
            # Format both timestamps the same way: the resulting strings
            # compare chronologically and sidestep naive vs. tz-aware issues
            date_format = "%Y-%m-%dT%H:%M:%S.%fZ"
            cutoff_time_formatted = cutoff_time.strftime(date_format)
            snapshot_time_formatted = snapshot["StartTime"].strftime(date_format)

            if cutoff_time_formatted > snapshot_time_formatted:
                # Get snapshot's tags
                snapshot_name = ""
                if "Tags" in snapshot:
                    for tag in snapshot["Tags"]:
                        if tag["Key"] == "Name":
                            snapshot_name = tag["Value"]

                snapshot_id = snapshot["SnapshotId"]

                print("Deleting EBS snapshot")
                print("ID = %s. Name = %s." % (snapshot_id, snapshot_name))

                ec2_client.delete_snapshot(SnapshotId=snapshot_id)

                total_deleted += 1

    return total_deleted

Just like the previous one, this script is despicable! (Says the high and mighty pythonista.) To be fair, this one looks even worse, because it references even more environment variables. Sorry. I had to.

Let me start with the easiest one: REGIONS is doing the very same thing it did in the previous script. Then, there’s a RETENTION_DAYS environment variable, which is used to determine whether a snapshot is eligible for deletion or not. Its default value is 7, meaning that only EBS snapshots older than seven days will be eligible for deletion. The third and last one is ACCOUNT_NUMBERS (also comma-separated), which is passed as the list of “owner IDs” when calling the describe_snapshots() function, restricting the results to snapshots owned by those accounts.
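Putting the three together, a configuration for the deletion function might look like this (the account number is made up):

REGIONS=eu-west-1,us-east-1
RETENTION_DAYS=14
ACCOUNT_NUMBERS=123456789012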

According to the AWS API documentation, the describe_snapshots() function lets you list all snapshots available to you, which

include public snapshots available for any AWS account to launch, private snapshots that you own, and private snapshots owned by another AWS account but for which you’ve been given explicit create volume permissions.

That’s a lot of snapshots. You probably only want to deal with the snapshots under your own AWS accounts—I’ll show you how to do that in a bit. All you need to know for now is that, without the ACCOUNT_NUMBERS variable, we’d risk listing way more EBS snapshots than the ones we really care about.

As for the script’s functionality, this one also starts with a main loop through the AWS regions matching the REGIONS variable. For each of them, it fetches only the snapshots we’re interested in (those belonging to ACCOUNT_NUMBERS). Then, for each snapshot found, the script checks whether the snapshot is older than the cutoff time (calculated from RETENTION_DAYS). If it is, the script prints a message telling us what’s being deleted, and then performs the deletion. The script ends by returning the total number of successful deletions.

Now, where do you think these Python scripts go? (Don’t be rude!)

Back to the Lambda template

As I said before, AWS lets us dump our Python code directly into a CloudFormation template, which I’m more than happy to do. For now, though, I’ll show you the Lambda template without those scripts, because it will help you visualize the other pieces that are necessary to make it work, such as the integration with the IAM role defined earlier and the environment variables.

---
# Stack name: ebs-snapshots-lambda
AWSTemplateFormatVersion: "2010-09-09"
Description: "Configure Lambda functions to create and delete EBS snapshots"
Resources:
  LambdaCreate:
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: "ebs-snapshots-lambda-create"
      Description: "Create EBS snapshots"
      Handler: "index.handler"
      Role: !ImportValue "ebs-snapshots-lambda-role-arn"
      Environment:
        Variables:
          REGIONS: !Ref "AWS::Region"
      Code:
        ZipFile: |
            # First script (snapshot creation) goes here
      Runtime: "python3.6"
      Timeout: "90"
      TracingConfig:
        Mode: "Active"

  LambdaDelete:
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: "ebs-snapshots-lambda-delete"
      Description: "Delete EBS snapshots"
      Handler: "index.handler"
      Role: !ImportValue "ebs-snapshots-lambda-role-arn"
      Environment:
        Variables:
          REGIONS: !Ref "AWS::Region"
          RETENTION_DAYS: 7
          ACCOUNT_NUMBERS: !Ref "AWS::AccountId"
      Code:
        ZipFile: |
            # Second script (snapshot deletion) goes here
      Runtime: "python3.6"
      Timeout: "90"
      TracingConfig:
        Mode: "Active"

Outputs:
  LambdaCreateArn:
    Description: "ARN of Lambda function responsible for creating EBS snapshots"
    Value: !GetAtt "LambdaCreate.Arn"
    Export:
      Name: !Sub "ebs-snapshots-lambda-create-arn"
  LambdaDeleteArn:
    Description: "ARN of Lambda function responsible for deleting EBS snapshots"
    Value: !GetAtt "LambdaDelete.Arn"
    Export:
      Name: !Sub "ebs-snapshots-lambda-delete-arn"

Among other things, the template specifies what language the Lambda functions are written in, the maximum amount of time they should run for, their names, and descriptions. You can find more information about the other properties in the AWS documentation. What I really want you to focus on is the Role property, which is assigned a value imported from our IAM template (ebs-snapshots-lambda-role-arn), and the Environment property, which defines the environment variables accessed from within the Python scripts.

As you can see, REGIONS is assigned a single value (!Ref "AWS::Region"), which references the current AWS region—the AWS region where the CloudFormation stack will be located. Other possibilities are:

  • ap-south-1
  • sa-east-1,ca-central-1
  • !Sub "${AWS::Region},ap-northeast-1" (note that !Sub, not !Ref, is needed to combine a reference with a literal)

The other two environment variables are only required if you also want to use the snapshot deletion Lambda function. As a matter of fact, if you don’t plan on using that Lambda function, you can go ahead and remove the two sections belonging to it (LambdaDelete and LambdaDeleteArn).

Otherwise, ACCOUNT_NUMBERS is set to the current account (the AWS account responsible for the CloudFormation stack’s creation); you can use other values as well:

  • 210987654321
  • 123456789012,210987654321
  • !Sub "555026907218,${AWS::AccountId}"

Finally, RETENTION_DAYS is set to the number seven, but you can use any other integer.

Now that you have a good understanding of the template’s structure, dump the Python scripts in the places highlighted (under Code: ZipFile:). Ensure the indentation is consistent with the rest of the YAML document, otherwise CloudFormation will complain.
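To illustrate, the first lines of the creation script would sit under LambdaCreate like this:

      Code:
        ZipFile: |
            import os
            import boto3

            def handler(event, context):
                ec2_client = boto3.client("ec2")
                # ... rest of the creation script ...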

Once that’s done, you can save the file (lambda.yaml) somewhere in your file system and run the create-stack sub command:

$ aws cloudformation create-stack \
--stack-name ebs-snapshots-lambda \
--template-body file://lambda.yaml

You should have two new Lambda functions on your AWS account now: ebs-snapshots-lambda-create and ebs-snapshots-lambda-delete. Like any other Lambda function, they can be run from the AWS web console or the command line.
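From the command line, a manual invocation would look something like this (the output file receives the function’s return value; its name is arbitrary):

$ aws lambda invoke \
--function-name ebs-snapshots-lambda-create \
response.json

Before we run those functions, though, let’s first check if there are any volumes eligible for a backup in our account: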

$ aws ec2 describe-volumes \
--filters Name="status",Values="in-use" Name="tag:Backup",Values="Yes" \
--query "Volumes[*].{ID:VolumeId,State:State,Tag:Tags}"
[
    {
        "ID": "vol-02b8a95986fc70546",
        "State": "in-use",
        "Tag": [
            {
                "Key": "Name",
                "Value": "Voluminous"
            },
            {
                "Key": "Backup",
                "Value": "Yes"
            }
        ]
    },
    {
        "ID": "vol-4f2012673f70b6af9",
        "State": "in-use",
        "Tag": [
            {
                "Key": "Name",
                "Value": "Apache"
            },
            {
                "Key": "Backup",
                "Value": "Yes"
            }
        ]
    }
]

Perfect; the only two volumes I care about are in use and marked for backup. Now, let me check how many snapshots there are on my account:

$ aws ec2 describe-snapshots --owner-id 056792237289
{
    "Snapshots": []
}

No snapshots. Hmmm, I should take care of that. In order to create backups for my two volumes, I need to invoke the ebs-snapshots-lambda-create function. I could do that from the command line, but this time around I’ll log into the AWS console and do the work from there. First, note how the CloudFormation template succeeded in creating a default value for the REGIONS environment variable:

/img/2018/09/2018-09-25-backing-up-ebs-volumes/01-lambda-environment-variables.jpg

Lambda environment variables (create snapshots).

When I click on Test and then Create, I’m taken through a form that asks me to create an event before I can test my function. It doesn’t really matter how the event is defined, so I’ll just leave it as is, with its default payload.

/img/2018/09/2018-09-25-backing-up-ebs-volumes/02-test-event-thumb.jpg

Just a dummy test event.

Once the test event is created, I can click on Test again and finally trigger the function. This is what I get as a result:

/img/2018/09/2018-09-25-backing-up-ebs-volumes/03-lambda-result-and-log-output-thumb.jpg

Lambda result and log output (create snapshots).

In the Details tab, right above the summary, I can see that my script returned the number two, which suggests that two volumes were successfully backed up. To confirm that everything went according to the script (literally), I’ll query the API to fetch all snapshots associated with my account again:

$ aws ec2 describe-snapshots --owner-id 056792237289 \
--query "Snapshots[*].{ID:SnapshotId,Description:Description,Tag:Tags}"
[
    {
        "Description": "Created by ebs-snapshots-lambda-create()",
        "ID": "snap-610804feaea5f5e42",
        "Tag": [
            {
                "Key": "Name",
                "Value": "Apache"
            },
            {
                "Key": "Description",
                "Value": "Created by ebs-snapshots-lambda-create()"
            }
        ]
    },
    {
        "Description": "Created by ebs-snapshots-lambda-create()",
        "ID": "snap-a21ae10846fe969fc",
        "Tag": [
            {
                "Key": "Description",
                "Value": "Created by ebs-snapshots-lambda-create()"
            },
            {
                "Key": "Name",
                "Value": "Voluminous"
            }
        ]
    }
]

Excellent! The snapshots were created and the description and tags were correctly assigned as well. It may be a good time to test the deletion script now. Let’s have a look at the default environment variables generated by CloudFormation for the deletion function:

/img/2018/09/2018-09-25-backing-up-ebs-volumes/04-lambda-environment-variables.jpg

Lambda environment variables (delete snapshots).

All good. After running the deletion function with a dummy test event, though, I get the following result:

/img/2018/09/2018-09-25-backing-up-ebs-volumes/05-lambda-result-and-log-output-thumb.jpg

Lambda result and log output (delete snapshots).

Zero deletions, but no errors in the log. WTF?!? It’s because none of the snapshots I created in the previous step has hit the seven-day threshold yet. That means our script is respecting the cutoff time passed through the environment variable, which is good. So, to test that the Lambda function is working properly, we’ll need to wait seven days. See you next week…

Just kidding. The easiest way to test the function is by modifying the RETENTION_DAYS environment variable directly on the Lambda web console. You shouldn’t manually modify AWS resources created through CloudFormation like that, but this is an exception—we’re just testing.
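If you’d rather stay in the terminal, update-function-configuration applies the same tweak; note that the --environment option replaces all the variables at once, so every one of them has to be restated (the values below are illustrative):

aws lambda update-function-configuration \
--function-name ebs-snapshots-lambda-delete \
--environment "Variables={REGIONS=eu-west-1,RETENTION_DAYS=0,ACCOUNT_NUMBERS=123456789012}"

Once I change that value to 0 and rerun the Lambda function, I get the result I was expecting: two snapshots deleted. To confirm it, I just need to run the describe-snapshots sub command again: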

$ aws ec2 describe-snapshots --owner-id 056792237289
{
    "Snapshots": []
}

Everything works wonderfully so far: we have two Lambda functions that can be used to create and delete EBS snapshots with the click of a button on the AWS console. Can it get any easier? With automation, it can.

CloudWatch template

If all you want is to create and delete backups manually, you’re good to go. Bye.

Otherwise, this section will show you how the two Lambda functions can be automatically triggered on a schedule. Look ma, no hands!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
---
# Stack name: ebs-snapshots-cloudwatch
AWSTemplateFormatVersion: "2010-09-09"
Description: >
  Configure CloudWatch events to trigger Lambda functions responsible for 
  managing EBS snapshots.
Resources:
  RuleCreate:
    Type: "AWS::Events::Rule"
    Properties:
      Description: "Trigger a lambda function that creates EBS snapshots"
      ScheduleExpression: "cron(30 23 * * ? *)" # Run at 11:30 PM (UTC) every night
      State: "ENABLED"
      Targets:
        -
          Arn: !ImportValue "ebs-snapshots-lambda-create-arn"
          Id: "ebs-snapshots-lambda-create-target"
  PermissionCreate:
    Type: "AWS::Lambda::Permission"
    Properties:
      FunctionName: !ImportValue "ebs-snapshots-lambda-create-arn"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt ["RuleCreate", "Arn"]

  RuleDelete:
    Type: "AWS::Events::Rule"
    Properties:
      Description: "Trigger a lambda function that deletes EBS snapshots"
      ScheduleExpression: "cron(45 23 * * ? *)" # Run at 11:45 PM (UTC) every night
      State: "ENABLED"
      Targets:
        -
          Arn: !ImportValue "ebs-snapshots-lambda-delete-arn"
          Id: "ebs-snapshots-lambda-delete-target"
  PermissionDelete:
    Type: "AWS::Lambda::Permission"
    Properties:
      FunctionName: !ImportValue "ebs-snapshots-lambda-delete-arn"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt ["RuleDelete", "Arn"]

PermissionCreate and PermissionDelete are Lambda permissions that allow CloudWatch to invoke specific Lambda functions, specified in FunctionName. RuleCreate and RuleDelete define the CloudWatch events that will be triggered on a schedule, like a Cron job. The schedule expressions I used in the template, by the way, follow the Cron syntax.

Because the deletion script only touches old backups, it doesn’t make much of a difference if it runs before or after the creation script. Feel free to modify the Cron expressions to fit your needs, but bear in mind that “All scheduled events use UTC time zone and the minimum precision for schedules is 1 minute.”4
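By the way, cron expressions aren’t the only option; CloudWatch also accepts rate expressions for fixed intervals. A couple of alternative schedules (the values are just examples):

ScheduleExpression: "rate(1 day)"             # once every 24 hours
ScheduleExpression: "cron(0 4 ? * MON-FRI *)" # 4:00 AM UTC on weekdays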

To create the CloudFormation stack, save the template as cloudwatch.yaml and run the create-stack sub command:

$ aws cloudformation create-stack \
--stack-name ebs-snapshots-cloudwatch \
--template-body file://cloudwatch.yaml

This is the result I get after the CloudFormation stack gets created, causing a new CloudWatch event to be scheduled:

/img/2018/09/2018-09-25-backing-up-ebs-volumes/06-cloudwatch-event-thumb.jpg

CloudWatch event (create snapshots).

To test the scheduler, you can open the CloudWatch web console and tweak the Cron expression directly from there (Actions >> Edit, on the top right)—this way you don’t need to wait until 11:30 PM (UTC) for the Lambda function to get called.

According to our CloudFormation template, the deletion event should be triggered 15 minutes after the creation event, resulting in the invocation of the deletion function. As you may remember, I changed the value of RETENTION_DAYS directly in the Lambda web console to 0. The effect of that change is that all EBS snapshots in my account (and my default region) will be removed as soon as the function is invoked. It’s worth emphasizing that this will only happen because the Lambda function’s environment variable (RETENTION_DAYS) was set to 0, which I only did for testing purposes. Under normal circumstances, you probably shouldn’t be deleting your snapshots that often.

Delete the stacks

In case you want to delete all of the AWS resources you’ve created while going through this article, just call delete-stack following the correct dependency order:

aws cloudformation delete-stack --stack-name ebs-snapshots-cloudwatch
aws cloudformation delete-stack --stack-name ebs-snapshots-lambda
aws cloudformation delete-stack --stack-name ebs-snapshots-iam

Don’t worry: your EBS snapshots and volumes will remain intact.

Conclusion

Phew, that took longer than I thought! But you’ve made it. Now you can back up your EBS volumes manually, from the Lambda web console, or automatically, through CloudWatch scheduled events. You can also delete stale snapshots the same way. How awesome is that?


  1. See DB instance storage.
  2. If you’re not familiar with the --capabilities parameter, you can find more information about it in the CloudFormation API documentation.
  3. All queries (--query) in this article are written in JMESPath.
  4. See Schedule Expressions for Rules.