AWS S3 Glacier Cheat Sheet (The Ultimate Backup Layer)

S3 Glacier is an AWS archival storage service best treated as a last-resort backup: the copy you turn to only when everything else has failed.

It is meant for files that are rarely accessed, and retrieval is billed on top of storage. In other words, do not expect to get them back easily.

Retrieval is also asynchronous: you submit a job and then wait, usually for hours, before the files are available for download.

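One important tip: avoid sending small files, because every archive has to be tracked, retrieved, and deleted individually by its ID. Bundle them into large archives instead, but keep each archive under 4 GB, the size limit of a single upload-archive request. Anything larger has to go through the multipart upload process, which is painful and not recommended.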

See the official documentation for CLI commands at [Link].

CREATING A VAULT

Vaults in S3 Glacier are the equivalent of Buckets in S3.

aws glacier create-vault --vault-name my-glacier --account-id -

UPLOADING AN ARCHIVE SMALLER THAN 4 GB

aws glacier upload-archive --account-id - --vault-name my-glacier --body backup.zip
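Since listing a vault later takes hours, it can help to record each Archive-ID at upload time. The following is a minimal sketch, assuming the response field archiveId and the global --query/--output options of the AWS CLI; the function name upload_and_log and the log file name glacier-archives.tsv are made up for illustration.

```shell
# Sketch: upload one archive and append its Archive-ID to a local log.
# upload_and_log and the default log name are illustrative, not AWS features.
upload_and_log() {
  local file="$1"
  local vault="${2:-my-glacier}"
  local log="${3:-glacier-archives.tsv}"
  local archive_id

  # Ask the CLI to print only the archiveId field of the response.
  archive_id=$(aws glacier upload-archive \
    --account-id - \
    --vault-name "$vault" \
    --archive-description "$file" \
    --body "$file" \
    --query archiveId --output text) || return 1

  # One line per upload: UTC timestamp, file name, Archive-ID (tab-separated).
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$file" "$archive_id" >> "$log"
  printf '%s\n' "$archive_id"
}
```

Usage would look like: upload_and_log backup.zip my-glacier. Keeping the file name in the archive description also makes the inventory easier to read later.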

UPLOADING AN ARCHIVE LARGER THAN 4 GB

Uploading files over 4 GB is strongly discouraged because it requires multipart upload.

Instead, use zip, tar, split, or any similar tool to break the file into smaller chunks, then use the upload-archive command shown above.
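As a rough sketch of that approach, the two helpers below pack a directory with tar and cut the stream into fixed-size pieces with split, then reverse the process after retrieval. The function names, the part prefix, and the 3 GiB default are my own choices, not anything Glacier requires.

```shell
# Sketch: pack a directory into a compressed tar stream and split it into
# pieces small enough for a single upload-archive call each.
pack_and_split() {
  local src_dir="$1"
  local prefix="$2"                       # e.g. backup.tar.gz.part-
  local chunk_bytes="${3:-3221225472}"    # 3 GiB, safely below the 4 GB limit

  tar -czf - -C "$(dirname "$src_dir")" "$(basename "$src_dir")" \
    | split -b "$chunk_bytes" - "$prefix"
}

# Sketch: concatenate the pieces in name order and unpack them again.
# split names pieces ...aa, ...ab, ... so the shell glob sorts them correctly.
join_and_unpack() {
  local prefix="$1"
  local dest_dir="$2"

  cat "$prefix"* | tar -xzf - -C "$dest_dir"
}
```

Each piece is then uploaded as its own archive. All pieces are needed to restore anything, so their Archive-IDs should be kept together.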

If multipart upload is unavoidable, start by initiating it and specifying the part size in bytes. The part size must be 1 MB (1048576 bytes) multiplied by a power of two, up to 4 GB:

aws glacier initiate-multipart-upload --account-id - --part-size 4194304 --vault-name my-glacier --archive-description "big-file.zip"

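This will output an Upload-ID (shown as UPLOAD_ID below) required for all subsequent commands. Each part, for example a piece produced by split with the same part size, is uploaded together with its byte range within the final archive:

aws glacier upload-multipart-part --body part-1 --range 'bytes 0-4194303/*' --account-id - --vault-name my-glacier --upload-id UPLOAD_ID
aws glacier upload-multipart-part --body part-2 --range 'bytes 4194304-8388607/*' --account-id - --vault-name my-glacier --upload-id UPLOAD_ID
aws glacier upload-multipart-part --body part-3 --range 'bytes 8388608-12582911/*' --account-id - --vault-name my-glacier --upload-id UPLOAD_ID

Repeat for every remaining part. A large file can need hundreds or thousands of these calls (up to 10,000 parts per upload), so this step has to be scripted. After the last part, the upload must be finalized with aws glacier complete-multipart-upload, which takes the total archive size and the SHA-256 tree hash of the whole file. No archive exists in the vault until that call succeeds.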
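To give an idea of what such a script involves, here is a minimal sketch that computes the byte range of every part and uploads the parts one by one. The helper names part_ranges and upload_parts are hypothetical, dd is used to read each slice instead of pre-splitting the file, and there is no retry logic or checksum handling.

```shell
# Print one "bytes START-END/*" range per part for a file of TOTAL bytes.
part_ranges() {
  local total="$1"
  local part_size="$2"
  local start=0
  local end

  while [ "$start" -lt "$total" ]; do
    end=$((start + part_size - 1))
    # The last part is usually shorter than the others.
    if [ "$end" -ge "$total" ]; then
      end=$((total - 1))
    fi
    printf 'bytes %s-%s/*\n' "$start" "$end"
    start=$((start + part_size))
  done
}

# Upload every part of FILE to an already initiated multipart upload.
upload_parts() {
  local file="$1" vault="$2" upload_id="$3" part_size="$4"
  local total index=0 range slice status

  total=$(($(wc -c < "$file")))
  slice=$(mktemp)

  part_ranges "$total" "$part_size" | while read -r range; do
    # Copy part number $index of the file into a temporary slice.
    dd if="$file" of="$slice" bs="$part_size" skip="$index" count=1 2>/dev/null
    aws glacier upload-multipart-part \
      --account-id - \
      --vault-name "$vault" \
      --upload-id "$upload_id" \
      --range "$range" \
      --body "$slice" < /dev/null || exit 1
    index=$((index + 1))
  done
  status=$?

  rm -f "$slice"
  return "$status"
}
```

With the 4194304-byte part size used above, part_ranges reproduces exactly the ranges in the three example commands. The part size passed here has to match the one given when the upload was initiated.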

LISTING VAULT CONTENT

Submit a job request to get the Job-ID:

aws glacier initiate-job --account-id - --vault-name my-glacier --job-parameters '{"Type": "inventory-retrieval"}'

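NOTE: This may take hours, and the inventory itself is only refreshed about once a day, so recent uploads may not appear yet. Keep checking the job until its StatusCode changes from "InProgress" to "Succeeded" (or "Failed"):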

aws glacier describe-job --vault-name my-glacier --account-id - --job-id tuGTrZxAIwLtvcj6Sv6IU3eoEnuIUX18QeME5x7ENl38UctRykke-jJ9PKQ1YsyVACnkQXd2HLCiWppOcTU2NgUdKjc5
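Rather than rerunning that command by hand, the check can be wrapped in a small polling loop. This is only a sketch: wait_for_job is a made-up helper, the 15-minute default interval is arbitrary, and it relies on the StatusCode field of the describe-job response.

```shell
# Sketch: poll a Glacier job until it succeeds or fails.
# Returns 0 on Succeeded, 1 on Failed, 2 if the CLI call itself fails.
wait_for_job() {
  local vault="$1" job_id="$2"
  local interval="${3:-900}"   # seconds between checks (arbitrary default)
  local status

  while :; do
    status=$(aws glacier describe-job \
      --account-id - \
      --vault-name "$vault" \
      --job-id "$job_id" \
      --query StatusCode --output text) || return 2

    case "$status" in
      Succeeded) return 0 ;;
      Failed)    echo "job $job_id failed" >&2; return 1 ;;
      *)         sleep "$interval" ;;   # still InProgress
    esac
  done
}
```

A typical call would be wait_for_job my-glacier JOB_ID followed by get-job-output once it returns successfully.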

Once succeeded, download the job output (a JSON file):

aws glacier get-job-output --vault-name my-glacier --account-id - --job-id tuGTrZxAIwLtvcj6Sv6IU3eoEnuIUX18QeME5x7ENl38UctRykke-jJ9PKQ1YsyVACnkQXd2HLCiWppOcTU2NgUdKjc5 output.json
cat output.json

The JSON output, once formatted, will look like this:

{
   "VaultARN":"arn:aws:glacier:ca-central-1:398812248970:vaults/my-glacier",
   "InventoryDate":"2022-02-26T20:39:15Z",
   "ArchiveList":[
      {
         "ArchiveId":"2qbH3Hgyq0rdnu4-xZQNiNLh9lkODN1orUMD-dJkNuAC6YQASPHAAH8LaEijaoEcYaRGNCjrH1u-zlqHOzZoRUCHr-JOWiqg_PsLpuzSDsb48SaKEtvBYUGZ_tY0jN19OhBlTkM_tA",
         "ArchiveDescription":"",
         "CreationDate":"2022-02-26T17:22:53Z",
         "Size":153717283,
         "SHA256TreeHash":"62d22c320d76356dd9aa6032bdd1bca43c3f1aadef085cd7f5f327e6f9e2b004"
      },
      {
         "ArchiveId":"lhIYAD3uqPnweEtP4yQII_BhZjy4JBOaprFlmsqe0gJ08V5ccOwdQ3eSxc1NXjXm2BqxQlidiDiaCSgOtuwJQvQpZ-Z0qw7tXRM0QHKUntPYeaowJJlmbvQK4d-fIEMTNjRsVJwKUw",
         "ArchiveDescription":"",
         "CreationDate":"2022-02-26T17:26:12Z",
         "Size":646743001,
         "SHA256TreeHash":"70d5a5c0323dc9fc43edacee92f4a88cda5ac7369751dddc246a8b09e619ee74"
      },
      {
         "ArchiveId":"TniZ4K9rvxujzzfXI8Mb5Ioz8G_OCHgr0G4LmJHHM9hijmIxSEeGMfaoPW_n3XvHMZsh9gK9rvp04Vkf8Pk7paVHwEKylfSAz7I7YUvpZ7VC7hmGlJ0ZB69EO9rc3RJSuJipABzfRg",
         "ArchiveDescription":"",
         "CreationDate":"2022-02-26T17:28:08Z",
         "Size":592047837,
         "SHA256TreeHash":"d3ba9031a1212c495bedbf05c944bf5707011650141421fa6b50234866bec4d3"
      }
   ]
}

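NOTE: The "ArchiveId" values are the Archive-IDs of the uploaded files. These IDs are required to retrieve or delete archives. Be aware that the output of a job can only be downloaded with get-job-output for about 24 hours after the job completes; after that a new job has to be submitted.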
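To pull just the Archive-IDs out of the inventory file, something like the following sketch may be enough. It uses grep and sed only, assumes the IDs never contain a double quote, and list_archive_ids is a hypothetical name; a real JSON tool such as jq would be more robust if it is available.

```shell
# Sketch: print every ArchiveId found in an inventory JSON file, one per line.
# Works on both compact and pretty-printed output, but does not parse JSON.
list_archive_ids() {
  local inventory="${1:-output.json}"

  grep -oE '"ArchiveId"[[:space:]]*:[[:space:]]*"[^"]*"' "$inventory" \
    | sed -E 's/^"ArchiveId"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1/'
}
```

The resulting list can be fed line by line into delete-archive or into a retrieval job.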

DELETING AN ARCHIVE

aws glacier delete-archive --vault-name my-glacier --account-id - --archive-id 2qbH3Hgyq0rdnu4-xZQNiNLh9lkODN1orUMD-dJkNuAC6YQASPHAAH8LaEijaoEcYaRGNCjrH1u-zlqHOzZoRUCHr-JOWiqg_PsLpuzSDsb48SaKEtvBYUGZ_tY0jN19OhBlTkM_tA

DOWNLOADING AN ARCHIVE

First, submit a job request to get the Job-ID:

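aws glacier initiate-job --account-id - --vault-name my-glacier --job-parameters '{"Type": "archive-retrieval", "ArchiveId": "2qbH3Hgyq0rdnu4-xZQNiNLh9lkODN1orUMD-dJkNuAC6YQASPHAAH8LaEijaoEcYaRGNCjrH1u-zlqHOzZoRUCHr-JOWiqg_PsLpuzSDsb48SaKEtvBYUGZ_tY0jN19OhBlTkM_tA"}'

Once the job succeeds (check it with describe-job, as for the inventory), download the archive using the Job-ID returned by this initiate-job call: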

aws glacier get-job-output --account-id - --vault-name my-glacier --job-id tuGTrZxAIwLtvcj6Sv6IU3eoEnuIUX18QeME5x7ENl38UctRykke-jJ9PKQ1YsyVACnkQXd2HLCiWppOcTU2NgUdKjc5 backup.zip
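An archive-retrieval job has to name the archive to fetch, and it can also name a retrieval tier (Expedited, Standard, or Bulk), which trades speed against cost. The sketch below builds the job parameters and prints only the new Job-ID; start_retrieval is a hypothetical helper, and it assumes the jobId field of the initiate-job response.

```shell
# Sketch: start an archive-retrieval job and print its Job-ID.
start_retrieval() {
  local vault="$1" archive_id="$2"
  local tier="${3:-Standard}"   # Expedited, Standard, or Bulk
  local params

  # Archive-IDs contain only letters, digits, '-' and '_', so plain
  # string formatting is enough to build the JSON here.
  params=$(printf '{"Type": "archive-retrieval", "ArchiveId": "%s", "Tier": "%s"}' \
    "$archive_id" "$tier")

  aws glacier initiate-job \
    --account-id - \
    --vault-name "$vault" \
    --job-parameters "$params" \
    --query jobId --output text
}
```

For example, start_retrieval my-glacier ARCHIVE_ID Bulk requests the cheapest and slowest tier. The printed Job-ID is what describe-job and get-job-output expect.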

REFLECTIONS

This AWS service is not user-friendly and I would only recommend it for personal use in extreme cases, as it requires significant scripting and automation.

The retrieval delay and cost are acceptable only if you rarely or never expect to need the archives back, and even then the retrieval process itself is painful.

Whenever possible, consider using a regular S3 bucket for the same purpose.