S3 Glacier is an AWS archival storage service that is really only suited to the “ultimate last backup resort”, the copy you fall back on if everything else fails.
Files sent to this service are meant to be accessed very infrequently, and retrieving them costs money. In other words, plan as if you will never need them back.
If you do need to retrieve your files, be prepared to wait hours or even days before they are available for download.
Another important tip: do not send small files. Always bundle them into large archives, but keep each archive smaller than 4 GB. Splitting while zipping will save you from a very painful multipart upload process (not recommended).
See official documentation for CLI commands at [Link].
CREATING A VAULT
Vaults for S3 Glacier are equivalent to Buckets for S3.
aws glacier create-vault --vault-name my-glacier --account-id -
UPLOADING AN ARCHIVE SMALLER THAN 4 GB
aws glacier upload-archive --account-id - --vault-name my-glacier --body backup.zip
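Because listing a vault later takes hours, it can help to write down the archive ID at upload time. The following is a minimal sketch, not an official workflow: the function name upload_and_log and the log file archives.log are hypothetical, and it assumes the CLI's global --query and --output options are used to pick the archiveId field out of the upload-archive response.

```shell
# Upload one file and append "<archive-id><TAB><file>" to a local log.
# upload_and_log and archives.log are illustrative names, not part of the AWS CLI.
upload_and_log() {
    file=$1
    archive_id=$(aws glacier upload-archive --account-id - --vault-name my-glacier \
        --archive-description "$file" --body "$file" \
        --query archiveId --output text) || return 1
    printf '%s\t%s\n' "$archive_id" "$file" >> archives.log
}
```

Calling upload_and_log backup.zip would then leave one line per uploaded archive in archives.log, which can be consulted later without waiting for an inventory job.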
UPLOADING AN ARCHIVE LARGER THAN 4 GB
I strongly recommend not uploading files over 4 GB because it requires using a feature called multipart upload.
Instead, use zip, tar, split, or any other tool to split the big file into smaller chunks, then use the upload-archive command shown above.
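As a rough illustration of the splitting step, here is a sketch using split. The function name split_archive, the ".part-" suffix, and the default chunk size of 4000000000 bytes (chosen only because it is below 4 GB) are my own assumptions, not anything Glacier requires.

```shell
# Split FILE into numbered-by-letter pieces of at most CHUNK bytes each:
# FILE.part-aa, FILE.part-ab, ...
# split_archive and the default size are illustrative choices.
split_archive() {
    file=$1
    chunk=${2:-4000000000}
    split -b "$chunk" "$file" "$file.part-"
}
```

Each resulting piece can be sent with upload-archive on its own. To restore the original file after downloading all pieces, concatenate them in order, for example: cat backup.zip.part-* > backup.zip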
Otherwise, you have to initiate a multipart upload, stating the size of each part in bytes:
aws glacier initiate-multipart-upload --account-id - --part-size 4194304 --vault-name my-glacier --archive-description "big-file.zip"
It will output an Upload-ID that will be used in all the following commands:
aws glacier upload-multipart-part --body part-1 --range 'bytes 0-4194303/*' --account-id - --vault-name my-glacier --upload-id **upload-id**
aws glacier upload-multipart-part --body part-2 --range 'bytes 4194304-8388607/*' --account-id - --vault-name my-glacier --upload-id **upload-id**
aws glacier upload-multipart-part --body part-3 --range 'bytes 8388608-12582911/*' --account-id - --vault-name my-glacier --upload-id **upload-id**
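...and so on. This process has to be scripted, because a large file may need hundreds or thousands of parts. Once every part is uploaded, the upload still has to be finished with complete-multipart-upload, which needs the total archive size and its SHA-256 tree hash.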
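To give an idea of what such a script could look like, here is a sketch under a few assumptions. The function names part_ranges and upload_parts, the ".part-" file prefix, and the optional third argument are hypothetical. The sketch only uploads the parts; it does not compute the tree hash or complete the upload.

```shell
# Print one "bytes START-END/*" range per part for TOTAL bytes split
# into PART-byte parts (the last part may be shorter).
part_ranges() {
    total=$1
    part=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$((start + part - 1))
        if [ "$end" -ge "$total" ]; then
            end=$((total - 1))
        fi
        echo "bytes $start-$end/*"
        start=$((end + 1))
    done
}

# Split FILE into parts and upload each one with its matching range.
# UPLOAD_ID comes from initiate-multipart-upload; PART must equal the
# --part-size given there (defaults to the 4194304 used above).
upload_parts() {
    file=$1
    upload_id=$2
    part=${3:-4194304}
    total=$(($(wc -c < "$file")))
    # -a 4 allows far more than the default 676 output files
    split -a 4 -b "$part" "$file" "$file.part-"
    set -- "$file".part-*
    part_ranges "$total" "$part" | while IFS= read -r range; do
        aws glacier upload-multipart-part --body "$1" --range "$range" \
            --account-id - --vault-name my-glacier \
            --upload-id "$upload_id" < /dev/null || exit 1
        shift
    done
}
```

A call such as upload_parts big-file.zip "$upload_id" would issue one upload-multipart-part command per part, in the same order as the ranges shown above.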
LISTING VAULT CONTENT
Make a job request and get the Job-ID:
aws glacier initiate-job --account-id - --vault-name my-glacier --job-parameters '{"Type": "inventory-retrieval"}'
NOTE: this might take hours or even days, so keep checking whether the status is “InProgress” or “Succeeded”.
aws glacier describe-job --vault-name my-glacier --account-id - --job-id tuGTrZxAIwLtvcj6Sv6IU3eoEnuIUX18QeME5x7ENl38UctRykke-jJ9PKQ1YsyVACnkQXd2HLCiWppOcTU2NgUdKjc5
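Rather than re-running describe-job by hand, the check can be wrapped in a loop. This is only a sketch: the function name wait_for_job and the 15-minute default interval are my own choices, and it assumes the StatusCode field of the describe-job response takes the values InProgress, Succeeded, or Failed.

```shell
# Poll a Glacier job until it finishes.
# Returns 0 when the job succeeded, 1 when it failed or the CLI call failed.
# wait_for_job and the default interval are illustrative choices.
wait_for_job() {
    job_id=$1
    interval=${2:-900}   # seconds between checks
    while :; do
        status=$(aws glacier describe-job --account-id - --vault-name my-glacier \
            --job-id "$job_id" --query StatusCode --output text) || return 1
        case $status in
            Succeeded) return 0 ;;
            Failed) echo "job $job_id failed" >&2; return 1 ;;
        esac
        sleep "$interval"
    done
}
```

For example, wait_for_job "$job_id" && echo "ready to download" would block until the job output can be fetched.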
Once the job has succeeded, download its output (a JSON file):
aws glacier get-job-output --vault-name my-glacier --account-id - --job-id tuGTrZxAIwLtvcj6Sv6IU3eoEnuIUX18QeME5x7ENl38UctRykke-jJ9PKQ1YsyVACnkQXd2HLCiWppOcTU2NgUdKjc5 output.json
cat output.json
The JSON format will have to be beautified to look like this:
{
  "VaultARN": "arn:aws:glacier:ca-central-1:398812248970:vaults/my-glacier",
  "InventoryDate": "2022-02-26T20:39:15Z",
  "ArchiveList": [
    {
      "ArchiveId": "2qbH3Hgyq0rdnu4-xZQNiNLh9lkODN1orUMD-dJkNuAC6YQASPHAAH8LaEijaoEcYaRGNCjrH1u-zlqHOzZoRUCHr-JOWiqg_PsLpuzSDsb48SaKEtvBYUGZ_tY0jN19OhBlTkM_tA",
      "ArchiveDescription": "",
      "CreationDate": "2022-02-26T17:22:53Z",
      "Size": 153717283,
      "SHA256TreeHash": "62d22c320d76356dd9aa6032bdd1bca43c3f1aadef085cd7f5f327e6f9e2b004"
    },
    {
      "ArchiveId": "lhIYAD3uqPnweEtP4yQII_BhZjy4JBOaprFlmsqe0gJ08V5ccOwdQ3eSxc1NXjXm2BqxQlidiDiaCSgOtuwJQvQpZ-Z0qw7tXRM0QHKUntPYeaowJJlmbvQK4d-fIEMTNjRsVJwKUw",
      "ArchiveDescription": "",
      "CreationDate": "2022-02-26T17:26:12Z",
      "Size": 646743001,
      "SHA256TreeHash": "70d5a5c0323dc9fc43edacee92f4a88cda5ac7369751dddc246a8b09e619ee74"
    },
    {
      "ArchiveId": "TniZ4K9rvxujzzfXI8Mb5Ioz8G_OCHgr0G4LmJHHM9hijmIxSEeGMfaoPW_n3XvHMZsh9gK9rvp04Vkf8Pk7paVHwEKylfSAz7I7YUvpZ7VC7hmGlJ0ZB69EO9rc3RJSuJipABzfRg",
      "ArchiveDescription": "",
      "CreationDate": "2022-02-26T17:28:08Z",
      "Size": 592047837,
      "SHA256TreeHash": "d3ba9031a1212c495bedbf05c944bf5707011650141421fa6b50234866bec4d3"
    }
  ]
}
NOTE: the ArchiveId values are the IDs of the uploaded files (archives). An ID is needed to request the retrieval or deletion of an archive. Be aware that the output of a job stays downloadable with get-job-output only for about 24 hours after the job completes.
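Since the archive IDs are long, copying them by hand is error-prone. The sketch below pulls them out of the inventory file with grep and cut. The function name list_archive_ids is hypothetical, and this is a pattern match rather than a real JSON parser, so a tool such as jq would be more robust if it is installed.

```shell
# Print one ArchiveId per line from a Glacier inventory file.
# A grep-based sketch: it matches "ArchiveId":"..." pairs, with or
# without spaces around the colon, and does not validate the JSON.
list_archive_ids() {
    grep -o '"ArchiveId" *: *"[^"]*"' "$1" | cut -d '"' -f 4
}
```

Running list_archive_ids output.json prints the IDs in the order they appear in the file, ready to be passed to delete-archive or to a retrieval job.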
DELETING AN ARCHIVE
aws glacier delete-archive --vault-name my-glacier --account-id - --archive-id 2qbH3Hgyq0rdnu4-xZQNiNLh9lkODN1orUMD-dJkNuAC6YQASPHAAH8LaEijaoEcYaRGNCjrH1u-zlqHOzZoRUCHr-JOWiqg_PsLpuzSDsb48SaKEtvBYUGZ_tY0jN19OhBlTkM_tA
DOWNLOADING AN ARCHIVE
First, make the job request and get the Job-ID:
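aws glacier initiate-job --account-id - --vault-name my-glacier --job-parameters '{"Type": "archive-retrieval", "ArchiveId": "lhIYAD3uqPnweEtP4yQII_BhZjy4JBOaprFlmsqe0gJ08V5ccOwdQ3eSxc1NXjXm2BqxQlidiDiaCSgOtuwJQvQpZ-Z0qw7tXRM0QHKUntPYeaowJJlmbvQK4d-fIEMTNjRsVJwKUw"}'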
Once the job has succeeded, download the archive:
aws glacier get-job-output --account-id - --vault-name my-glacier --job-id tuGTrZxAIwLtvcj6Sv6IU3eoEnuIUX18QeME5x7ENl38UctRykke-jJ9PKQ1YsyVACnkQXd2HLCiWppOcTU2NgUdKjc5 backup.zip
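To avoid copying the job ID by hand between the two steps, the first command can capture it. This is a sketch with assumptions: the function name start_retrieval is hypothetical, and it relies on the initiate-job response carrying the job ID in a jobId field, selected here with the CLI's --query option.

```shell
# Start an archive-retrieval job for ARCHIVE_ID and print the job ID.
# start_retrieval is an illustrative name, not part of the AWS CLI.
start_retrieval() {
    archive_id=$1
    aws glacier initiate-job --account-id - --vault-name my-glacier \
        --job-parameters "{\"Type\": \"archive-retrieval\", \"ArchiveId\": \"$archive_id\"}" \
        --query jobId --output text
}
```

With job_id=$(start_retrieval "$archive_id"), the stored value can later be passed to describe-job and get-job-output through --job-id "$job_id".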
REFLECTIONS
This AWS service is not very human-friendly, and I would only recommend it for personal use in extreme cases, because it requires a lot of scripting and automation.
The retrieval time and cost are not really an issue, since the plan is never to need the archives in the first place, but the retrieval process itself is also painful.
My suggestion is to consider using a regular S3 bucket for the same purpose whenever possible.