Amazon Glacier is a low-cost archival service from Amazon Web Services, optimised to provide durable storage for data archiving and backup. Data can be stored cost-effectively for long periods. Glacier is also highly scalable, so customers need not worry about capacity planning or hardware provisioning; AWS takes care of both.
- Exam Important Note – Glacier is an ideal solution where customers want to archive data for a long duration and can tolerate slow retrieval: with the Standard retrieval option, retrieval times typically range from 3 to 5 hours. Two additional retrieval options have since been introduced:
- Expedited retrievals are typically available within 1 – 5 minutes, allowing you to quickly access your data when you occasionally need a subset of archives urgently.
- Bulk retrievals are Glacier’s lowest-cost retrieval option, enabling you to retrieve large amounts, even petabytes, of data inexpensively in a day. Bulk retrievals typically complete within 5 – 12 hours.
- Important Note – Initial configuration can be carried out using the Amazon Glacier Management Console, which you can use to create and delete vaults. However, all other interactions with Amazon Glacier require that you use the AWS Command Line Interface (CLI) or an application. Several third-party applications are available to interact with your Glacier account. To upload or download data you must either use the AWS CLI or write code to make requests, using either the REST API directly or the AWS SDKs.
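The three retrieval options above can be sketched in code. This is a minimal illustration, not a definitive implementation: the helper names `build_retrieval_job` and `expected_ready_window` are hypothetical, and the time windows are simply the typical figures quoted above rather than guarantees.

```python
from datetime import datetime, timedelta

# Typical completion windows quoted in the text above: (earliest, latest).
RETRIEVAL_WINDOWS = {
    "Expedited": (timedelta(minutes=1), timedelta(minutes=5)),
    "Standard": (timedelta(hours=3), timedelta(hours=5)),
    "Bulk": (timedelta(hours=5), timedelta(hours=12)),
}


def build_retrieval_job(archive_id, tier="Standard", sns_topic_arn=None):
    """Build the parameters for an archive-retrieval job (hypothetical helper)."""
    if tier not in RETRIEVAL_WINDOWS:
        raise ValueError(f"unknown retrieval tier: {tier!r}")
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}
    if sns_topic_arn:
        # Optional: ask Glacier to publish to this SNS topic when the job completes.
        params["SNSTopic"] = sns_topic_arn
    return params


def expected_ready_window(requested_at, tier):
    """Return the (earliest, latest) time the archive is typically ready."""
    earliest, latest = RETRIEVAL_WINDOWS[tier]
    return requested_at + earliest, requested_at + latest
```

With the AWS SDK for Python, a dictionary like this would be passed as `jobParameters` to the Glacier client's `initiate_job` call, along with the vault name. The archive ID and topic ARN are placeholders you would supply yourself.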
Data is stored as “archives” on Amazon Glacier. An archive can be any single file, or several files aggregated into one zip/tar file and uploaded as a single archive. Each archive can be a maximum of 40TB and you can store an unlimited number of archives.
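Aggregating several files into one tar file before upload can be sketched with Python's standard library. The function names `bundle_for_upload` and `list_bundle` are hypothetical, and the sketch only builds the local bundle; uploading it is a separate step.

```python
import tarfile
from pathlib import Path


def bundle_for_upload(source_paths, bundle_path):
    """Pack several files into one gzip-compressed tar file."""
    bundle_path = Path(bundle_path)
    with tarfile.open(bundle_path, "w:gz") as tar:
        for path in map(Path, source_paths):
            # arcname keeps only the file name and drops the local directory layout.
            tar.add(path, arcname=path.name)
    return bundle_path


def list_bundle(bundle_path):
    """Return the sorted member names stored in the bundle."""
    with tarfile.open(bundle_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Because only the file name is kept, two source files with the same name in different directories would collide inside the bundle; keep the relative paths instead if that matters for your data.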
Similar to S3 ‘Buckets’, Amazon Glacier uses the concept of ‘Vaults’ to store your data archives. You can configure your vaults using the AWS Management Console or the AWS SDKs to perform actions such as:
- Create Vault
- Delete Vault
- Lock Vault
- List Vault Metadata
- Retrieve Vault Inventory
- Configure Notification
- Tag Vaults for filtering
In addition, you can set various access policies to grant or deny what users and groups can do with your vaults.
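Several of the vault actions listed above can be chained in a short script. The sketch below assumes a client object shaped like the Glacier client in the AWS SDK for Python (boto3); the function name `set_up_vault` is hypothetical, and the vault name, topic ARN and tags are placeholders.

```python
def set_up_vault(client, vault_name, sns_topic_arn, tags):
    """Create a vault, tag it, and enable job-completion notifications."""
    # accountId "-" means "the account that owns the credentials in use".
    client.create_vault(accountId="-", vaultName=vault_name)
    client.add_tags_to_vault(accountId="-", vaultName=vault_name, Tags=tags)
    client.set_vault_notifications(
        accountId="-",
        vaultName=vault_name,
        vaultNotificationConfig={
            "SNSTopic": sns_topic_arn,
            "Events": ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
        },
    )
    # Return the vault metadata so the caller can confirm the result.
    return client.describe_vault(accountId="-", vaultName=vault_name)
```

In practice the client would come from `boto3.client("glacier")`. Passing it in as a parameter keeps the function easy to test with a stand-in object.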
Key features available for Amazon Glacier
- Vault Inventory enables you to retrieve an inventory of all the archives stored in a vault; this inventory is updated approximately once a day. You can request the inventory in JSON or CSV format, and it contains details of your archives such as size and creation date
- Access Control is provided through integration with AWS Identity and Access Management. Here you can create users and assign them access rights to any vaults stored in your Glacier account. You can also grant permissions on an individual or group basis.
- Vault Access Policies enable you to define access policies on a vault based on users and groups
- Vault Lock is a feature that lets you enforce compliance controls on individual vaults. For example, you can use a “Write Once Read Many” Vault Lock Policy, which ensures that once data is added to the specified vault, it cannot be overwritten
- Three Data Retrieval Policies
- Free Tier Only restricts retrievals to the free tier allowance, so no retrieval charges are incurred
- Max Retrieval Rate enables you to set the maximum retrieval rate as GB per hour and costs around $7.20
- No Retrieval limit enables you to retrieve any amount of data and the retrieval costs are based on usage
- Audit Logs enable you to log all API calls made to Amazon Glacier for your account. You can review which users have accessed which vaults or identify who created or deleted a vault
- Integrated Lifecycle Management with Amazon S3 – Glacier is integrated with the S3 lifecycle management process. As described in the Exam Tips for S3 Part 2, you can reduce your overall costs by migrating data to Glacier once it reaches a specified age
- Tagging Support enables you to tag vaults with labels that you can use to create filters for AWS billing and cost reports. It helps in analysing cost structures and can be used for other reporting requirements such as usage by company, department or team
- AWS Software Development Kits (SDKs) are used for upload and retrieval of data in Glacier. Minimal interaction is available via the console, but programmatic access is necessary for more complex tasks. The SDKs are available for Java and .NET
- AWS Import/Export can be used to transfer large amounts of data into and out of Amazon Glacier. For large data sets, AWS Import/Export is often faster than transferring over the Internet and more cost effective than upgrading your ISP-provided bandwidth. The solution involves using portable storage devices for physical transport between the client and Amazon’s data centres.
- AWS Direct Connect is another option available which provides high bandwidth connectivity using dedicated network connections from clients’ premises to AWS. Speeds of between 1Gbps and 10Gbps are possible through AWS Direct Connect
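The S3 lifecycle integration listed among the features above can be illustrated with a small rule builder. This is a sketch under assumptions: the function name `glacier_transition_rule`, the `logs/` prefix and the 90-day threshold are all hypothetical choices, not values from this article.

```python
import json


def glacier_transition_rule(prefix, days, rule_id="archive-to-glacier"):
    """Build an S3 lifecycle configuration that moves objects to Glacier."""
    if days < 0:
        raise ValueError("days must be zero or greater")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this key prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # Example: archive everything under logs/ once it is 90 days old.
    print(json.dumps(glacier_transition_rule("logs/", 90), indent=2))
```

With boto3, a configuration of this shape would be supplied as `LifecycleConfiguration` to the S3 client's `put_bucket_lifecycle_configuration` call for the bucket in question.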
Data stored in Amazon Glacier is protected such that, by default, only the vault owner has access to the resources they create. In addition, data is encrypted at rest using AES 256-bit encryption, and secure transmission is supported using SSL. It is also possible to protect data using Identity and Access Management (IAM) policies. Data is also immutable, which means that once an archive is created it can be deleted but it cannot be updated.
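As an illustration of policy-based protection, the sketch below builds a Vault Lock style policy that denies archive deletion until an archive reaches a minimum age. The function name `deny_early_delete_policy`, the statement ID and the vault ARN are hypothetical placeholders; treat the retention period as an assumption to adapt to your own compliance rules.

```python
import json


def deny_early_delete_policy(vault_arn, min_age_days):
    """Return a JSON policy denying deletion of archives younger than min_age_days."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    # Archive age in days is compared against the retention period.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(min_age_days)}
                },
            }
        ],
    }
    return json.dumps(policy)
```

A policy string like this would be attached to a vault through the Vault Lock process, which has an initiate step followed by a confirmation step. Once the lock is completed the policy can no longer be changed, so it is worth validating the policy before confirming.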
Data durability and reliability
Like S3, Glacier is designed to provide 99.999999999% durability for archives, with data stored in multiple facilities and on multiple devices. Data corruption is guarded against through checksum analysis and systematic integrity checks, and the service is built to be self-healing.
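To make the idea of checksum analysis concrete, here is a sketch of a SHA-256 tree hash, the style of checksum the Glacier API uses to verify uploaded data. The text above does not name the algorithm, so treat this as an illustration under that assumption; the function name `tree_hash` is hypothetical.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # the data is hashed in 1 MiB chunks


def tree_hash(data: bytes) -> str:
    """Compute a SHA-256 tree hash of data and return it as a hex string."""
    # Hash every 1 MiB chunk; empty input is treated as one empty chunk.
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    # Combine neighbouring hashes pairwise until a single root hash remains.
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                paired.append(level[i])  # an unpaired hash is carried up unchanged
        level = paired
    return level[0].hex()
```

For data of 1 MiB or less, the result is simply the SHA-256 digest of the data. Comparing the hash computed locally with the one reported by the service is one way to detect corruption in transit.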
- You can upload, download or delete archives
- Each archive can range between 1 byte and 40TB in size
- Each archive has a unique ID
- Data can be downloaded from Glacier, but this typically takes 3 to 5 hours with the Standard retrieval option because Amazon Glacier must first prepare the archive for download. Once the archive is ready, you have 24 hours in which to download the data from staging
- Newer retrieval options include Expedited retrievals, which are typically available within 1 – 5 minutes, and Bulk retrievals, which are for retrieving large amounts of data, even petabytes, inexpensively in a day. Bulk retrievals typically take 5 to 12 hours.
- Use data retrieval policies to manage costs and set data retrieval limits
- You can download to a device in your organisation, an Amazon EC2 instance or copy it to an S3 bucket
- Amazon SNS can be used to send out notifications when jobs complete
- You can delete an archive at any time. Archives that are deleted within 3 months of being uploaded incur an early deletion fee
- Vaults must be empty before you can delete them
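The data retrieval policies mentioned above can be expressed as a small policy document. This is a hedged sketch: the function name `data_retrieval_policy` is hypothetical, and it assumes 1 GB means 1024³ bytes when converting the hourly limit.

```python
GIB = 1024 ** 3  # assumption: 1 GB is treated as 1024^3 bytes here


def data_retrieval_policy(strategy, gb_per_hour=None):
    """Build a data retrieval policy for one of the three strategies."""
    if strategy == "BytesPerHour":
        # Max Retrieval Rate: cap retrievals at a fixed number of bytes per hour.
        if gb_per_hour is None or gb_per_hour <= 0:
            raise ValueError("BytesPerHour needs a positive gb_per_hour")
        rule = {"Strategy": "BytesPerHour", "BytesPerHour": int(gb_per_hour * GIB)}
    elif strategy in ("FreeTier", "None"):
        # FreeTier stays within the free allowance; "None" means no retrieval limit.
        rule = {"Strategy": strategy}
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {"Rules": [rule]}
```

With boto3, a dictionary of this shape would be passed as `Policy` to the Glacier client's `set_data_retrieval_policy` call. For example, `data_retrieval_policy("BytesPerHour", 10)` describes a cap of 10 GB per hour.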