Database systems come in two types of categories namely Relational Databases (RDBMS) and NoSQL (Non-Relational Databases). Amazon AWS offers the following types of Databases and Database Services:
- RDS – Managed Relational Databases Service
- Dynamo DB – Managed NoSQL Database Service
- ElastiCache – In-Memory Cache
- Redshift – Data Warehousing Serving
- DMS – Manage Database Migration Service
The most common type of databases in use is the Relational Database System. They provide a structure to store data in tables which consist of rows and columns similar to a spreadsheet. Columns contain attributes of a record which are inserted in rows and you can have multiple columns. In addition, each attribute is of a particular data type; text, number, date, time etc.
Each record in a table has a primary key to uniquely identify the record. For example, an Employee ID can be used as a primary key to uniquely identify employees in the database. The primary keys of each record from one table can then be used to reference primary keys of other tables to avoid duplication of data.
Relational Databases can be either:
- Online Transactional Processing (OLTP) Database Systems – These are transaction-oriented applications that consistently writes to and reads from the database, e.g. use a MySQL database to run eCommerce driven sites. With OLTP you pull information based on say an order number or pull information stored on a customer by querying Customer ID
- Online Analytical Processing (OLAP) Database Systems – These are for reporting and analysing large datasets. OLAP enables users to analyse multidimensional data interactively from multiple perspectives. These are:
- consolidation (roll-up)
- slicing and dicing
- OLAP systems are mostly optimised for reading, OLTP has to processes all kinds of queries (read, insert, update and delete
Point-of-sales data and sales made via your website for your mobile phone which you sell across Asia and US. You may be asked to analyse net profit over the last 12 months and you would need to take into account several data sets to compile the report. For example:
- You need sales data of all phones sold in India
- You need sales data of all phones sold in the US
- Unit Cost of manufacturing the phones in your factory in China
- Unit Cost of bringing the phones into India and US
- Sales price of each phone in each region – take into account currency and exchange rates
- Sales price – unit cost to give you profit
Often your Online Transaction Processing database needs to be kept separate from your Online Analytical Processing to ensure that you don’t overload your database with queries.
This is a repository of data that is often used for reporting and analysis via OLAP. Data Warehouses are used to compile data and run complex queries. Data Warehouses are updated in batches at regular schedules whereas OLTP can be updated every second. Amazon RDS is primarily used to OLTP workloads; whereas Amazon Redshift is a high-performance data warehouse designed for OLAP purposes.
Used for business intelligence with tools like SQL Server Reporting Services or Oracle Hyperion, and used to conduct large queries comparing current trends with targets etc.
Amazon Offers 6 Relational Database Types:
- SQL Server
- MySQL Server
Document-oriented databases that are non-relational. A Database essentially is broken down as follows:
- Collection (Table)
- Document (Row)
- Key Value Pairs (Fields)
Otherwise known as NoSQL Databases, these are highly used in recent years as they the NoSQL database architecture makes it easy to scale horizontally far more easily and support clustering across multiple machines for performance and fault tolerance. Furthermore, the structure enables for the schema of the database to evolve as and when required easily.
NoSQL Databases can be run on EC2 instances or services like Amazon DynamoDB is Amazon’s managed NoSQL Database solutions.
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory data stores, instead of relying entirely on slower disk-based databases. Amazon ElastiCache supports two open-source in-memory engines:
- Redis – open source in-memory data store and cache. Redis is suitable for demanding applications and can scale from single-node up to 15 shared clusters which can provide up to 3.55Tib of in-memory data. Used for high-performance use cases like mobile apps and gaming. Use Redis when you need Sorted Lists, Counters & Hashes a well as Backup/Restore capabilities
- Memcached – memory object caching system, used when you have simple data type model and you need to run large nodes with multiple cores or threads. You can also use Memcached when you do not require High Availability via replication
Database Migration Service
AWS Database Migration Service (AWS DMS) can migrate your data to and from most widely used commercial and open-source databases such as Oracle, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Amazon Aurora, MariaDB, and MySQL.
The service supports homogeneous migrations such as Oracle to Oracle, and also heterogeneous migrations between different database platforms, such as Oracle to MySQL or MySQL to Amazon Aurora. The source or target database must be on an AWS service.
AWS DMS connects to the source database, reads the source data, formats the data for consumption by the target database, and loads the data into the target database. Most of this processing happens in memory, though large transactions might require some buffering to disk. Cached transactions and log files are also written to disk.
Amazon RDS is a managed database service that offers easy setup, operations, scalability and security of a relational database system on the AWS platform. Amazon RDS offers 6 database engines at your disposal and each come with their own features making them appropriate for varying business use cases.
Amazon RDS enables you to replicate your data to increase availability, increase durability as well as scale up and out from a single instance.
A DB instance is an isolated database environment in the cloud. Amazon RDS provides a fully managed service to launch your DB Instances with varying parameters. Common DB Instance operations using the API include ‘CreateDBInstance’ to launch new DB Instance and ‘ModifyDBInstance’ to change and resize the DB Instance.
Different DB Instance classes are offered which enables you to select the class that matches your Compute and Memory requirements. These start with db.t2.micro with 1vCPU and 1GB RAM to db.r3.8xlarge with 32 vCPU and 244GB RAM. DB Instances can range from 5 GB to 6 TB of associated storage capacity.
Amazon RDS features and configuration options can be managed using the:
- DB Parameter Groups – Container to configure values that need to be applied to one or more DB instances of the same type. A default parameter group is applied if none is specified. These contain specific defaults for the DB Engine and Instance Class.
Note: The DB Parameter Group can be changed for an instance, but a reboot is required.
- DB Options Groups – specific engine features are made available such as the option to enable SQL mirroring. To use these features, you need to configure and option group and configure the settings required
Amazon RDS support six database engines. These are:
- MySQL – Amazon RDS supports DB instances running several versions of MySQL. You can resize the DB Instance. You can also create and restore backups and snapshots, create Multi-AZ secondaries and Read Replicas.
- PostgreSQL – Amazon RDS supports DB instances running several versions of PostgreSQL. Amazon RDS supports snapshots and backup and recovery. You can also use Multi-AZ deployments and Read Replicas. You can create them inside a VPC and use SSL to connect to the instance. Standard SQL client tools like pgAdmin or P-SQL command line utility can be used to run commands manage the instance.
- MariaDB – Amazon RDS supports DB instances running version 10.0 or 10.1 of MariaDB. Again, Multi-AZ secondaries, creating Read Replicas, and monitoring the performance of the DB instance can be carried out. You use standard MariaDB utilities and applications to store and access the data in the DB instance.
- Oracle – Amazon RDS supports DB instances running several versions and editions of Oracle Database. You can enable additional features to your Oracle DB instance. Amazon RDS currently supports Multi-AZ deployments for Oracle as a high-availability, failover solution. You can access the schema using SQL Client Apps like Oracle SQL Plus.
- MS SQL – Amazon RDS support DB instances running several versions of MS SQL. You can point-in-time restores, and automated or manual backups. You can run your SQL Server inside a VPC, use TDE to encrypt data at rest and configure Multi-AZ deployments for SQL Server using SQL Server Mirroring as a high-availability, failover solution. No Shell access is provided but you can use Microsoft SQL Server Management Studio to access the DB instance. No direct host access to a DB instance via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection.
- Amazon Aurora – Amazon Aurora is an enterprise-grade commercial database that is MySQL compatible and is used when you need greater levels of reliability and performance. Creating an Aurora DB Instances actually requires you to create a cluster which can have one or more DB Instances and Cluster volume to manage data. Cluster volumes span multiple Availability Zones; each AZ has a copy of the cluster data
Two types instances:
- Primary Instances – Main Instance supporting read and write workloads; any modification of data happens at the primary instance. Each cluster has one primary instance
- Amazon Aurora Replica – Secondary Instance supports only read operations. Each DB Cluster can have 15 Aurora Replicas; enable the distribution of read-workloads among the instances and increase performance. Replicas can be placed across Availability Zones to increase availability
Amazon RDS provides administration for Amazon Aurora by handling routine database tasks such as provisioning, patching, backup, recovery, failure detection, and repair. Amazon RDS also provides push-button migration tools to convert your existing Amazon RDS for MySQL applications to Amazon Aurora.
Amazon RDS primarily uses Amazon Elastic Block Store (Amazon EBS) volumes for databases and log storage. The exception is Amazon Aurora which uses Amazon’s proprietary storage. With Amazon RDS, you can scale from 4Gb to 6TB in provisioned storage of up to 30,000 IOPS.
Amazon RDS provides three storage types:
- Magnetic – also known as standard storage, this is ideal for light I/O requirement
- General Purpose SSD – known as gp2. it offers cost-effective storage that is ideal for a variety of workloads. These volumes deliver single-digit millisecond latencies and can burst up to 3,000 IOPS for extended periods of time. A gp2 volume can range in size from 1 GB to 16 TB. Volumes can range in size from 5 GB to 6 TB for MySQL, MariaDB, PostgreSQL, and Oracle DB instances, and from 20 GB to 4 TB for SQL Server DB instances
- Provisioned IOPS – designed to meet I/O intensive workloads. Provisioned IOPS volumes can range 100 GB to 6 TB for MySQL, MariaDB, PostgreSQL, and Oracle DB engines. SQL Server Express and Web editions can range in size from 100 GB to 4 TB. SQL Server Standard and Enterprise editions can range in size from 200 GB to 4 TB. With Provisioned IOPS, you get to specify the amount of storage you wish to allocate and the dedicated IOPS you want.
Backup & Recovery
Amazon RDS provides backup and recovery features to ensure you can easily restore from failures and provide highly available deployments for (Disaster Recovery) DR and (Business Continuity Purposes) BCP purposes.
Critical to designing any enterprise solution to deliver services that meet business Recovery Point Objects (RPO) and Recovery Time Objective (RTO)
- RPO is the maximum period of data loss.
- RTO is the maximum downtime that is permitted to recover from backup
Amazon RDS offers an automatic backup feature to backup you complete DB instance. Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases.
- Backup retention period can range from 1 and 35 days – Note: An outage occurs if you change the backup retention period from 0 to a non-zero value or from a non-zero value to 0
- Default backup window is set to 30 minutes
- During the automatic backup window, storage I/O might be suspended briefly while the backup process initializes
- To disable automated backups for a DB instance, set the backup retention parameter to 0; e.g. using the API – BackupRetentionPeriod= 0
- Similarly, you can enable automated backups for a DB instance by setting the backup retention period to a positive non-zero value
- Automated backups are enabled by default and you get free storage equal to the size of your database.
You can perform manual DB snapshots and create these as many times as you want. You can use the AWS Console or CreateDBSnapshot action. Manual backups are not automatically deleted by AWS and you have to explicitly delete them using either the console or the DeleteDBSnapshot action. If you need to regularly take large snapshots and backups; consider setting up your Database to use Multi-AZ – When taking snapshots for Multi-AZ deployments, the backup is taken from the standby and this reduces the duration of I/O suspension.
You can restore point in time backups of your databases to help recover your database. When you restore, the default DB security group is applied to the new DB instance. If you need custom DB security groups applied to your DB instance, you must apply them explicitly. Note: You can restore to any point in time during your backup retention period.
Note that when you restore a snapshot, you cannot restore to the existing DB instance; you will have to restore to a new instance.
Also Note The OFFLINE, EMERGENCY, and SINGLE_USER mode is not currently supported. Setting any database into one of these modes will cause the latest restorable time to stop moving ahead for the whole instance.
High Availability and Multi-AZ
With Amazon RDS Multi-AZ deployments, you are able to create database clusters across multiple availability zones. Amazon’s Multi-AZ option increases the availability of your databases using replication. You can, therefore, achieve better RPO and RTO for your business needs, with fast failover to slave copies of your databases.
With Multi-AZ, Amazon RDS automatically creates and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimise latency spikes during system backups.
Databases end-points are referenced by their DNS name which means that in the event of a failover, Amazon will change the CNAME record to point to the standby server and so minimise downtime. In addition, the primary DB instance switches over automatically to the standby in the event of an Availability Zone outage. Manual failover can also be carried out if required and failover takes a few minutes only.
NOTE: Multi-AZ Deployments are for disaster recovery purposes only and not for enhancing database performance. Standby DB Instances are not available to offline queries from primary master DB. If you wish to use multiple DB Instances, you can use read replicas for Amazon ElastiCache.
With Databases, Amazon offers standard options to Scale Up or Down. This is known as vertical scaling and involves modifying the instance type from a smaller instance to a larger one as required.
Key points to note when scaling up or down:
- Storage and instance type are decoupled. When you scale your database instance up or down, your storage size remains the same and is not affected by the change. You can separately modify your DB instance to increase the allocated storage space or improve the performance by changing the storage type
- There is minimal downtime when you are scaling up on a Multi-AZ environment because the standby database gets upgraded first, then a failover will occur to the newly sized database. A Single-AZ instance will be unavailable during the scale operation
- You can apply the change immediately or schedule it
Amazon also offers Horizontal Scaling options for some of its Databases Engines. You can scale database by having read replicas; RDS MySQL, PostgreSQL, and MariaDB can have up to 5 read replicas, and Amazon Aurora can have up to 15 read replicas.
Read replicas are essentially read-only copies that are synchronised with your master database. You can place your read replica in a different AWS Region closer to your users for better performance. Also, you can use read replicas to increase the availability of your database by promoting a read replica to a master for faster recovery in the event of a disaster. However, read replicas are not a replacement for the high availability and automatic failover capabilities that Multi-AZ provides.
Read Replica Use Cases:
- Offload reporting or data warehousing tasks against replicas instead of the primary DB
- Handle read traffic while source DB is in maintenance mode
- Scale beyond the capacity of a single DB instance where you have heavy workloads
- Read Replicas are used for Scaling and not DR
- You must have automatic backups turned on for Read-Replicas to work
- You can have read replicas or read replicas but can result in latency
- Each Read Replicas have their own DNS endpoint
- Read Replicas are not Multi-AZ enabled
- You can create Read Replica of Multi-AZ DBs
- Read Replica can be promoted to its own database
Cross Region Read Replicas – Amazon supports cross region read replicas for some of its database engines. Typical use cases include:
- Disaster Recovery: You can create cross-region read replicas of your primary database instance to have a disaster recovery solution. If your primary region faces a disruption, you can promote the replica to a master and keep your business operational.
- Scaling: You can use cross-region read replicas to support read queries from your workloads across various geographic locations. This will reduce latency by serving your customers from a database that is close to them.
- Cross-region Migration: If you would like to migrate your database instance quickly to another AWS region, you may do so by using cross-region replication.
You can utilise various security features of the Amazon RDS service as well as incorporate other Amazon Security features to provide a complete end to end solution that will meet all your core security requirements. Typically, you can:
- Run DB Instances in a VPC and specifically in a private subnet of a VPC
- Use IAM policies to assign permissions on who is allowed to manage the RDS resource
- Use Security Groups to control what IP Address or EC2 Instances can connect to the DB Instance. When setting up an instance, you need to specify rules via associated security groups
- Use SSL with DB Instances running MySQL, Aurora, MariaDB, PostgreSQL, Oracle and MS SQL for encryption during transfer of data
- Use RDS encryption to secure your RDS instances and snapshots at rest
- Use network encryption and transparent data encryption with Oracle DB instances
- Use the security features of your DB engine to control who can log into the databases on a DB instance
- Encryption at rest is available using the Amazon Key Management Service
- Encrypting an existing Database is not supported; you have to create a new DB Instance with encryption enabled and migrate data to it