AWS SysOps Administrator – Monitoring RDS
A primary task for an AWS SysOps Administrator is monitoring the core services on the platform. Amazon RDS is one such service that needs to be managed and monitored. Amazon recommends that you create a monitoring plan that makes clear exactly what needs to be monitored for your organisation's needs. Amazon RDS offers two types of monitoring:
- Use CloudWatch to monitor RDS metrics
- Use RDS events to monitor activity within RDS itself
CloudWatch
When using CloudWatch, you can view RDS metrics at several levels:
- Per database instance
- By database class
- By database engine
- Across all databases
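These four views correspond to CloudWatch dimensions in the `AWS/RDS` namespace: `DBInstanceIdentifier`, `DatabaseClass` and `EngineName`, with no dimension for the Region-wide aggregate. The sketch below builds keyword arguments for the CloudWatch `GetMetricStatistics` call. The function name `rds_metric_query`, the one-hour window and the 5-minute period are illustrative assumptions, and the sketch assumes the across-all-databases figures are published without dimensions, as the console's grouping suggests.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rds_metric_query(metric_name: str,
                     db_instance: Optional[str] = None,
                     db_class: Optional[str] = None,
                     engine: Optional[str] = None,
                     hours: int = 1) -> dict:
    """Build kwargs for CloudWatch GetMetricStatistics (hypothetical helper).

    Pass at most one filter; passing none selects the metric
    aggregated across all DB instances in the Region.
    """
    filters = {
        "DBInstanceIdentifier": db_instance,  # per-database view
        "DatabaseClass": db_class,            # e.g. "db.m5.large"
        "EngineName": engine,                 # e.g. "mysql"
    }
    chosen = [(name, value) for name, value in filters.items() if value is not None]
    if len(chosen) > 1:
        raise ValueError("choose one view: instance, class, engine or none")

    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        # An empty Dimensions list selects the across-all-databases aggregate.
        "Dimensions": [{"Name": name, "Value": value} for name, value in chosen],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,              # one data point per 5 minutes
        "Statistics": ["Average"],
    }
```

With boto3 installed and credentials configured, the result can be passed straight through, for example `boto3.client("cloudwatch").get_metric_statistics(**rds_metric_query("CPUUtilization", engine="mysql"))`.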
Events
RDS events give you a record of the events that have occurred on your RDS instances. You can also create event subscriptions to be notified when specific events occur, for example a failover, where a Multi-AZ DB instance fails over from the primary to its standby.
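Event subscriptions deliver notifications through an Amazon SNS topic. The sketch below builds the arguments for the RDS `CreateEventSubscription` call (boto3 `create_event_subscription`) that subscribes to the `failover` event category for chosen DB instances, plus a helper that picks failover entries out of a `DescribeEvents` response. The helper names, subscription name and topic ARN are hypothetical placeholders.

```python
from typing import Dict, List


def failover_subscription(name: str, sns_topic_arn: str,
                          instance_ids: List[str]) -> Dict:
    """Arguments for rds.create_event_subscription (hypothetical helper)."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,      # existing SNS topic that receives alerts
        "SourceType": "db-instance",       # events raised by DB instances
        "EventCategories": ["failover"],   # only the failover category
        "SourceIds": instance_ids,         # limit to these instances
        "Enabled": True,
    }


def failover_events(describe_events_response: Dict) -> List[Dict]:
    """Keep only failover events from an rds.describe_events response."""
    return [
        event
        for event in describe_events_response.get("Events", [])
        if "failover" in event.get("EventCategories", [])
    ]
```

Filtering locally keeps the example self-contained; `describe_events` can also filter by event category on the server side if you prefer.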
In addition to these automated monitoring tools, you can monitor RDS manually using the following:
- The RDS console, which shows:
  - Number of connections to the DB instance
  - Read and write operations for the DB instance
  - Storage consumed by the DB instance
  - Memory and CPU utilisation
  - Network traffic to and from the DB instance
- AWS Trusted Advisor, which reviews cost optimisation, performance, security and fault tolerance. Its RDS checks include:
  - RDS Idle DB Instances
  - RDS Security Group Access Risk
  - RDS Backups
  - RDS Multi-AZ
  - Aurora DB Instance Accessibility
- The CloudWatch home page, which shows:
  - Current alarms and status
  - Graphs of alarms
  - Service health status
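Trusted Advisor runs its checks for you, but a rough approximation of the RDS Backups and RDS Multi-AZ checks can be run against the output of the RDS `DescribeDBInstances` call. This is not Trusted Advisor's own logic. The function name and the decision to skip Aurora instances (whose availability and backup settings are managed at the cluster level) are assumptions of this sketch.

```python
from typing import Dict, List


def review_instances(describe_response: Dict) -> Dict[str, List[str]]:
    """Flag DB instances without automated backups or Multi-AZ.

    describe_response is the dict returned by rds.describe_db_instances().
    """
    findings: Dict[str, List[str]] = {}
    for db in describe_response.get("DBInstances", []):
        # Aurora manages these settings per cluster, so skip its instances here.
        if db.get("Engine", "").startswith("aurora"):
            continue
        issues = []
        # A backup retention period of 0 means automated backups are turned off.
        if db.get("BackupRetentionPeriod", 0) == 0:
            issues.append("automated backups disabled")
        if not db.get("MultiAZ", False):
            issues.append("not Multi-AZ")
        if issues:
            findings[db["DBInstanceIdentifier"]] = issues
    return findings
```

`describe_db_instances` returns results in pages, so with many instances you would run this over each page from the boto3 `describe_db_instances` paginator.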
Amazon RDS Metrics
The following are the key RDS metrics you should be aware of:
- BinLogDiskUsage – The amount of disk space (bytes) occupied by binary logs on the source instance. This relates to MySQL read replicas
- CPUUtilization – The percentage of CPU utilisation
- CPUCreditUsage – The number of CPU credits consumed by the instance (burstable T-class instances only)
- CPUCreditBalance – The number of CPU credits available for the instance to burst beyond its baseline CPU utilisation. Credits expire 24 hours after they are earned
- DatabaseConnections – The number of database connections in use
- DiskQueueDepth – The number of outstanding I/O requests (reads and writes) waiting to access the disk
- FreeableMemory – The amount of available RAM (bytes)
- FreeStorageSpace – The amount of available storage space (bytes)
- ReplicaLag – The amount of time (seconds) a read replica DB instance lags behind the source DB instance
- SwapUsage – The amount of swap space (bytes) used on the DB instance
- ReadIOPS – The average number of disk read I/O operations per second
- WriteIOPS – The average number of disk write I/O operations per second
- ReadLatency – The average time (seconds) taken per disk read I/O operation
- WriteLatency – The average time (seconds) taken per disk write I/O operation
- ReadThroughput – The average number of bytes read from disk per second
- WriteThroughput – The average number of bytes written to disk per second
- NetworkReceiveThroughput – The incoming network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication
- NetworkTransmitThroughput – The outgoing network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication
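These metrics are most useful behind a CloudWatch alarm. The sketch below builds arguments for the CloudWatch `PutMetricAlarm` call (boto3 `put_metric_alarm`) that fires when `FreeStorageSpace` stays low. Because `FreeStorageSpace` is reported in bytes, the threshold is converted from GiB. The alarm name pattern, 5-minute period, three evaluation periods and SNS topic are illustrative assumptions, not recommended values.

```python
GIB = 1024 ** 3  # bytes per GiB


def low_storage_alarm(db_instance: str, min_free_gib: float,
                      sns_topic_arn: str) -> dict:
    """Arguments for cloudwatch.put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{db_instance}-low-free-storage",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance}],
        "Statistic": "Minimum",        # lowest free space seen in each period
        "Period": 300,                 # 5-minute periods
        "EvaluationPeriods": 3,        # must breach for 3 periods in a row
        "Threshold": float(min_free_gib * GIB),  # metric is in bytes
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify this SNS topic
    }
```

The same shape works for other metrics in the list, for example `ReplicaLag` with a threshold in seconds and `GreaterThanThreshold`.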
Enhanced Monitoring
In addition to the standard CloudWatch metrics, you can use Enhanced Monitoring, which provides real-time metrics for the operating system your DB instance runs on. Enhanced Monitoring is available for the following engines:
- MariaDB
- Amazon Aurora
- Microsoft SQL Server
- MySQL version 5.5 or above
- Oracle
- PostgreSQL
Enhanced monitoring is not available for db.t1.micro and db.m1.small instance types.
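Enhanced Monitoring is switched on per DB instance by setting a monitoring interval. The sketch below builds arguments for the RDS `ModifyDBInstance` call (boto3 `modify_db_instance`) and checks the constraints above: it rejects the two unsupported instance classes, accepts only the intervals RDS allows (0, 1, 5, 10, 15, 30 or 60 seconds, where 0 turns Enhanced Monitoring off), and requires a monitoring role ARN whenever the interval is non-zero. The function name and `ApplyImmediately=True` are choices of this sketch.

```python
from typing import Optional

UNSUPPORTED_CLASSES = {"db.t1.micro", "db.m1.small"}
VALID_INTERVALS = {0, 1, 5, 10, 15, 30, 60}  # seconds; 0 disables


def enhanced_monitoring_params(db_instance: str, db_class: str,
                               interval_seconds: int,
                               role_arn: Optional[str] = None) -> dict:
    """Arguments for rds.modify_db_instance (hypothetical helper)."""
    if db_class in UNSUPPORTED_CLASSES:
        raise ValueError(f"Enhanced Monitoring is not available for {db_class}")
    if interval_seconds not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {sorted(VALID_INTERVALS)}")

    params = {
        "DBInstanceIdentifier": db_instance,
        "MonitoringInterval": interval_seconds,
        "ApplyImmediately": True,
    }
    if interval_seconds > 0:
        # RDS needs a role it can assume to publish OS metrics.
        if not role_arn:
            raise ValueError("a monitoring role ARN is required when enabling")
        params["MonitoringRoleArn"] = role_arn
    return params
```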
Important Note – Enhanced Monitoring needs to act on your behalf to send OS metric information to CloudWatch Logs. You grant Enhanced Monitoring the required permissions through an AWS Identity and Access Management (IAM) role, the rds-monitoring-role.
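The role needs a trust policy that lets the Enhanced Monitoring service (`monitoring.rds.amazonaws.com`) assume it, plus the AWS managed policy `AmazonRDSEnhancedMonitoringRole`, which grants the CloudWatch Logs permissions. A minimal sketch of the trust policy document follows; creating the role yourself is only needed if you do not let the RDS console create rds-monitoring-role for you. The helper name is hypothetical.

```python
import json

ENHANCED_MONITORING_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
)


def monitoring_trust_policy() -> str:
    """JSON trust policy letting the Enhanced Monitoring service assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "monitoring.rds.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })
```

With boto3 you would then call `iam.create_role(RoleName="rds-monitoring-role", AssumeRolePolicyDocument=monitoring_trust_policy())` followed by `iam.attach_role_policy(RoleName="rds-monitoring-role", PolicyArn=ENHANCED_MONITORING_POLICY_ARN)`.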