This article covers the monitoring and metrics domain for EC2 Instances that you need to learn for the AWS Certified SysOps Administrator – Associate Exam. Managing your EC2 instances and being able to monitor your fleet’s health is a primary task performed by System Administrators managing resources on the AWS Cloud.
There are two essential checks that you can monitor before you set up any CloudWatch metrics: the System Status Check and the Instance Status Check.
- System Status Checks – monitor the AWS systems and detect problems with your instance caused by the underlying hardware it runs on. This is the physical host that runs your virtual servers in the cloud. When a system status check fails, you can choose to wait for AWS to fix the issue or you can resolve it yourself (for example, by stopping and starting the instance, which brings it up on another host). Examples of problems that cause system status checks to fail include:
  - Loss of network connectivity
  - Loss of system power
  - Software issues on the physical host
  - Hardware issues on the physical host that impact network reachability
- Instance Status Checks – monitor the software and network configuration of your individual instance. These checks detect problems that require your input to resolve. When an instance status check fails, you will typically need to address the problem yourself (for example, by rebooting the instance or by making modifications to your operating system). Examples of problems that may cause instance status checks to fail include:
  - Failed system status checks
  - Misconfigured networking or startup configuration
  - Exhausted memory
  - Corrupted file system
  - Incompatible kernel
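Both checks can also be read programmatically. The following is a minimal sketch, assuming the boto3 SDK is installed and credentials are configured. The helper names `summarize_status` and `fetch_status` are illustrative and not part of any AWS API. It labels each instance by which check is failing, which hints at whether a stop and start or a fix inside the operating system is the likelier remedy.

```python
def summarize_status(response):
    """Map each instance ID to 'ok', 'system', 'instance', or 'other'.

    `response` is a dict shaped like the result of the EC2
    DescribeInstanceStatus call.
    """
    summary = {}
    for item in response.get("InstanceStatuses", []):
        system = item.get("SystemStatus", {}).get("Status")
        instance = item.get("InstanceStatus", {}).get("Status")
        if system == "impaired":
            # Host-level problem: wait for AWS or stop and start the instance.
            verdict = "system"
        elif instance == "impaired":
            # Guest-level problem: usually needs a reboot or an OS-level fix.
            verdict = "instance"
        elif system == "ok" and instance == "ok":
            verdict = "ok"
        else:
            # For example 'initializing' or 'insufficient-data'.
            verdict = "other"
        summary[item["InstanceId"]] = verdict
    return summary


def fetch_status(ec2_client=None):
    """Call DescribeInstanceStatus and summarize the first page of results."""
    if ec2_client is None:
        import boto3  # imported lazily so summarize_status works without it

        ec2_client = boto3.client("ec2")
    response = ec2_client.describe_instance_status(IncludeAllInstances=True)
    return summarize_status(response)
```

Calling `fetch_status()` with no arguments uses the default region and credentials. For a large fleet, a paginator would be needed because only the first page of results is read here.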
Amazon CloudWatch Alarms
You can configure CloudWatch Alarms to watch your instances over a period of time and act when a metric stays in a changed state for a set duration. When an alarm is invoked, it can send an SNS notification to a topic, invoke an Auto Scaling policy, or perform a specific action on the instance.
CloudWatch Alarm Actions can automatically stop, terminate, reboot or recover an instance. You can use the reboot and recover actions to automatically reboot an instance or recover it onto new hardware if a system impairment occurs.
Every alarm action you create uses alarm action ARNs. This feature has been upgraded so that the new ARNs require you to have the EC2ActionsAccess IAM role. This IAM role enables AWS to perform stop, terminate and reboot actions on your behalf, and it is automatically created when you configure an alarm action for the first time.
You can add the stop, terminate, reboot or recover actions to any alarm that is set on an EC2 per-instance metric, including custom metrics that include the InstanceId dimension, as long as the dimension value references a running EC2 instance.
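As an illustration of an alarm action, here is a minimal sketch that builds a recover alarm on the `StatusCheckFailed_System` metric with boto3. The function names, the alarm name, and the choice of two 60-second evaluation periods are assumptions made for this example, not values prescribed by AWS.

```python
def recover_alarm_params(instance_id, region, topic_arn=None):
    """Build keyword arguments for the CloudWatch PutMetricAlarm call."""
    # Alarm action ARN for the EC2 recover action in the given region.
    actions = ["arn:aws:automate:{}:ec2:recover".format(region)]
    if topic_arn:
        # Optionally notify an SNS topic as well.
        actions.append(topic_arn)
    return {
        "AlarmName": "recover-{}".format(instance_id),  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # assumed: evaluate once a minute
        "EvaluationPeriods": 2,   # assumed: two failing periods in a row
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


def create_recover_alarm(instance_id, region, cloudwatch_client=None, topic_arn=None):
    """Create (or overwrite) the alarm and return the parameters that were sent."""
    if cloudwatch_client is None:
        import boto3  # imported lazily so the parameter builder works without it

        cloudwatch_client = boto3.client("cloudwatch", region_name=region)
    params = recover_alarm_params(instance_id, region, topic_arn)
    cloudwatch_client.put_metric_alarm(**params)
    return params
```

Swapping `recover` for `stop`, `terminate` or `reboot` in the action ARN gives the other alarm actions. The system status metric suits the recover action because recovery addresses problems with the underlying host.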
Permissions
You must have the following permissions to create or modify an alarm:
- ec2:DescribeInstanceStatus and ec2:DescribeInstances – For all alarms on Amazon EC2 instance status metrics
- ec2:StopInstances – For alarms with stop actions
- ec2:TerminateInstances – For alarms with terminate actions
- ec2:DescribeInstanceRecoveryAttribute and ec2:RecoverInstances – For alarms with recovery actions
Important Exam Note – If you want to use an IAM role to stop, terminate, or reboot an instance using an alarm action, you must use the EC2ActionsAccess role.
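The permission list above can be turned into an IAM policy document. The sketch below only assembles the EC2 permissions named in this section. The function name `alarm_policy` and the use of `"Resource": "*"` are assumptions for illustration, and creating the alarm itself also needs CloudWatch permissions such as `cloudwatch:PutMetricAlarm`.

```python
import json

# EC2 permissions needed for every alarm on instance status metrics.
BASE_ACTIONS = ["ec2:DescribeInstanceStatus", "ec2:DescribeInstances"]

# Extra EC2 permissions per alarm action, as listed in this section.
ACTIONS_BY_ALARM_ACTION = {
    "stop": ["ec2:StopInstances"],
    "terminate": ["ec2:TerminateInstances"],
    "recover": ["ec2:DescribeInstanceRecoveryAttribute", "ec2:RecoverInstances"],
}


def alarm_policy(alarm_actions):
    """Return an IAM policy dict covering the requested alarm actions."""
    actions = list(BASE_ACTIONS)
    for name in alarm_actions:
        for action in ACTIONS_BY_ALARM_ACTION[name]:
            if action not in actions:  # avoid duplicates, keep order stable
                actions.append(action)
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Assumption: no resource-level restriction, for brevity.
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(alarm_policy(["stop", "recover"]), indent=2))
```

In practice you would probably narrow the resource scope, for example with conditions on instance tags, rather than allow the actions on every instance.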
Amazon CloudWatch Events
CloudWatch Events enables automatic responses to system events. Events are delivered to CloudWatch Events, where rules match them and then perform an action in response.
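A rule consists of an event pattern and one or more targets. The sketch below, assuming boto3, matches EC2 instance state changes and forwards them to a target such as an SNS topic. The rule name, target ID and function names are hypothetical, and the target must separately allow CloudWatch Events to invoke it.

```python
import json


def state_change_pattern(states):
    """Event pattern matching EC2 instances entering any of `states`."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": list(states)},
    }


def create_state_change_rule(name, states, target_arn, events_client=None):
    """Create the rule and attach one target; returns the pattern used."""
    if events_client is None:
        import boto3  # imported lazily so the pattern builder works without it

        events_client = boto3.client("events")
    pattern = state_change_pattern(states)
    events_client.put_rule(
        Name=name,
        EventPattern=json.dumps(pattern),  # the API expects a JSON string
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=name,
        Targets=[{"Id": "notify-1", "Arn": target_arn}],  # hypothetical target ID
    )
    return pattern
```

For example, `create_state_change_rule("ec2-stopped", ["stopped", "terminated"], topic_arn)` would forward every stop or termination in the region to the given topic.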
Amazon CloudWatch Logs
With CloudWatch Logs, you can monitor, store and access your log files from Amazon EC2 Instances, CloudTrail and other sources.
Amazon EC2 Monitoring Scripts
CloudWatch provides standard metrics for various components but excludes a number of instance-specific metrics, such as memory usage. You can, however, create custom metrics and push this data to CloudWatch to report on. For example, you can configure a script to PUT an instance’s memory metrics into CloudWatch for analysis and reporting.
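A custom memory metric could look like the following minimal sketch, which assumes a Linux instance whose `/proc/meminfo` exposes `MemTotal` and `MemAvailable`, plus boto3 for the upload. The namespace `Custom/System`, the metric name and the function names are made up for this example and are not taken from the AWS monitoring scripts.

```python
def memory_used_percent(meminfo_text):
    """Compute used memory as a percentage from /proc/meminfo contents."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the number (kB)
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return 100.0 * (total - available) / total


def publish_memory(instance_id, cloudwatch_client=None, meminfo_path="/proc/meminfo"):
    """Read memory usage and PUT it to CloudWatch as a custom metric."""
    with open(meminfo_path) as handle:
        percent = memory_used_percent(handle.read())
    if cloudwatch_client is None:
        import boto3  # imported lazily so the parser works without it

        cloudwatch_client = boto3.client("cloudwatch")
    cloudwatch_client.put_metric_data(
        Namespace="Custom/System",  # hypothetical custom namespace
        MetricData=[
            {
                "MetricName": "MemoryUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Unit": "Percent",
                "Value": percent,
            }
        ],
    )
    return percent
```

Running `publish_memory` from cron every minute or every five minutes would build a time series that can then be graphed or alarmed on like any standard metric.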
Other areas of Monitoring
The Amazon EC2 Dashboard provides detailed information on:
- Service Health and Scheduled Events by region
- Instance state
- Status checks
- Alarm status
- Instance metric details
- Volume metric details
Amazon CloudWatch Dashboard shows:
- Current alarms and status
- Graphs of alarms and resources
- Service health status
CloudWatch Monitoring – Basic vs. Detailed
- Basic – Data is available automatically in 5-minute periods at no charge.
- Detailed – Data is available in 1-minute periods for an additional cost, and you need to specifically enable it for the instance. You can also get aggregated data across groups of similar instances where you have enabled detailed monitoring.
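Detailed monitoring can be switched on per instance through the EC2 API. The sketch below assumes boto3. The function names are illustrative, and the small helper simply restates the 5-minute versus 1-minute periods as datapoints per hour.

```python
def datapoints_per_hour(detailed):
    """Basic monitoring reports every 5 minutes, detailed every 1 minute."""
    period_minutes = 1 if detailed else 5
    return 60 // period_minutes


def enable_detailed_monitoring(instance_ids, ec2_client=None):
    """Turn on detailed monitoring and return the resulting state per instance."""
    if ec2_client is None:
        import boto3  # imported lazily so the helper above works without it

        ec2_client = boto3.client("ec2")
    response = ec2_client.monitor_instances(InstanceIds=list(instance_ids))
    return {
        item["InstanceId"]: item["Monitoring"]["State"]
        for item in response.get("InstanceMonitorings", [])
    }
```

The returned state is typically `pending` right after the call and becomes `enabled` shortly afterwards. The matching `unmonitor_instances` call switches an instance back to basic monitoring.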
Scheduled Events
AWS can schedule events for your EC2 instances, such as a stop, reboot or retirement. AWS sends a notification of each event before it begins, so you can alert anyone who is affected by the change.
Amazon EC2 supports the following types of scheduled events for your instances:
- Instance stop: The instance will be stopped. When you start it again, it’s migrated to a new host computer. Applies only to instances backed by Amazon EBS.
- Instance retirement: The instance will be stopped or terminated.
- Reboot: Either the instance will be rebooted (instance reboot) or the host computer for the instance will be rebooted (system reboot).
- System maintenance: The instance might be temporarily affected by network maintenance or power maintenance.
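Scheduled events are returned alongside the status checks by the DescribeInstanceStatus call. The following sketch, assuming boto3 and the hypothetical helper names shown, pulls out pending events and skips any whose description is prefixed with `[Completed]` or `[Canceled]`, which is how finished or canceled events are understood to be marked.

```python
def pending_events(response):
    """List (instance_id, event_code, not_before) for events still to come.

    `response` is a dict shaped like the result of DescribeInstanceStatus.
    Event codes include instance-stop, instance-retirement, instance-reboot,
    system-reboot and system-maintenance.
    """
    rows = []
    for item in response.get("InstanceStatuses", []):
        for event in item.get("Events", []):
            description = event.get("Description", "")
            # Assumption: finished or canceled events carry these prefixes.
            if description.startswith(("[Completed]", "[Canceled]")):
                continue
            rows.append((item["InstanceId"], event["Code"], event.get("NotBefore")))
    return rows


def fetch_pending_events(ec2_client=None):
    """Query EC2 and return pending events from the first page of results."""
    if ec2_client is None:
        import boto3  # imported lazily so the filter works without it

        ec2_client = boto3.client("ec2")
    response = ec2_client.describe_instance_status(IncludeAllInstances=True)
    return pending_events(response)
```

A report built from `fetch_pending_events()` could be reviewed regularly so that instances facing retirement or a stop are migrated before the `NotBefore` time.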