Monitoring AWS

  • CloudWatch is for monitoring/performance.
  • CloudTrail is for auditing API calls, e.g. when/where/by whom.

CloudWatch

  • Can monitor Compute (EC2, ASG, ELB, Route53 health checks..), Storage & Content Delivery (EBS, Storage)…
  • Metrics
    • Provides metrics (e.g. CPU utilization, network utilization, disk reads/writes, status checks) for every service in AWS.
    • Metrics belong to namespaces
    • A dimension is an attribute of a metric (instance ID, environment, etc.)
    • Up to 30 dimensions per metric (older documentation lists 10)
    • ❗ Metrics are retained for up to 15 months, at coarser resolution as they age (older material cites 14 days, the original limit).
    • Metrics have timestamps
    • EC2 Detailed Monitoring
      • 📝EC2 instance metrics have basic metrics for “every 5 minutes”
        • With detailed monitoring (for a cost), you get data “every 1 minute”
      • 💡 Use detailed monitoring if you want your ASG to scale more promptly
      • AWS Free Tier allows up to 10 detailed monitoring metrics
      • ❗ EC2 memory usage is by default not pushed
        • 💡 Must be pushed from inside the instance as a custom metric with e.g. CloudWatch agent.
    • Custom Metrics
      • Possibility to define and send your own custom metrics to CloudWatch
        • E.g. for RAM utilization, disk storage usage.
      • Must be pushed from inside the instance as a custom metric by installing agent
        • E.g. CloudWatch agent
      • Ability to use dimensions (attributes) to segment metrics
        • E.g. Instance.id, Environment.name
      • Metric resolution
        • Standard: 1 minute
        • High resolution: up to 1 second (StorageResolution API parameter)
          • Higher cost
      • API call PutMetricData
      • 💡 Use exponential back off in case of throttle errors
    • ❗ CloudWatch itself does not have a native export feature that will send data periodically to S3.
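The custom-metric notes above (dimensions, StorageResolution, PutMetricData, exponential backoff) can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the helper names `build_metric_datum`, `backoff_delays` and `send_with_backoff`, the metric name and the namespace are all hypothetical. The dictionary follows the shape that boto3's `put_metric_data` accepts for one `MetricData` entry, and the AWS call itself is left to the caller so the sketch runs without credentials.

```python
import random
import time


def build_metric_datum(name, value, unit, dimensions, high_resolution=False):
    """Build one MetricData entry in the shape PutMetricData expects."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        # Dimensions segment the metric, e.g. by instance or environment.
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        # StorageResolution: 1 = high resolution, 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }


def backoff_delays(retries, base=0.5, cap=20.0):
    """Delay ceiling for each retry: base * 2^attempt, capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def send_with_backoff(send, is_throttle, retries=5, sleep=time.sleep):
    """Call send(); on a throttle error wait a jittered, growing delay and retry."""
    for ceiling in backoff_delays(retries):
        try:
            return send()
        except Exception as err:
            if not is_throttle(err):
                raise  # not a throttle error: do not retry
            sleep(random.uniform(0, ceiling))  # random delay up to the ceiling
    return send()  # final attempt; any error propagates to the caller


# Hypothetical RAM metric pushed from inside an instance.
datum = build_metric_datum(
    "MemoryUtilization", 71.5, "Percent",
    {"InstanceId": "i-0123456789abcdef0", "Environment": "prod"},
)
```

With boto3 installed and credentials configured, `send` could be something like `lambda: boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])`, with `is_throttle` checking the error code of the raised client error.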
  • Dashboards
    • Can create CloudWatch dashboard of metrics
    • In console you can monitor & access dashboards
    • Can be regional or global (e.g. include graphs from different regions)
    • You can change the time zone & time range of the dashboards
    • You can setup automatic refresh (10s, 1m, 2m, 5m, 15m)
    • Pricing
      • 3 dashboards (up to 50 metrics) for free
      • $3 per dashboard per month afterwards
    • Types are:
      • Line: compare metrics over time
      • Stacked area: Compare the total over time
      • Number: instantly see the latest value for a metric
      • Text: Free text with markdown formatting
      • Query results: Explore results from Logs Insights
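To make the dashboard notes above concrete, here is a hedged sketch of a dashboard body that mixes a line widget, a number widget from a different region, and a markdown text widget. The helper names, widget sizes, instance ID and regions are assumptions for illustration. The JSON layout follows the dashboard body structure (a `widgets` list with `type` and `properties`) as far as this sketch needs it.

```python
import json


def metric_widget(title, region, metric, view="timeSeries", stacked=False):
    """One metric widget; `metric` is [namespace, metric name, dim name, dim value]."""
    return {
        "type": "metric",
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "region": region,  # each widget names its own region
            "metrics": [metric],
            "view": view,      # "timeSeries" for line, "singleValue" for number
            "stacked": stacked,
            "stat": "Average",
            "period": 300,
        },
    }


def text_widget(markdown):
    """Free-text widget rendered as markdown."""
    return {"type": "text", "width": 24, "height": 2,
            "properties": {"markdown": markdown}}


def dashboard_body(widgets):
    """Serialize widgets into the JSON string a dashboard is created from."""
    return json.dumps({"widgets": widgets})


cpu = ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]
body = dashboard_body([
    text_widget("# Web tier overview"),
    metric_widget("CPU (us-east-1)", "us-east-1", cpu),
    metric_widget("CPU now (eu-west-1)", "eu-west-1", cpu, view="singleValue"),
])
```

With boto3, the string could be passed as `DashboardBody` to `put_dashboard` together with a `DashboardName`.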
  • Logs
    • Applications can send logs to CloudWatch using the SDK
    • CloudWatch Logs metric filters can evaluate CloudTrail logs for specific terms, phrases or values.
      • I.e. values are not always from CloudWatch Metrics, but can be generated from Logs e.g. HTTP errors.
    • CloudWatch can collect logs from Elastic Beanstalk, ECS, AWS Lambda, VPC Flow Logs, API Gateway, CloudTrail, CloudWatch log agents, Route53 and more.
      • CloudWatch Log Agents
        • Install on EC2 machines with: sudo yum install -y awslogs
          • Ensure EC2 has IAM permissions to write to CloudWatch
        • Configure /etc/awslogs/awslogs.conf for logs (errors etc.)
        • Configure /etc/awslogs/awscli.conf for region
        • Start service with systemctl start awslogsd
    • CloudWatch logs can go to:
      • Batch export to S3 for archival
      • Stream to an Elasticsearch cluster for further analytics
      • Stream to Lambda
    • Logs are organized at two levels:
      • Log groups: arbitrary name, usually representing an application
      • Log stream: instances within application / log files / containers
    • Can define log expiration policies (never expire, 30 days, etc..)
      • You pay for data retention in CloudWatch
    • Using the AWS CLI we can tail CloudWatch logs
      • To see e.g. how application is behaving in real time
    • Security
      • ❗ To send logs to CloudWatch, make sure IAM permissions are correct!
      • Encryption of logs using KMS at the Group Level
    • CloudWatch Logs can use filter expressions
      • E.g. find a specific IP inside of a log
      • Metric filters can be used to trigger alarms
        • E.g. if specific IP appears you can trigger an alarm
    • CloudWatch Logs Insights
      • Log analytics service for CloudWatch
      • Can be used to query logs and add queries into CloudWatch Dashboards
      • Pay for the queries you run
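The retention and metric-filter notes above can be sketched as request parameters. This is an illustrative sketch under assumptions: the log group name, metric name, namespace and IP address are invented, and the helpers only build the keyword arguments that boto3's `put_retention_policy` and `put_metric_filter` take, so nothing is sent to AWS. `count_matches` is a local stand-in for what such a filter would count, not the service's matching logic.

```python
def retention_params(log_group, days):
    """Keyword arguments for a log expiration policy on one log group."""
    return {"logGroupName": log_group, "retentionInDays": days}


def ip_metric_filter_params(log_group, ip, namespace="MyApp/Security"):
    """Keyword arguments for a metric filter that counts lines containing `ip`."""
    return {
        "logGroupName": log_group,
        "filterName": f"ip-{ip}",
        # Terms with non-alphanumeric characters go in double quotes.
        "filterPattern": f'"{ip}"',
        "metricTransformations": [{
            "metricName": "SuspiciousIpHits",  # hypothetical metric name
            "metricNamespace": namespace,
            "metricValue": "1",                # add 1 per matching log event
        }],
    }


def count_matches(lines, term):
    """Local approximation: how many log lines contain the term."""
    return sum(term in line for line in lines)


sample_lines = [
    "203.0.113.7 GET /login 401",
    "198.51.100.2 GET / 200",
    "203.0.113.7 GET /admin 403",
]
filter_params = ip_metric_filter_params("/myapp/access", "203.0.113.7")
hits = count_matches(sample_lines, "203.0.113.7")
```

An alarm on the resulting metric then covers the "if specific IP appears you can trigger an alarm" case.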
  • Alarms
    • Alarms are used to trigger notifications for any metric
    • You can set up billing alarms that trigger when the account charges go over a certain threshold.
    • Alarms invoke actions such as:
      • EC2 Actions: e.g. reboot, stop, terminate or recover an instance.
      • SNS Notifications: email, SMS, etc.
      • Auto Scaling: triggers Auto Scaling policies.
    • Various options (sampling, %, max, min, etc…)
    • Alarm states: OK, INSUFFICIENT_DATA, ALARM (being triggered)
    • Period:
      • Length of time in seconds to evaluate the metric
      • 📝 High resolution custom metrics
        • Available resolution decreases as metrics age: 1 second (for 3 hours), then 1 minute (for 15 days), 5 minutes (for 63 days), and 1 hour (for 15 months).
      • E.g. NetworkOut < 2,000,000 for 1 data point (EC2) within 5 minutes
    • Data points
      • Represents the values of that variable over time
      • E.g. if the period is 5 minutes and data points is 3, the alarm triggers after the condition has been met for 15 minutes
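The period and data point arithmetic above can be checked with a small sketch. The alarm parameters mirror the NetworkOut example and use the argument names of boto3's `put_metric_alarm`. The alarm name and the SNS topic ARN are made up, and `alarm_state` is a deliberately simplified model that ignores "M out of N" evaluation and missing-data handling.

```python
def network_out_alarm_params(instance_id, topic_arn):
    """Keyword arguments for an alarm on low NetworkOut for one EC2 instance."""
    return {
        "AlarmName": f"low-network-out-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "NetworkOut",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,             # seconds per evaluated data point
        "EvaluationPeriods": 1,    # consecutive data points that must breach
        "Threshold": 2_000_000,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic for notifications
    }


def seconds_until_alarm(period, evaluation_periods):
    """Minimum time the condition must hold before the alarm fires."""
    return period * evaluation_periods


def alarm_state(datapoints, threshold, evaluation_periods):
    """Simplified state model: all of the latest N data points must breach."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(v < threshold for v in recent) else "OK"


params = network_out_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # hypothetical topic
)
wait = seconds_until_alarm(period=300, evaluation_periods=3)  # 15 minutes
```

Keeping the state logic as a pure function makes it easy to reason about why a single good data point resets a multi-period alarm to OK.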
  • Events
    • Event Rule
      • Types
        • Schedule: Events that are triggered on a schedule (cron or rate expression)
        • Event Pattern: React to service doing something e.g. CodePipeline state changes.
      • Targets: e.g. lambda function, EC2 StopInstances API call, SNS, SQS, ECS Task, Event bus in another AWS account…
    • Triggers to Lambda functions, SQS/SNS/Kinesis Messages
    • CloudWatch Event creates a small JSON document to give information about the change
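The two rule types above can be sketched as rule definitions plus a toy matcher. The rule names are hypothetical, and `matches` is a simplified approximation of event pattern matching (exact values only, nested objects handled recursively), not the service's full matching rules. The sample event is invented and trimmed to the fields the pattern inspects.

```python
import json

# Schedule rule: fires on a fixed rate (cron expressions are also possible).
SCHEDULE_RULE = {"Name": "every-five-minutes",
                 "ScheduleExpression": "rate(5 minutes)"}

# Event pattern rule: reacts to a CodePipeline execution that failed.
PIPELINE_FAILED_PATTERN = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}
PATTERN_RULE = {"Name": "pipeline-failed",
                "EventPattern": json.dumps(PIPELINE_FAILED_PATTERN)}


def matches(pattern, event):
    """Return True if every pattern field lists the event's value for that field."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested object: recurse into the matching part of the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Invented sample of the small JSON document describing the change.
sample_event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "region": "us-east-1",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}
should_notify = matches(PIPELINE_FAILED_PATTERN, sample_event)
```

Either dictionary could be passed as keyword arguments to boto3's `put_rule` on the `events` client, with targets attached separately.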

CloudTrail

  • Tracks API events allowing you to see who accessed what resources and when.
  • CloudTrail reports on who made the change, when, and from which location.
  • Per AWS account & per region
    • 💡 Should be enabled in all regions with a CloudFormation stack.
    • All accounts / regions can log into the same S3 bucket in one account / region.
    • When you apply a trail to all regions, CloudTrail creates the same trail in every other region.
  • Enabled by default
    • Default metrics (in CloudWatch) come from the hypervisor (e.g. CPU, connections)
    • Many services have deeper “Advanced monitoring” for metrics from inside the instance, such as connected users, CPU usage per thread / application.
  • Encryption
    • A single KMS key can be used to encrypt log files for trails applied to all regions.
    • CloudTrail log files are by default encrypted using S3 Server Side Encryption (SSE)
    • You can also enable encryption SSE KMS for additional security
  • Get a history of events / API calls made within your AWS account by the Console, SDK, CLI and AWS services
  • Can push to S3 (encrypted by default), CloudWatch Logs and SNS.
  • Event history is retained for 90 days; deliver logs to S3 to keep them longer
  • 📝 If a resource is deleted in AWS, look into CloudTrail first!
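To illustrate the "who, when, from where" lookup for a deleted resource, here is a hedged sketch that scans a CloudTrail log file for delete-style calls. The sample records are invented and heavily trimmed, though the field names used (`eventTime`, `eventName`, `eventSource`, `awsRegion`, `sourceIPAddress`, `userIdentity`) are ones CloudTrail records carry. The function name and the "Delete"/"Terminate" prefix rule are assumptions for illustration.

```python
import json


def summarize_deletions(log_text):
    """List who deleted what, when and from where, from one CloudTrail log file."""
    records = json.loads(log_text).get("Records", [])
    hits = []
    for rec in records:
        name = rec.get("eventName", "")
        # Crude heuristic: destructive API calls usually start with these verbs.
        if name.startswith(("Delete", "Terminate")):
            hits.append({
                "who": rec.get("userIdentity", {}).get("arn", "unknown"),
                "what": f'{rec.get("eventSource", "?")}:{name}',
                "when": rec.get("eventTime"),
                "where": rec.get("sourceIPAddress"),
                "region": rec.get("awsRegion"),
            })
    return hits


# Invented, trimmed log file content for demonstration only.
sample_log = json.dumps({"Records": [
    {"eventTime": "2024-01-05T10:15:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "DeleteBucket", "awsRegion": "us-east-1",
     "sourceIPAddress": "203.0.113.7",
     "userIdentity": {"type": "IAMUser",
                      "arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventTime": "2024-01-05T10:16:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "DescribeInstances", "awsRegion": "us-east-1",
     "sourceIPAddress": "203.0.113.7",
     "userIdentity": {"type": "IAMUser",
                      "arn": "arn:aws:iam::123456789012:user/alice"}},
]})
deletions = summarize_deletions(sample_log)
```

The same function could be pointed at log files delivered to S3 once they are downloaded and decompressed.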

AWS Config

  • Assess, audit, and evaluate the configurations of your AWS resources.
    • Resources include RDS, subnets, DB snapshots, security groups, and event subscriptions.
  • Reports on what has changed
    • You can e.g. look back and see what instances were in default VPC last week.
  • Per AWS account & per region
    • 💡 Should be enabled in all regions with a CloudFormation stack.
    • Data aggregation in AWS Config allows you to aggregate AWS Config data from multiple accounts and regions into a single account.
  • Tracks resource state.
  • AWS Config is about compliance; Trusted Advisor is more about recommendations, though both check the same things for security.
  • Integrates with SNS to receive notifications.
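The idea of "reports on what has changed" can be illustrated with a tiny diff over two configuration snapshots. This is only a conceptual sketch: the flattened dictionaries below are hypothetical and do not follow the real AWS Config configuration item schema, and `config_changes` is an invented helper, not part of any AWS SDK.

```python
def config_changes(before, after):
    """Map each changed key to its (old, new) value between two snapshots."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical snapshots of one instance, a week apart.
last_week = {"instanceType": "t3.micro", "vpcId": "vpc-default", "monitoring": "basic"}
today = {"instanceType": "t3.small", "vpcId": "vpc-default", "monitoring": "basic"}

changed = config_changes(last_week, today)
was_in_default_vpc = last_week["vpcId"] == "vpc-default"
```

Looking back at the older snapshot is what answers questions like which instances were in the default VPC last week.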
