High Availability and Global Infrastructure on AWS

High Availability and Global Infrastructure are fundamental components for building resilient applications on AWS.

This lesson covers AWS global infrastructure, high availability principles, failover strategies, distributed design patterns, and the use of AWS managed services for highly available architectures.

____

How It Works & Core Attributes:

AWS Global Infrastructure:

Infrastructure Components:

What AWS Global Infrastructure is: AWS global infrastructure consists of regions, availability zones, and edge locations distributed worldwide. This infrastructure provides the foundation for building highly available and fault-tolerant applications
AWS Regions: Separate geographic areas, each containing multiple Availability Zones. Regions are isolated from one another, so a failure in one region is not expected to spread to others
Availability Zones (AZs): One or more discrete data centers within a region, each with redundant power, networking, and connectivity. AZs are connected to each other with high-bandwidth, low-latency networking
Edge Locations: Points of presence (PoPs) that cache content closer to users. Edge locations are used by CloudFront to deliver content with low latency and high transfer speeds

Infrastructure Benefits:

Global Infrastructure Benefits: AWS global infrastructure provides high availability, fault tolerance, and low latency. You can deploy applications across multiple regions and AZs to ensure they remain available even if one region or AZ fails

High Availability Principles:

Availability Fundamentals:

What High Availability is: High availability refers to the ability of a system to remain operational and accessible for a high percentage of time. High availability is achieved through redundancy, fault tolerance, and automatic failover mechanisms
Redundancy: Having multiple copies of critical components to ensure that if one fails, others can take over. This includes redundant servers, databases, and network connections
Fault Tolerance: The ability of a system to continue operating even when some components fail. Fault-tolerant systems can handle failures gracefully without service interruption
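
The effect of redundancy on "a high percentage of time" can be estimated with simple arithmetic. The sketch below assumes component failures are independent, which real systems only approximate; the function names and the 99% figure are illustrative and not taken from any AWS SLA.

```python
def series_availability(*components: float) -> float:
    """Availability of a chain where every component must work."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def parallel_availability(*replicas: float) -> float:
    """Availability of redundant replicas where any one is enough."""
    all_fail = 1.0
    for availability in replicas:
        all_fail *= 1.0 - availability
    return 1.0 - all_fail


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year."""
    return (1.0 - availability) * 365 * 24


single = 0.99  # hypothetical availability of one server
redundant = parallel_availability(single, single)
print(f"one server:  {single:.4%}, about {downtime_hours_per_year(single):.1f} h/year down")
print(f"two servers: {redundant:.4%}, about {downtime_hours_per_year(redundant):.2f} h/year down")
```

Under these assumptions, a second redundant copy moves a 99% component to roughly 99.99%, while chaining two 99% components in series lowers the total to about 98%.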

Availability Mechanisms:

Automatic Failover: The process of automatically switching from a failed component to a working backup component. Automatic failover keeps any interruption short, because no one has to notice the failure and switch over by hand
Health Checks: Regular monitoring of system components to detect failures quickly. Health checks can trigger automatic failover when they detect that a component is not responding properly
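
One way health checks can drive failover without reacting to a single dropped probe is to require several consecutive results before changing state. The following is a minimal sketch of that idea; `HealthTracker` and its threshold names are hypothetical and do not correspond to a specific AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 3  # consecutive failures before marking unhealthy
    healthy_threshold: int = 2    # consecutive passes before marking healthy again
    healthy: bool = True
    # Count of consecutive results that contradict the current state.
    _streak: int = field(default=0, init=False, repr=False)

    def record(self, check_passed: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [True, False, False, True, False, False, False, True, True]
states = [tracker.record(p) for p in probes]
print(states)
```

In this run the two isolated failures are ignored, the three consecutive failures flip the state to unhealthy, and two consecutive passes restore it.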

Failover Strategies:

Failover Types:

What Failover is: Failover is the process of switching from a failed component to a backup component. Failover strategies determine how and when this switching occurs
Active-Passive Failover: One component (active) handles all traffic while another component (passive) waits in standby. If the active component fails, the passive component takes over
Active-Active Failover: Multiple components handle traffic simultaneously. If one component fails, the remaining components continue to handle traffic
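
The difference between the two failover types can be sketched as two routing functions. This is a toy model under simple assumptions: node names are made up, and the set of healthy nodes is supplied by the caller rather than discovered.

```python
def active_passive(nodes, healthy):
    """Send all traffic to the first healthy node in priority order."""
    for node in nodes:
        if node in healthy:
            return node
    raise RuntimeError("no healthy node available")


def active_active(nodes, healthy, request_count):
    """Spread requests round-robin over every healthy node."""
    live = [node for node in nodes if node in healthy]
    if not live:
        raise RuntimeError("no healthy node available")
    return [live[i % len(live)] for i in range(request_count)]


# Active-passive: the standby only receives traffic once the primary is gone.
print(active_passive(["primary", "standby"], {"primary", "standby"}))
print(active_passive(["primary", "standby"], {"standby"}))

# Active-active: node "b" has failed, so "a" and "c" absorb its share.
print(active_active(["a", "b", "c"], {"a", "c"}, 4))
```

Note the capacity trade-off the sketch makes visible: in active-active, the surviving nodes must be able to carry the failed node's share of the load.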

Failover Implementation:

DNS Failover: Using DNS to route traffic to healthy endpoints. If one endpoint fails its health checks, the DNS answer is changed to point to a healthy endpoint; clients pick up the change once their cached record expires (the record's TTL)
Application-Level Failover: Applications detect failures and switch to healthy components. This can include database failover, load balancer failover, and service failover
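
A practical consequence of DNS failover is that clients may keep using a cached answer until its TTL runs out. The sketch below is a toy model of that behavior, not a real DNS client or the API of any DNS service; the class names, the hostname, and the documentation-range IP addresses are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    name: str
    primary: str
    secondary: str
    ttl_seconds: int = 60

    def answer(self, primary_healthy: bool) -> str:
        """Return the primary address while it is healthy, else the secondary."""
        return self.primary if primary_healthy else self.secondary


class CachingClient:
    """A resolver cache that reuses an answer until its TTL expires."""

    def __init__(self, record: FailoverRecord):
        self.record = record
        self._cached = None
        self._expires_at = 0.0

    def resolve(self, now: float, primary_healthy: bool) -> str:
        if self._cached is None or now >= self._expires_at:
            self._cached = self.record.answer(primary_healthy)
            self._expires_at = now + self.record.ttl_seconds
        return self._cached


record = FailoverRecord("app.example.com", "203.0.113.10", "198.51.100.20", ttl_seconds=60)
client = CachingClient(record)
print(client.resolve(now=0, primary_healthy=True))    # primary, cached for 60 s
print(client.resolve(now=30, primary_healthy=False))  # still the cached primary
print(client.resolve(now=60, primary_healthy=False))  # cache expired, secondary
```

This is why a short TTL is usually chosen for records that participate in failover: the TTL bounds how long clients can keep sending traffic to a failed endpoint.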

Distributed Design Patterns:

Design Patterns:

What Distributed Design is: Distributed design involves spreading application components across multiple locations to improve availability, performance, and fault tolerance
Microservices Architecture: Breaking applications into small, independent services that can be deployed and scaled independently. Microservices improve fault tolerance by isolating failures to individual services
Event-Driven Architecture: Using events to communicate between services. Event-driven architectures are loosely coupled and can handle failures gracefully
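
The loose coupling of an event-driven design can be illustrated with a queue sitting between producer and consumer: a failed event is retried later instead of failing the producer. This is a minimal in-memory sketch; `drain`, the retry limit, and the dead-letter list are assumptions for illustration, and a real system would use a durable queue service.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process queued (event, attempts) pairs.

    A failing event is re-queued until it has been tried max_attempts
    times, then set aside in a dead-letter list for inspection.
    """
    dead_letters = []
    while queue:
        event, attempts = queue.popleft()
        try:
            handler(event)
        except Exception:
            if attempts + 1 < max_attempts:
                queue.append((event, attempts + 1))
            else:
                dead_letters.append(event)
    return dead_letters


processed = []
failed_once = set()


def handler(event):
    """Hypothetical consumer: 'b' fails once, 'poison' always fails."""
    if event == "poison":
        raise RuntimeError("cannot process")
    if event == "b" and event not in failed_once:
        failed_once.add(event)
        raise RuntimeError("transient failure")
    processed.append(event)


events = deque((e, 0) for e in ["a", "b", "poison", "c"])
dead = drain(events, handler)
print(processed, dead)
```

The transient failure on "b" delays that event but does not block "c", and the event that can never succeed ends up isolated rather than retried forever.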

Fault Tolerance Patterns:

Circuit Breaker Pattern: A design pattern that prevents cascading failures by temporarily stopping requests to a failing service. Circuit breakers can automatically recover when the service becomes healthy again
Bulkhead Pattern: Isolating resources so that a failure in one part of the system doesn't affect other parts. This can include separate database connections, thread pools, and service instances
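
The circuit breaker described above can be sketched as a small state machine with three states: closed (requests pass), open (requests are rejected immediately), and half-open (one trial request is allowed after a timeout). This is one common way to structure the pattern, not a reference implementation; the thresholds and the injectable `clock` parameter are illustrative choices.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests need not sleep
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of sending load to a failing service.
                raise CircuitOpenError("circuit is open; request not sent")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.state == "half_open" or self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self.state = "closed"
        return result
```

A caller would wrap each request to the downstream service, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function, and treat `CircuitOpenError` as a signal to use a fallback or return a degraded response.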

AWS Managed Services for High Availability:

Service Categories:

What AWS Managed Services are: AWS managed services handle the undifferentiated heavy lifting of infrastructure management, including high availability features. These services automatically provide redundancy and failover capabilities
Amazon Comprehend: A natural language processing service that can analyze text for sentiment, entities, and key phrases. Comprehend is highly available and can be used across multiple regions
Amazon Polly: A text-to-speech service that converts text into natural-sounding speech. Polly provides high availability and can be used for applications that need speech synthesis

Database and Cache Services:

RDS Multi-AZ: Amazon RDS Multi-AZ deployments provide high availability by maintaining a standby replica in a different Availability Zone. The standby replica is automatically promoted to primary if the primary fails
ElastiCache Multi-AZ: ElastiCache can be configured with Multi-AZ for high availability. If the primary node fails, ElastiCache automatically fails over to the replica
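
As a rough illustration, Multi-AZ for RDS is requested with a single flag when the instance is created. The sketch below assumes the third-party `boto3` SDK is installed and AWS credentials are configured; the identifier, engine, instance class, storage size, username, and region are placeholder values, not recommendations.

```python
def multi_az_db_params(identifier: str) -> dict:
    """Build create_db_instance arguments for a Multi-AZ instance.

    All values other than MultiAZ are illustrative placeholders.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",
        "DBInstanceClass": "db.t3.medium",
        "AllocatedStorage": 20,            # GiB
        "MasterUsername": "appadmin",
        "ManageMasterUserPassword": True,  # keep the password out of code
        "MultiAZ": True,                   # request a standby in another AZ
    }


def create_multi_az_db(identifier: str, region: str = "us-east-1") -> dict:
    """Create the instance. Calls AWS and may incur cost."""
    import boto3  # imported lazily so the helper above works without the SDK

    rds = boto3.client("rds", region_name=region)
    return rds.create_db_instance(**multi_az_db_params(identifier))


print(multi_az_db_params("orders-db")["MultiAZ"])
```

Only `multi_az_db_params` runs here; calling `create_multi_az_db` would create real infrastructure, so it is left for the reader to invoke deliberately.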

Basic Networking Concepts:

Networking Fundamentals:

What Basic Networking is: Understanding fundamental networking concepts is essential for designing highly available architectures. This includes routing, load balancing, and network segmentation
Route Tables: Control how traffic is routed within your VPC and to external networks. Route tables do not perform health checks themselves; avoiding a failed component at this layer means updating the affected routes, manually or through automation, to point to a healthy target
VPC Design: Virtual Private Cloud design affects high availability. A VPC spans all AZs in its region, but each subnet lives in exactly one AZ, so create subnets in at least two AZs and spread resources across them
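
Subnet planning across AZs can be sketched with Python's standard `ipaddress` module. The example carves one public and one private subnet per AZ out of a VPC address range; the CIDR block, the /24 subnet size, and the AZ names are illustrative assumptions.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Assign one public and one private subnet to each AZ."""
    network = ipaddress.ip_network(vpc_cidr)
    available = network.subnets(new_prefix=new_prefix)  # generator of subnets
    plan = {}
    for az in azs:
        plan[az] = {
            "public": str(next(available)),
            "private": str(next(available)),
        }
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for az, subnets in plan.items():
    print(az, subnets)
```

With this layout, losing one AZ still leaves a complete public and private subnet pair in the other.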

Network Services:

Load Balancing: Load balancers distribute traffic across multiple targets and can perform health checks to route traffic only to healthy targets
Network Segmentation: Dividing networks into smaller segments to isolate failures and improve security. Network segmentation can prevent failures from affecting the entire system
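
The load-balancing behavior described above, distributing requests while skipping unhealthy targets, can be sketched as a round-robin picker. This is a simplified model rather than how any AWS load balancer is implemented; the class and target names are hypothetical, and health state is set by the caller.

```python
class RoundRobinBalancer:
    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._next = 0

    def mark(self, target, healthy):
        """Record the latest health check result for a target."""
        if healthy:
            self.healthy.add(target)
        else:
            self.healthy.discard(target)

    def pick(self):
        """Return the next healthy target in rotation."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next % len(self.targets)]
            self._next += 1
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")


lb = RoundRobinBalancer(["a", "b", "c"])
print([lb.pick() for _ in range(3)])  # rotation over all three targets
lb.mark("b", healthy=False)           # a health check on "b" has failed
print([lb.pick() for _ in range(3)])  # "b" is skipped until marked healthy
```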

____

Analogy: A Global Airline Network

Imagine you're managing a worldwide airline network that needs to serve passengers reliably across multiple continents.

AWS Regions: Your major hub airports in different continents that operate independently. Each hub serves its local region and connects to other hubs for global coverage.

Availability Zones: Your multiple runways and terminals at each hub airport. If one runway is closed, you can still operate using the others without disruption.

Edge Locations: Your smaller regional airports that serve local communities. These provide faster access for passengers who don't need to travel to major hubs.

Failover Strategies: Your backup planes and crews ready to take over if the primary ones fail. The system automatically switches to backups when needed.

Distributed Design: Your multiple routes between cities so if one route is blocked, others are available. This ensures passengers can always reach their destinations.

Health Checks: Your routine checks on aircraft and crews before each flight. These flag a problem quickly so a backup can be swapped in before passengers are affected.

____

Common Applications:

Global Web Applications: Applications that need to serve users worldwide with low latency and high availability
E-commerce Platforms: Online stores that need to remain available 24/7 to avoid losing sales
Financial Services: Banking and payment applications that require high availability and fault tolerance
Healthcare Systems: Medical applications that need to remain available for patient care

____

Quick Note: The "High Availability Foundation"

Design applications to span multiple Availability Zones for fault tolerance
Use AWS managed services that provide built-in high availability features
Implement automatic failover mechanisms to handle component failures
Monitor system health and set up alerts for potential issues
Test failover scenarios regularly to ensure they work as expected
