Highly Available & Fault Tolerant VPCs
Elastic Load Balancer Essentials
- Load balancing is a method used to distribute incoming traffic to a pool of servers. 
- An Elastic Load Balancer (ELB) is an EC2 service that automates the process of distributing incoming traffic (evenly) to all the instances that are associated with the ELB.
- Cross-zone balancing: balances traffic to instances across multiple Availability Zones
  - Highly available and fault-tolerant!
 
- An ELB can be paired with Auto Scaling to enhance high availability and fault tolerance, AND allow for automated scalability and elasticity.
- An ELB has its own DNS record set, which allows it to be reached directly from the open Internet.
Important Facts
- An ELB can be public-facing OR internal (balancing on private subnets).
- An ELB automatically stops sending traffic to unhealthy instances.
- An ELB or ALB can reduce the compute load on EC2 instances by terminating SSL: the certificate is applied to the load balancer instead of each server, and the decrypted requests can be forwarded to the instances over regular HTTP.
- A public ELB can be attached to private instances on private subnets, allowing certain traffic to be exposed to the Internet!
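The even distribution and health-check behavior described above can be illustrated with a toy round-robin balancer. This is a minimal sketch in plain Python, not how ELB is implemented; the class name `RoundRobinBalancer` and the instance IDs are hypothetical.

```python
class RoundRobinBalancer:
    """Toy balancer: rotates through targets and skips unhealthy ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._index = 0

    def mark_unhealthy(self, target):
        # A failed health check removes the target from rotation.
        self.healthy.discard(target)

    def mark_healthy(self, target):
        # A passing health check puts a known target back into rotation.
        if target in self.targets:
            self.healthy.add(target)

    def next_target(self):
        # Look at each target at most once per call.
        for _ in range(len(self.targets)):
            target = self.targets[self._index]
            self._index = (self._index + 1) % len(self.targets)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets available")


balancer = RoundRobinBalancer(["i-a", "i-b", "i-c"])
first_pass = [balancer.next_target() for _ in range(3)]

balancer.mark_unhealthy("i-b")
second_pass = [balancer.next_target() for _ in range(4)]
```

Once `i-b` is marked unhealthy, requests alternate between the two remaining instances until it passes a health check again.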
ELB Types
Classic ELB
- Designed for simple balancing of traffic to multiple EC2 instances. 
- Doesn't inspect the request. There are no granular routing "rules": traffic is routed evenly to all instances, and requests cannot be routed differently based on the content the user asks for.
- Supported protocols: TCP, SSL, HTTP, HTTPS
Application ELB
- Designed for balancing of traffic to one or more instance target groups using content-based "rules" (content is inspected).
- Content-based rules (on the listener) can be configured using:
  - Host-based rules: Route traffic based on the host field of the HTTP header
  - Path-based rules: Route traffic based on the URL path of the HTTP request
- Allows you to structure your application as smaller services, and even monitor/auto-scale based on traffic to specific "target groups"
- Can balance to multiple ports
- An ALB supports ECS/EKS (port-level forwarding, so containers can run on specific ports with different services), HTTPS, HTTP/2, WebSockets, Access Logs, Sticky Sessions, and AWS Web Application Firewall (WAF)
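Host-based and path-based rules can be pictured as an ordered list of conditions where the first match picks the target group. The sketch below is a simplified model in plain Python, not the ALB API; `Rule`, `route`, the host names, and the target group names are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    target_group: str
    host: Optional[str] = None  # host pattern, e.g. "api.example.com"
    path: Optional[str] = None  # path pattern, e.g. "/images/*"

    def matches(self, request_host: str, request_path: str) -> bool:
        # A condition that is not set matches every request.
        host_ok = self.host is None or fnmatchcase(request_host.lower(), self.host.lower())
        path_ok = self.path is None or fnmatchcase(request_path, self.path)
        return host_ok and path_ok


def route(rules, default_group, host, path):
    """Return the target group of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(host, path):
            return rule.target_group
    return default_group


RULES = [
    Rule("api-servers", host="api.example.com"),
    Rule("image-servers", path="/images/*"),
]

api_group = route(RULES, "web-servers", "api.example.com", "/v1/users")
image_group = route(RULES, "web-servers", "www.example.com", "/images/cat.png")
default_group = route(RULES, "web-servers", "www.example.com", "/index.html")
```

Because each rule points at its own target group, each group can be monitored and scaled independently, which is what makes the "smaller services" structure practical.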
Network LB
- Extreme performance! Handles large spikes in traffic without first having to scale (unlike the other two types)
- Layer 4 (TCP) Load Balancing
- One static IP address per AZ (this can be an Elastic IP)
- Targets can be registered by IP address (these can be data center servers that are not part of AWS!)
- SSL/TLS offloading was originally not available on the Network LB; it is now supported through TLS listeners
Stateless
A recommended strategy to meet AWS CSA objectives when utilizing ELBs:
- Store state information off-instance 
- Store it somewhere that is central and scalable (NoSQL like DynamoDB/Redis) 
- Use a shared filesystem across your instances. 
Traffic To/From Private Subnets
- A public ELB can be attached to private instances on private subnets, allowing certain traffic to be exposed to the Internet!
Bastion Hosts
- EC2 instances that live in public subnets and are used as "gateways" for traffic that is destined for instances in private subnets.
  - A "portal" to access private subnet EC2 instances!
- Considered a critical strong point of the network: all traffic must pass through it first.
- A bastion should have increased/tight security (usually with 3rd party security and monitoring software installed.) 
- Can be used as an access point to "ssh" into an internal network (to access private resources) without a VPN. 
NAT Gateway
- Provide EC2 instances that live in a private subnet with a route to the internet (so they can download packages and updates) 
- On AWS, they are very effective and, unlike a NAT instance, don't bottleneck easily (gigabit-class throughput)
- Will prevent any hosts located outside of the VPC from initiating a connection with instances that are associated with it. 
- Will only allow incoming traffic through if a request for it originated from an instance in the private subnet. 
- Mitigates one issue: private instances cannot download software and updates 
- A NAT Gateway MUST:
  - Be in a public subnet
  - Be part of the private subnets' route table (as the target for Internet-bound traffic)
- NAT Instance:
  - Identical to a NAT Gateway in purpose, but instead of a managed service it is an EC2 instance configured to do the same job.
  - More of a legacy feature, but the exam might still ask about it.
- A NAT Gateway lives in a single AZ, so use one in each AZ for reliability!
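The rule that incoming traffic is allowed only as a reply to a request from a private instance can be illustrated with a toy connection table. This is a minimal sketch, not how AWS implements the NAT Gateway; the class name `NatSketch` and the addresses are hypothetical, and a real NAT also rewrites addresses and ports and expires idle flows.

```python
class NatSketch:
    """Toy model of stateful NAT: replies are allowed, unsolicited traffic is not."""

    def __init__(self):
        # Each flow is (private_ip, remote_ip, remote_port).
        self._flows = set()

    def outbound(self, private_ip, remote_ip, remote_port):
        # A private instance opens a connection; remember the flow.
        self._flows.add((private_ip, remote_ip, remote_port))

    def inbound_allowed(self, remote_ip, remote_port, private_ip):
        # Inbound traffic passes only if it matches a flow started from inside.
        return (private_ip, remote_ip, remote_port) in self._flows


nat = NatSketch()
nat.outbound("10.0.1.5", "93.184.216.34", 443)  # e.g. downloading a package

reply_ok = nat.inbound_allowed("93.184.216.34", 443, "10.0.1.5")
unsolicited_ok = nat.inbound_allowed("203.0.113.9", 22, "10.0.1.5")
```

Here `reply_ok` is true because the private instance started the exchange, while the unsolicited connection attempt is rejected.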
VPC Endpoints
VPC Endpoints allow instances with private addresses to communicate with AWS services.
- Gateway Endpoints:
  - S3
  - DynamoDB
- Interface Endpoints:
  - CloudWatch Logs, CodeBuild, KMS, Kinesis, Service Catalog

Entries for these endpoints go into the route tables for the private subnets. AWS can actually do this for you when creating Gateway Endpoints. An endpoint policy can also be used for more restrictions (for example, allowing access only to a certain folder in a bucket).
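As an illustration of the "certain folder in a bucket" restriction, an endpoint policy for an S3 Gateway Endpoint might look like the document built below. This is a sketch under stated assumptions: the bucket name `example-bucket` and the prefix `reports/` are hypothetical, and only read access (`s3:GetObject`) is allowed.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name
PREFIX = "reports/"        # hypothetical "folder" (key prefix)

# Standard IAM policy document shape; attached to the endpoint, it limits
# what can be reached through that endpoint.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/{PREFIX}*"],
        }
    ],
}

policy_json = json.dumps(endpoint_policy, indent=2)
print(policy_json)
```

Requests through the endpoint for objects outside the `reports/` prefix would not match the `Allow` statement and would therefore be denied.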
AutoScaling
- A service (and method) provided by AWS that automates the process of increasing or decreasing the number of provisioned on-demand instances available for your application.
- Will increase or decrease the number of instances based on chosen CloudWatch metrics
- Custom metrics can also be used via the CloudWatch API (number of users per instance, maybe?)
- For example: if your app demand increases, auto scaling can automatically scale out (add additional instances) to meet the demand, and terminate those instances when the demand decreases.
  - Increases availability for your architecture!
 
- Components:
  - Launch Configuration: EC2 "template" used when the auto scaling group needs to provision an additional instance (i.e. AMI, instance type, user-data, storage, security groups, etc.)
  - Auto Scaling Group: All the rules and settings that govern if/when an EC2 instance is provisioned/terminated
    - MIN/MAX allowed instances
    - VPC & AZs to launch into
    - If provisioned instances should get traffic from an ELB
    - Scaling policies (CloudWatch metric thresholds that trigger scaling)
    - SNS notifications (inform you of scaling)
  - CloudWatch Alarms:
    - Metrics are selected that indicate load on instances (CPU, latency, etc.)
    - Alarms are triggered when metrics exceed a threshold
    - Alarms trigger autoscaling policies
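How alarms, scaling policies, and the MIN/MAX limits interact can be sketched as a small decision function. This is a simplified model in plain Python, not the Auto Scaling API; `ScalingGroup`, `apply_policy`, and the 70%/30% CPU thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    min_size: int
    max_size: int
    desired: int


def apply_policy(group, avg_cpu, scale_out_above=70.0, scale_in_below=30.0, step=1):
    """Adjust the desired instance count from one metric reading."""
    if avg_cpu > scale_out_above:      # high-load alarm: add instances
        target = group.desired + step
    elif avg_cpu < scale_in_below:     # low-load alarm: remove instances
        target = group.desired - step
    else:                              # no alarm: leave the group alone
        target = group.desired
    # The group never goes below MIN or above MAX.
    group.desired = max(group.min_size, min(group.max_size, target))
    return group.desired


group = ScalingGroup(min_size=2, max_size=4, desired=2)
history = [apply_policy(group, cpu) for cpu in [85, 90, 95, 50, 10, 10, 10]]
print(history)  # [3, 4, 4, 4, 3, 2, 2]
```

The third reading (95% CPU) would add another instance, but the group is already at its maximum of 4; likewise the last reading cannot take it below the minimum of 2.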
 
 
Stateless Applications
Stateful applications do not scale well: the user logs in and "state" information is stored on the server. It is better to go stateless! A good example is the Twitter API: because it is a REST interface, everything needed to serve the request is passed in the GET request itself, including authorization headers and tokens.

In a stateful architecture, each instance stores its users' information locally. This is OK if the user's connection persists to that instance for all time, but that's likely not the case for most modern applications. It can be partly mitigated by enabling sticky sessions on the Elastic Load Balancer, which sets a cookie on the user's client so that their requests go to the same server every time, but this still hinders scaling since users are now stuck on old instances.

A stateless architecture stores user information elsewhere, in a persistent and scalable store like DynamoDB or Redis, or on a shared filesystem like EFS. If a user connects to a new instance, the correct authorizations are passed along with the request and any instance can always retrieve the needed information.
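The difference can be sketched with two application instances that keep no user data of their own and read everything from a shared store. This is a minimal in-memory model: `SharedSessionStore` merely stands in for an external service such as DynamoDB or Redis, and all class and method names here are hypothetical.

```python
import copy


class SharedSessionStore:
    """In-memory stand-in for an external, central store (e.g. DynamoDB or Redis)."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, state):
        self._data[session_id] = copy.deepcopy(state)

    def get(self, session_id):
        return copy.deepcopy(self._data.get(session_id, {}))


class AppInstance:
    """Stateless application server: holds no per-user state itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.get(session_id)
        state.setdefault("cart", []).append(item)
        self.store.put(session_id, state)

    def view_cart(self, session_id):
        return self.store.get(session_id).get("cart", [])


store = SharedSessionStore()
instance_a = AppInstance("instance-a", store)
instance_b = AppInstance("instance-b", store)

# The load balancer may send each request to a different instance.
instance_a.add_to_cart("session-1", "book")
instance_b.add_to_cart("session-1", "pen")
cart_seen_by_b = instance_b.view_cart("session-1")
print(cart_seen_by_b)  # ['book', 'pen']
```

Since neither instance owns the session, either one can be terminated or replaced by Auto Scaling without the user losing their cart.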
Terminology
High Availability
Architectures that are always available to users (to a significant statistical degree; you can't guarantee 100% availability). In other words, if one system or component fails (or several), the overall system will still provide functionality to users.
- This can be achieved by distributing your components and loads across multiple AZs. These are geographically separated, so if a tsunami wipes out one you can expect another to survive.
Fault Tolerance
Architectures that do not suffer degradation in performance when a component fails. This is achieved by planning ahead and distributing loads by design and not by necessity.
For example, if your architecture has two nodes that usually run at 60% capacity, then the failure of one node puts a 120% load on the other. This will cause issues: the server will either boot the excess 20% of queued processes or crash. It is better to have three nodes at 40% capacity, so that if one fails the other two can pick up the load at 60% capacity each without issue.
That being said, this obviously has additional costs involved! Maybe your system doesn't need to be fault tolerant; that depends on the use case.
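The node arithmetic in the example can be checked with a short helper. This is only a back-of-the-envelope sketch: the function name `load_after_failures` is hypothetical, and it assumes the total load stays constant and is spread evenly over the surviving nodes.

```python
def load_after_failures(nodes, load_per_node, failed=1):
    """Per-node load (in % of capacity) once `failed` nodes are gone."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes to carry the load")
    total_load = nodes * load_per_node
    return total_load / survivors


two_node_result = load_after_failures(nodes=2, load_per_node=60)    # 120.0
three_node_result = load_after_failures(nodes=3, load_per_node=40)  # 60.0

print(two_node_result, three_node_result)
```

Anything above 100 means the survivors are overloaded, which is the case for the two-node design but not for the three-node one.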
Many AWS services are fault tolerant by design. For example, it is practically impossible for you to send enough DNS lookups to Route53 or table queries to DynamoDB to cause a noticeable failure or denial of service (DoS).
