AWS (Amazon Web Services)¶
Cloud platform that runs half the internet. 200+ services (you'll use maybe 10). 33+ regions globally. Industry standard for cloud infrastructure. Pricing is a mystery, bills are scary, but it works.
2026 Update
AWS continues to dominate cloud infrastructure with over 200 services. Focus on core services (EC2, S3, Lambda, RDS) and learn IAM inside-out. Cost optimization is more critical than ever.
Quick Hits¶
# EC2 - Virtual machines (you'll use this)
aws ec2 run-instances \
--image-id ami-xxx \
--instance-type t3.micro \
--key-name my-key # (1)!
# S3 - Object storage (everyone uses this)
aws s3 cp file.txt s3://bucket-name/
aws s3 sync ./local s3://bucket/path --delete # (2)!
# Lambda - Serverless functions (scales like crazy)
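# AWS CLI v2 expects base64 payloads unless --cli-binary-format is set
aws lambda invoke \
--function-name my-function \
--cli-binary-format raw-in-base64-out \
--payload '{"key":"value"}' \
output.txt # (3)!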
# RDS - Managed databases (don't run your own DB)
aws rds describe-db-instances
aws rds create-db-snapshot --db-instance-identifier prod-db --db-snapshot-identifier prod-db-pre-change # (4)!
# IAM - Identity management (painful but critical)
aws iam create-user --user-name dev-user
aws sts get-caller-identity # "Who the fuck am I?"
aws iam list-attached-user-policies --user-name dev-user # (5)!
# CloudWatch - Logs and monitoring (set billing alarms!)
aws logs tail /aws/lambda/my-function --follow
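# Billing metrics live in us-east-1 only; enable "Receive Billing Alerts" first
aws cloudwatch put-metric-alarm \
--alarm-name BillingAlarm \
--alarm-description 'Alert when bill exceeds $100' \
--namespace AWS/Billing \
--metric-name EstimatedCharges \
--dimensions Name=Currency,Value=USD \
--statistic Maximum \
--period 21600 \
--evaluation-periods 1 \
--comparison-operator GreaterThanThreshold \
--threshold 100 \
--region us-east-1 # (6)!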
# ECS/EKS - Container orchestration
aws ecs list-clusters
aws eks list-clusters
aws eks update-kubeconfig --name my-cluster # (7)!
- SSH key for instance access - create with aws ec2 create-key-pair
- --delete removes files from S3 that don't exist locally - dangerous!
- Lambda invoke is synchronous - use --invocation-type Event for async
- Always snapshot before major changes - saved my ass multiple times
- Check permissions when debugging "Access Denied" errors
- Set this up DAY ONE - learn from others' $10k+ billing surprises
- Updates your kubeconfig for kubectl access
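The sync vs async note on Lambda invoke carries over to boto3. A minimal sketch, assuming you pass in a `boto3.client("lambda")`; the `invoke_async` helper and the `FakeLambdaClient` stand-in are hypothetical names used here so the example runs without AWS credentials:

```python
import json


def invoke_async(lambda_client, function_name, payload):
    """Fire-and-forget invoke; returns True if Lambda accepted the event."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # async; the default is "RequestResponse" (sync)
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda answers 202 (Accepted) when an async invocation is queued.
    return response["StatusCode"] == 202


class FakeLambdaClient:
    """Hypothetical stand-in for boto3.client("lambda") so this runs offline."""

    def __init__(self):
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)
        is_async = kwargs["InvocationType"] == "Event"
        return {"StatusCode": 202 if is_async else 200}


fake = FakeLambdaClient()
accepted = invoke_async(fake, "my-function", {"key": "value"})
print(accepted)
```

With real credentials you would call `invoke_async(boto3.client("lambda"), "my-function", {...})` instead. An async invoke only tells you the event was queued, not that the function succeeded.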
Real talk:
- Start with EC2, S3, RDS - that's 80% of use cases
- IAM is hell, but you MUST learn it - security nightmare otherwise
- Enable MFA on root account RIGHT NOW (seriously, stop reading and do it)
- us-east-1 is cheapest but goes down more often (Murphy's law applies)
- Use --profile for multiple accounts (you'll have dev/staging/prod)
import json

import boto3
from boto3.exceptions import S3UploadFailedError
from botocore.exceptions import ClientError

# S3 upload with proper error handling
def upload_to_s3(file_path, bucket, key):
    """Upload file to S3 with private ACL."""
    s3 = boto3.client('s3')
    try:
        s3.upload_file(
            file_path,
            bucket,
            key,
            ExtraArgs={'ACL': 'private'}  # Don't leak shit # (1)!
        )
        return True
    # upload_file wraps most failures in S3UploadFailedError, not ClientError
    except (ClientError, S3UploadFailedError) as e:
        print(f"Upload failed: {e}")
        return False

# Placeholder for your own business logic (not an AWS API)
def process_data(body):
    return {'received': body}

# Lambda handler pattern (use this)
def lambda_handler(event, context):
    """Standard Lambda handler with proper error handling."""
    try:
        # Parse event (API Gateway, SQS, etc.); body may be missing or None
        body = json.loads(event.get('body') or '{}')
        # Do work
        result = process_data(body)
        # Return proper response
        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'  # Adjust for prod # (2)!
            },
            'body': json.dumps(result)
        }
    except Exception as e:
        print(f"Error: {e}")  # Goes to CloudWatch # (3)!
        return {'statusCode': 500, 'body': 'Internal error'}

# DynamoDB pattern (NoSQL done right)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

# Query (efficient) - use this
response = table.query(
    KeyConditionExpression='user_id = :uid',
    ExpressionAttributeValues={':uid': '12345'}
)  # (4)!

# Scan (expensive) - avoid in prod
response = table.scan(Limit=100)  # Will cost $$$$ at scale # (5)!
- Always set ACL to private unless you specifically need public access
- Lock down CORS in production - * is for development only
- Lambda logs go to CloudWatch automatically - use structured logging for production
- Queries use indexes - fast and cheap
- Scans read entire table - slow and expensive, only for admin tasks
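One thing the query pattern above glosses over: a single DynamoDB query call returns at most 1 MB of data, so bigger result sets come back in pages. A rough sketch of following `LastEvaluatedKey`, assuming a boto3 `Table` resource with a `user_id` partition key as in the example; `query_all` and `FakeTable` are hypothetical names, and the fake exists only so the snippet runs offline:

```python
def query_all(table, user_id):
    """Collect every item for one partition key, following pagination."""
    kwargs = {
        "KeyConditionExpression": "user_id = :uid",
        "ExpressionAttributeValues": {":uid": user_id},
    }
    items = []
    while True:
        page = table.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no more pages
        # Resume the next call where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for dynamodb.Table(...) that serves canned pages."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = 0

    def query(self, **kwargs):
        page = self._pages[self.calls]
        self.calls += 1
        return page


fake_table = FakeTable([
    {"Items": [{"order": 1}], "LastEvaluatedKey": {"user_id": "12345", "order": 1}},
    {"Items": [{"order": 2}]},
])
all_items = query_all(fake_table, "12345")
print(all_items)
```

Swap `fake_table` for `boto3.resource("dynamodb").Table("Users")` to run it for real. The same loop works for scans, which is one more reason they get expensive.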
# CloudFormation/SAM pattern (infrastructure as code)
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12 # Use latest runtime # (1)!
      Handler: app.lambda_handler
      Timeout: 30 # Seconds - adjust based on workload
      MemorySize: 512 # MB - more memory = faster CPU # (2)!
      Environment:
        Variables:
          TABLE_NAME: !Ref MyTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref MyTable # Least privilege # (3)!
  MyTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST # No capacity planning # (4)!
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      StreamSpecification: # For DynamoDB Streams
        StreamViewType: NEW_AND_OLD_IMAGES # (5)!
- Lambda has supported python3.12 since late 2023 - check the runtimes list and use the newest one your dependencies support
- Lambda charges by GB-seconds - 512MB is sweet spot for most workloads
- Only grant permissions this function actually needs
- Pay per request - no provisioned capacity, scales automatically
- Streams enable event-driven architectures - trigger Lambda on changes
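To make the Streams point concrete, here is a sketch of what a Lambda attached to that table's stream might do with an incoming batch. It assumes the `id` string key from the template above; `summarize_stream_event`, `stream_handler`, and the sample event are illustrative, not part of the original template:

```python
def summarize_stream_event(event):
    """Count stream records by type and collect the ids of changed items."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0, "ids": []}
    for record in event.get("Records", []):
        name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        summary[name] = summary.get(name, 0) + 1
        # Keys arrive in DynamoDB JSON, e.g. {"id": {"S": "a1"}}
        summary["ids"].append(record["dynamodb"]["Keys"]["id"]["S"])
    return summary


def stream_handler(event, context):
    """Hypothetical handler wired to the table's stream."""
    summary = summarize_stream_event(event)
    print(summary)  # ends up in CloudWatch Logs
    return summary


# Hand-written sample shaped like a DynamoDB Streams batch.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"id": {"S": "a1"}},
                "NewImage": {"id": {"S": "a1"}, "name": {"S": "Ada"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"id": {"S": "b2"}},
                "OldImage": {"id": {"S": "b2"}, "name": {"S": "Bob"}},
            },
        },
    ]
}
summary = stream_handler(sample_event, None)
```

With `NEW_AND_OLD_IMAGES`, MODIFY records carry both `NewImage` and `OldImage`, which is what makes diffing changes possible.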
Why this works:
- Boto3 is official AWS SDK - well maintained, good docs
- Error handling prevents silent failures
- CloudFormation/SAM enables version control for infrastructure
- DynamoDB queries scale better than scans (use indexes!)
- Lambda handler pattern is battle-tested across millions of functions
Cost Optimization (your CFO will thank you)
- Reserved Instances: up to 72% savings for predictable workloads (1- or 3-year commitment)
- Spot Instances: up to 90% savings for batch jobs (can be reclaimed with a 2-minute warning)
- S3 Intelligent-Tiering: Automatic cost optimization based on access patterns
- CloudWatch billing alarms: Set up IMMEDIATELY - prevent $10k+ surprises
- Delete unused resources: Snapshots, AMIs, elastic IPs add up fast
- Use AWS Cost Explorer: Analyze spending patterns, identify waste
- Tag everything: Enable cost allocation by project/team/environment
Security (don't get hacked)
- Never hardcode credentials - use IAM roles, instance profiles, or Systems Manager
- Enable CloudTrail + GuardDuty - detect breaches before bankruptcy
- Systems Manager Parameter Store - standard parameters are free (up to 10,000 per account per region); use SecureString for KMS-encrypted values
- VPC Flow Logs - network debugging and security analysis
- Least privilege IAM policies - start restrictive, open up as needed
- MFA on root account - this is non-negotiable, do it now
- AWS Security Hub - centralized security findings (2026 standard)
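"Start restrictive" is easier to follow with an example. The sketch below builds an IAM policy document that only allows reading one prefix of one bucket. The bucket name, prefix, and the `read_only_prefix_policy` helper are made up for illustration; treat it as a starting point, not a reviewed policy:

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build a policy allowing list + read on a single bucket prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # ListBucket applies to the bucket, so scope it by prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = read_only_prefix_policy("my-app-bucket", "reports/")
print(json.dumps(policy, indent=2))
```

No wildcards in `Action`, no `*` resource. Widen it only when a real Access Denied tells you what is missing.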
Performance
- Keep traffic in the same region/AZ - cross-region traffic costs money and adds latency (often 100ms+)
- CloudFront CDN - S3 alone is slow for users, CDN is sub-50ms globally
- RDS read replicas - scale read-heavy workloads horizontally
- ElastiCache - Redis/Memcached for sub-ms caching (game changer)
- Lambda provisioned concurrency - eliminate cold starts for critical paths
- Use AWS PrivateLink - avoid internet gateway for service-to-service
Gotchas (learn from others' pain)
- Data transfer OUT - in is free, out costs $$$ (especially cross-region)
- NAT Gateway costs - can exceed EC2 instance costs (use VPC endpoints instead)
- CloudWatch Logs - verbose logging = expensive ingestion (about $0.50/GB ingested in us-east-1, plus storage); set retention policies
- DynamoDB scans - will bankrupt you at scale, always use queries with indexes
- Lambda cold starts - 1-2 seconds for large functions (use provisioned concurrency)
- EBS snapshots - incremental but deletions are confusing (read the docs!)
- S3 bucket policies - one wrong character = data leak (test with the IAM Policy Simulator)
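The NAT Gateway gotcha is easiest to believe once you do the arithmetic. A back-of-the-envelope sketch follows; the prices are assumptions based on us-east-1 list prices ($0.045/hour and $0.045/GB for NAT, $0.0104/hour for a t3.micro), so check the current pricing pages before relying on them:

```python
HOURS_PER_MONTH = 730  # common billing approximation


def nat_gateway_monthly(gb_processed, hourly=0.045, per_gb=0.045):
    """Hourly charge plus per-GB data processing (assumed prices)."""
    return HOURS_PER_MONTH * hourly + gb_processed * per_gb


def instance_monthly(hourly):
    """On-demand instance cost for a full month."""
    return HOURS_PER_MONTH * hourly


idle_nat = nat_gateway_monthly(0)
busy_nat = nat_gateway_monthly(1000)   # 1 TB through the gateway
t3_micro = instance_monthly(0.0104)    # assumed on-demand price

print(f"idle NAT:  ${idle_nat:.2f}/month")
print(f"busy NAT:  ${busy_nat:.2f}/month")
print(f"t3.micro:  ${t3_micro:.2f}/month")
```

Under these assumed prices, even an idle NAT Gateway costs several times what the small instance behind it does, before any data transfer charges.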
Monitoring & Observability
- CloudWatch - included but basic, good enough for small/medium workloads
- X-Ray - distributed tracing for microservices debugging (2026 essential)
- Datadog/New Relic - consider for serious production monitoring
- Set up alarms for: billing, CPU >80%, disk >90%, error rates >1%
- Use CloudWatch Logs Insights - query logs with a SQL-like query syntax
- CloudWatch RUM - real user monitoring for frontend performance (2026)
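For the "CPU >80%" alarm in the list above, a sketch of the arguments you might hand to `put_metric_alarm`. The helper name, the alarm naming scheme, and the 2 x 5-minute evaluation window are assumptions; the parameter names themselves are the CloudWatch API's:

```python
def cpu_alarm_params(instance_id, threshold=80.0, sns_topic_arn=None):
    """Build put_metric_alarm kwargs for a high-CPU alarm on one instance."""
    params = {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint
        "EvaluationPeriods": 2,   # two bad datapoints in a row
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        # Without an action the alarm changes state but notifies nobody.
        params["AlarmActions"] = [sns_topic_arn]
    return params


params = cpu_alarm_params("i-0123456789abcdef0")
print(params["AlarmName"])
```

To create the alarm for real: `boto3.client("cloudwatch").put_metric_alarm(**params)`. Building the dict separately keeps it easy to test and reuse across instances.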
When NOT to use AWS
- Small personal projects - Vercel/Netlify/Railway way easier (and cheaper)
- Vendor lock-in concerns - consider Kubernetes on any cloud
- Tiny budget - free tier ends after 12 months, bills start
- No cloud experience - steep learning curve, invest time in fundamentals first
- Compliance hell - some industries require on-prem (banking, healthcare)
Learning Paths¶
Free Resources¶
- AWS Skill Builder - Official training, tons of free courses (start here)
- AWS Free Tier - 12 months free for core services (stay within limits!)
- freeCodeCamp AWS Course - 10+ hour deep dive, quality content
- AWS Workshops - Hands-on labs, various topics
- A Cloud Guru Free Tier - Quality video courses
- AWS Getting Started Guides - Official tutorials
- AWS re:Post - Official Q&A platform (replaced the old AWS Forums; launched in late 2021)
Interactive Labs¶
- AWS Sandbox Accounts - Official hands-on tutorials in real AWS
- Qwiklabs AWS - Temporary accounts for safe experimentation
- Instruqt AWS Labs - Browser-based scenarios
- LocalStack - Run AWS locally for development (Pro version worth it)
- AWS CloudShell - Browser-based shell with AWS CLI pre-installed
Certifications Worth It¶
Recommended Path
- Solutions Architect Associate - Most valuable, industry standard
- Developer Associate - If you code daily on AWS
- Skip others unless employer pays or senior role requires
- Cloud Practitioner - $100, easiest, good starting point if totally new
- Solutions Architect Associate - $150, most popular, worth it for resume (this one matters)
- Developer Associate - $150, worth it if you code on AWS daily
- SysOps Administrator Associate - $150, operations-focused
- Skip unless senior/employer pays: Professional certs ($300), Specialty certs ($300) - overkill for most
Reality check:
- Solutions Architect Associate is the sweet spot (most job postings ask for this)
- Study 2-3 months with hands-on practice, exams are scenario-based
- Use Tutorials Dojo practice exams ($15, best investment)
- Join r/AWSCertifications for study tips
Projects to Build¶
Beginner (learn the basics)
- Static website - S3 + CloudFront + Route53 (learn storage + CDN)
- Serverless URL shortener - Lambda + DynamoDB + API Gateway
- EC2 web server - Deploy LAMP/NGINX stack manually
Intermediate (portfolio-worthy)
- REST API - Lambda + API Gateway + DynamoDB + Cognito auth
- File processing pipeline - S3 triggers Lambda, stores results in RDS
- CI/CD pipeline - CodePipeline + CodeBuild + ECR + ECS
- Serverless blog - Amplify + Lambda + DynamoDB + S3
Advanced (job-interview flex)
- Multi-region application - Route 53 failover + RDS cross-region replicas
- Event-driven microservices - SQS/SNS/EventBridge architecture
- Cost optimization dashboard - Lambda + Cost Explorer API + QuickSight
- Real-time analytics - Kinesis Data Streams + Lambda + Timestream
Architecture Patterns¶
Common AWS architecture patterns for real-world applications.
Architecture Diagram Resources
AWS Architecture Icons - Official icon set for creating AWS architecture diagrams (PPT, Draw.io, Visio formats). Essential for documentation and presentations.
Three-Tier Web Application¶
Classic pattern: presentation, application, data layers with high availability.
graph TB
subgraph "Internet"
Users[Users]
end
subgraph "AWS Cloud"
subgraph "Availability Zone 1"
ALB1[Application<br/>Load Balancer]
EC2_1[EC2 Instance<br/>App Server]
RDS_Primary[(RDS Primary<br/>PostgreSQL)]
end
subgraph "Availability Zone 2"
EC2_2[EC2 Instance<br/>App Server]
RDS_Standby[(RDS Standby<br/>Multi-AZ)]
end
S3[S3 Bucket<br/>Static Assets]
CloudFront[CloudFront CDN]
end
Users -->|HTTPS| CloudFront
CloudFront -->|Static| S3
CloudFront -->|Dynamic| ALB1
ALB1 --> EC2_1
ALB1 --> EC2_2
EC2_1 -->|Read/Write| RDS_Primary
EC2_2 -->|Read/Write| RDS_Primary
RDS_Primary -.->|Replicate| RDS_Standby
style Users fill:#e1f5ff
style CloudFront fill:#ff9900
style S3 fill:#569a31
style ALB1 fill:#ff9900
style EC2_1 fill:#ff9900
style EC2_2 fill:#ff9900
style RDS_Primary fill:#527fff
style RDS_Standby fill:#527fff
Components:
- CloudFront: Global CDN, caches static assets at edge locations
- S3: Object storage for images, CSS, JavaScript
- ALB: Distributes traffic across EC2 instances in multiple AZs
- EC2: Application servers running in Auto Scaling group
- RDS: Managed PostgreSQL with multi-AZ failover
Real talk: This pattern handles 10k-100k requests/day. Add auto-scaling for growth.
Serverless Microservices¶
Event-driven architecture with Lambda, API Gateway, and DynamoDB.
graph LR
Client[Mobile/Web<br/>Client] -->|HTTPS| APIGW[API Gateway<br/>REST API]
APIGW -->|Invoke| Auth[Lambda<br/>Auth Function]
APIGW -->|Invoke| Users[Lambda<br/>Users Service]
APIGW -->|Invoke| Orders[Lambda<br/>Orders Service]
Auth -->|Read/Write| Cognito[Cognito<br/>User Pool]
Users -->|Read/Write| UserDB[(DynamoDB<br/>Users Table)]
Orders -->|Read/Write| OrderDB[(DynamoDB<br/>Orders Table)]
Orders -->|Publish| SNS[SNS Topic<br/>Order Events]
SNS -->|Subscribe| Email[Lambda<br/>Email Service]
SNS -->|Subscribe| SQS[SQS Queue]
SQS -->|Process| Worker[Lambda<br/>Worker Function]
style Client fill:#e1f5ff
style APIGW fill:#ff9900
style Auth fill:#ff9900
style Users fill:#ff9900
style Orders fill:#ff9900
style Email fill:#ff9900
style Worker fill:#ff9900
style Cognito fill:#dd344c
style UserDB fill:#527fff
style OrderDB fill:#527fff
style SNS fill:#ff9900
style SQS fill:#ff9900
Components:
- API Gateway: RESTful API with authentication, rate limiting, caching
- Lambda: Stateless functions, auto-scale, pay-per-invocation
- DynamoDB: NoSQL database with single-digit millisecond latency
- SNS/SQS: Async messaging for decoupled microservices
- Cognito: User authentication and authorization
Real talk: Scales to millions of requests, costs pennies at low traffic. Cold starts are 100-500ms.
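"Costs pennies at low traffic" can be sanity-checked with the Lambda billing formula (requests plus GB-seconds). A rough sketch, assuming x86 us-east-1 list prices of $0.20 per million requests and $0.0000166667 per GB-second, and ignoring the free tier, API Gateway, and DynamoDB charges:

```python
PRICE_PER_MILLION_REQUESTS = 0.20   # assumed list price, USD
PRICE_PER_GB_SECOND = 0.0000166667  # assumed list price, USD


def lambda_monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimate monthly Lambda spend from traffic, duration, and memory."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    request_fee = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + request_fee


# 100k requests/month, 200 ms each, 512 MB of memory
low_traffic = lambda_monthly_cost(100_000, 200, 512)
# 100M requests/month with the same profile
high_traffic = lambda_monthly_cost(100_000_000, 200, 512)

print(f"low traffic:  ${low_traffic:.2f}/month")
print(f"high traffic: ${high_traffic:.2f}/month")
```

The same function shows where the pennies stop: cost grows linearly with requests, duration, and memory, so the high-traffic figure is exactly 1000x the low-traffic one.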
Data Pipeline Architecture¶
ETL pattern for processing large datasets with S3, Glue, and Athena.
graph TB
Sources[Data Sources<br/>Logs, APIs, Databases] -->|Stream| Kinesis[Kinesis Data<br/>Streams]
Sources -->|Batch| S3_Raw[S3 Raw Bucket<br/>Landing Zone]
Kinesis -->|Real-time| Firehose[Kinesis Data<br/>Firehose]
Firehose --> S3_Raw
S3_Raw -->|Trigger| Glue[AWS Glue<br/>ETL Jobs]
Glue -->|Transform| S3_Processed[S3 Processed<br/>Parquet Format]
S3_Processed -->|Catalog| GlueCatalog[Glue Data<br/>Catalog]
GlueCatalog -->|Query| Athena[Athena<br/>SQL Queries]
GlueCatalog -->|Visualize| QuickSight[QuickSight<br/>Dashboards]
S3_Processed -->|Train| SageMaker[SageMaker<br/>ML Models]
style Sources fill:#e1f5ff
style Kinesis fill:#ff9900
style Firehose fill:#ff9900
style S3_Raw fill:#569a31
style Glue fill:#ff9900
style S3_Processed fill:#569a31
style GlueCatalog fill:#ff9900
style Athena fill:#ff9900
style QuickSight fill:#ff9900
style SageMaker fill:#ff9900
Components:
- Kinesis: Real-time data streaming (alternative to Kafka)
- S3: Data lake storage (raw and processed data)
- Glue: Serverless ETL, converts JSON/CSV to optimized Parquet
- Athena: Query S3 data with SQL, pay per query ($5/TB scanned)
- QuickSight: BI dashboards, ML-powered insights
Real talk: Processes terabytes for cents. Use Parquet format (10x cheaper queries than JSON).
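Since Athena bills by data scanned, the Parquet advice is really a statement about bytes. A tiny worked example using the $5/TB figure from above and the 10x reduction claimed in the text; both are taken as given here, and your actual ratio depends on compression and which columns a query touches:

```python
PRICE_PER_TB_SCANNED = 5.00  # USD, figure quoted above


def athena_query_cost(tb_scanned):
    """Cost of one query given how much data it scans."""
    return tb_scanned * PRICE_PER_TB_SCANNED


json_tb = 2.0                # hypothetical: query scans 2 TB of raw JSON
parquet_tb = json_tb / 10    # assumed 10x less data scanned as Parquet

json_cost = athena_query_cost(json_tb)
parquet_cost = athena_query_cost(parquet_tb)

print(f"JSON:    ${json_cost:.2f} per query")
print(f"Parquet: ${parquet_cost:.2f} per query")
```

Run that query daily and the gap is the difference between a rounding error and a line item.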
Well-Architected Framework¶
AWS's pillars for building reliable, secure, efficient systems. Five are covered here; the framework's sixth pillar, Sustainability (added in 2021), is not.
Security¶
Design Principles:
- Identity and Access Management - Use IAM roles, never embed credentials
- Detective Controls - Enable CloudTrail, GuardDuty, Config
- Infrastructure Protection - VPC isolation, security groups, NACLs
- Data Protection - Encrypt at rest (KMS) and in transit (TLS)
- Incident Response - Automated remediation with Lambda
Security Checklist
- [ ] Root account MFA enabled
- [ ] IAM users have MFA
- [ ] S3 buckets are private (no public access)
- [ ] RDS encryption enabled
- [ ] CloudTrail logging to S3
- [ ] GuardDuty threat detection active
- [ ] Security groups follow least privilege
- [ ] Secrets stored in Secrets Manager
- [ ] VPC Flow Logs enabled
- [ ] AWS Config rules for compliance
Reliability¶
Design Principles:
- Multi-AZ Deployment - RDS, ALB, EC2 across 2+ availability zones
- Auto Scaling - Respond to demand changes automatically
- Backup and Recovery - Automated snapshots, cross-region replication
- Change Management - Infrastructure as code (CloudFormation/Terraform)
- Failure Isolation - Bulkheads prevent cascading failures
Reliability Targets
| Availability | Downtime/Year | Architecture |
|---|---|---|
| 99.0% (2 nines) | 3.65 days | Single AZ |
| 99.9% (3 nines) | 8.76 hours | Multi-AZ |
| 99.95% | 4.38 hours | Multi-AZ + Auto Scaling |
| 99.99% (4 nines) | 52.56 minutes | Multi-region |
| 99.999% (5 nines) | 5.26 minutes | Multi-region + Failover |
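The downtime column is just arithmetic on the availability percentage, which is handy when someone proposes a target that is not in the table. A small sketch, assuming a 365-day year; the function name is illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(availability_percent):
    """Allowed downtime per year for a given availability target."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for target in (99.0, 99.9, 99.95, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes)")
```

Each extra nine cuts the budget by 10x, which is why the architecture column escalates so quickly.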
Performance Efficiency¶
Design Principles:
- Selection - Choose right compute (EC2 vs Lambda vs Fargate)
- Review - Continuously evaluate new services
- Monitoring - CloudWatch metrics, X-Ray tracing
- Trade-offs - Consistency vs latency, normalization vs denormalization
Service Selection Guide
graph TD
Start{Compute Need?} -->|Containers| Container{Orchestration?}
Start -->|VMs| VM{Persistent?}
Start -->|Functions| Lambda[Lambda<br/>Event-driven]
Container -->|Yes| EKS[EKS<br/>Kubernetes]
Container -->|No| ECS[ECS/Fargate<br/>Simpler]
VM -->|Yes| EC2[EC2<br/>Full Control]
VM -->|No| Batch[AWS Batch<br/>Job Scheduling]
style Start fill:#e1f5ff
style Lambda fill:#ff9900
style EKS fill:#ff9900
style ECS fill:#ff9900
style EC2 fill:#ff9900
style Batch fill:#ff9900
Cost Optimization¶
Design Principles:
- Right Sizing - Match instance size to workload (don't over-provision)
- Elasticity - Auto-scale down during off-peak hours
- Pricing Models - Reserved Instances (up to 72% off), Spot (up to 90% off)
- Managed Services - RDS cheaper than self-managed EC2 databases
- Cost Allocation - Tag everything for chargeback/showback
Cost Saving Strategies
| Strategy | Savings (up to) | Best For |
|---|---|---|
| Reserved Instances (1yr) | 40% | Predictable workloads |
| Reserved Instances (3yr) | 72% | Long-term commitments |
| Spot Instances | 90% | Fault-tolerant, flexible |
| Savings Plans | 72% | Flexible compute usage |
| S3 Intelligent-Tiering | 70% | Infrequently accessed data |
| Lambda vs EC2 | 80% | Low-traffic APIs |
| Graviton Instances | 40% | ARM-compatible workloads |
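To see what the table means for an actual bill, here is a quick best-case comparison for a workload costing $100/month on demand. The percentages are the table's own figures treated as upper bounds; real discounts vary by instance family, region, term, and payment option, and the names below are illustrative:

```python
# Best-case discounts copied from the table above (upper bounds, not promises).
MAX_DISCOUNT = {
    "on_demand": 0.00,
    "reserved_1yr": 0.40,
    "reserved_3yr": 0.72,
    "spot": 0.90,
}


def best_case_monthly_cost(on_demand_monthly, strategy):
    """Monthly cost if the full advertised discount applies."""
    return round(on_demand_monthly * (1 - MAX_DISCOUNT[strategy]), 2)


costs = {name: best_case_monthly_cost(100.0, name) for name in MAX_DISCOUNT}
for name, cost in costs.items():
    print(f"{name:>13}: ${cost:.2f}/month")
```

Spot looks unbeatable on paper, but it only counts if the workload survives being interrupted. That is what the "Best For" column is telling you.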
Operational Excellence¶
Design Principles:
- Operations as Code - Infrastructure as code, runbooks as code
- Frequent, Small Changes - Reduce blast radius of failures
- Refine Operations - Learn from failures, improve processes
- Anticipate Failure - Chaos engineering, game days
- Learn from Failures - Post-mortems without blame
Operational Metrics
- MTTR - Mean Time To Recovery (target: <1 hour)
- Change Failure Rate - Failed changes / total changes (target: <15%)
- Deployment Frequency - Daily for high-performing teams
- Lead Time - Code commit to production (target: <1 day)
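Two of these metrics fall straight out of data most teams already have. A minimal sketch, assuming you can export incidents as (detected, resolved) timestamp pairs and deployments as success/failure flags; the sample data and function names are made up:

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)


def change_failure_rate(deploy_outcomes):
    """Fraction of deployments that failed; each outcome is True on success."""
    if not deploy_outcomes:
        return 0.0
    failures = sum(1 for ok in deploy_outcomes if not ok)
    return failures / len(deploy_outcomes)


incidents = [
    (datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 30)),
    (datetime(2026, 1, 12, 14, 0), datetime(2026, 1, 12, 15, 30)),
]
deploys = [True] * 18 + [False] * 2

mttr = mean_time_to_recovery(incidents)
cfr = change_failure_rate(deploys)
print(f"MTTR: {mttr}, change failure rate: {cfr:.0%}")
```

In this sample the failure rate (10%) is inside the <15% target, while the MTTR lands exactly on the one-hour line, so it just misses "<1 hour".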
Community Pulse¶
Who to Follow¶
Twitter/X:
- @awscloud - Official updates, new service launches
- @QuinnyPig - Corey Quinn, AWS cost optimization, hilarious roasts
- @ben11kehoe - Serverless expert, AWS Community Builder
- @nathankpeck - AWS Principal Dev Advocate, ECS/containers expert
- @jeremy_daly - Serverless champion, great technical insights
- @neiltheblue - Solutions Architect, hands-on tutorials
- @jeffbarr - Jeff Barr, AWS Chief Evangelist
YouTube/Streamers:
- AWS Online Tech Talks - Deep dives, re:Invent sessions
- FooBar Serverless - Serverless tutorials
- Be A Better Dev - Practical AWS projects
- TechWorld with Nana - DevOps + AWS tutorials
- AWS Events - Conference talks, workshops
Active Communities¶
- r/aws - 250k+ members, active daily, mix of beginner + advanced (best community)
- AWS Community Discord - Official, helpful, core team present
- Dev.to #aws - Quality tutorials, case studies, community posts
- AWS Community Builders - Official program, great networking
- AWS re:Post - Official Q&A (replaces old forums)
- ServerlessLand Community - Serverless-focused, active Slack/Discord
Podcasts & Newsletters¶
Podcasts:
- AWS Podcast - Official, weekly, new features + customer stories
- Screaming in the Cloud - Corey Quinn, hilarious, critical of AWS (in good way)
- AWS TechChat - Technical deep dives
- AWS Morning Brief - Short daily AWS news
Newsletters:
- Last Week in AWS - Weekly, irreverent, critical analysis (subscribe now)
- Off-by-none - Serverless newsletter, Jeremy Daly, quality content
- AWS Week in Review - Official blog, weekly updates
- AWS Open Source News - Open source projects on AWS
Events & Conferences¶
- AWS re:Invent - Las Vegas, late November, 50k+ attendees, $2k+ (worth it once in career)
- AWS Summit - Free, regional (20+ cities), good for networking
- AWS Community Day - Free, community-organized, worldwide, quality talks
- ServerlessDays - Free/cheap, serverless-focused, technical talks
Worth Checking¶
Last Updated: 2026-01-31 | Vibe Check: Mainstream - AWS is the default cloud. Not the coolest kid anymore (Vercel/Railway have better DX), but runs most production workloads. If you're doing cloud professionally, you're learning AWS.
Tags: aws, cloud