AWS — Theory (interview deep-dive)
When to choose what (compute)
| Need | Best fit |
|---|---|
| Long-running stateful service | EC2 / ECS-EC2 |
| Containers, hands-off | Fargate |
| Short, event-driven, bursty tasks | Lambda |
| Need K8s API everywhere | EKS |
| One-off batch jobs | AWS Batch / Step Functions + Lambda |
| Static site / SPA | S3 + CloudFront |
Don’t pick Lambda for: long-running jobs (>15 min, the hard timeout), low-latency paths where cold starts matter, or sustained high traffic (Fargate or EC2 is often cheaper).
RDS vs DynamoDB
- RDS: ACID, joins, complex queries, schemas, mature ORMs. Scales primarily vertically; read replicas for read scaling. Aurora storage grows automatically up to 128 TiB.
- DynamoDB: serverless, single-digit-millisecond latency at any scale, horizontal scaling, flexible schema. Limits: querying on anything other than the partition key + sort key needs a GSI or LSI (or a full Scan); no joins; reads are eventually consistent by default, and strongly consistent reads cost 2× the RCUs of eventually consistent ones (and aren't available on GSIs).
- DynamoDB shines for: known access patterns, very high write throughput, multi-region active-active (Global Tables), session/token stores.
- RDS shines for: relational data, ad-hoc analytics, transactions across rows, mature reporting.
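The capacity-unit arithmetic behind "strongly consistent reads cost 2× RCU" fits in a few lines. This is a minimal sketch assuming the documented unit sizes (4 KB per read unit, 1 KB per write unit) and a single-item read or write (GetItem/PutItem). Query and Scan instead round up the *total* size of all returned items. The function names are illustrative and not part of any AWS SDK.

```python
import math

# One RCU = one strongly consistent read per second of an item up to 4 KB
# (or two eventually consistent reads). One WCU = one write per second up to 1 KB.
READ_BLOCK = 4 * 1024
WRITE_BLOCK = 1024
READ_MULTIPLIER = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def read_units(item_bytes: int, mode: str = "strong") -> float:
    """RCUs consumed by a single GetItem of an item of `item_bytes` bytes."""
    blocks = max(1, math.ceil(item_bytes / READ_BLOCK))  # size rounds up to the next 4 KB
    return blocks * READ_MULTIPLIER[mode]


def write_units(item_bytes: int, transactional: bool = False) -> int:
    """WCUs consumed by a single PutItem of an item of `item_bytes` bytes."""
    blocks = max(1, math.ceil(item_bytes / WRITE_BLOCK))  # size rounds up to the next 1 KB
    return blocks * (2 if transactional else 1)


if __name__ == "__main__":
    six_kb = 6 * 1024
    print(read_units(six_kb, "strong"))    # 2.0 (two 4 KB blocks)
    print(read_units(six_kb, "eventual"))  # 1.0 (half the strong cost)
    print(write_units(six_kb))             # 6
```

Item size dominates cost. A 6 KB item costs six times as much to write as to read eventually consistently, which is why large, rarely read attributes are often moved to S3.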
VPC mental model
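- VPC = your logically isolated slice of the AWS network, with an IPv4 CIDR you choose (between /16 and /28).
- Subnet = part of the VPC CIDR in exactly one AZ. Public = route table has a route to an Internet Gateway (IGW); Private = no route to the IGW, egress via NAT; Isolated = no NAT route either.
- NAT Gateway = managed egress to the internet for private subnets. Billed per hour plus per GB processed, which makes it a frequent networking cost surprise.
- Security Group = allow-list only (inbound + outbound rules, no deny rules), attached to ENIs, stateful (return traffic is allowed automatically).
- NACL = subnet-level allow and deny rules, evaluated in rule-number order, stateless (return traffic, including ephemeral ports, must be allowed explicitly); used rarely.
- VPC Endpoints let private subnets reach AWS services without NAT: gateway endpoints (S3, DynamoDB; no charge) and interface endpoints (PrivateLink; billed hourly plus per GB).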
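A minimal sketch of subnet planning with Python's standard `ipaddress` module, assuming one /24 per tier per AZ. The tier names and AZ list are illustrative. It also subtracts the five addresses AWS reserves in every subnet, so a /24 leaves 251 usable IPs.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # network, VPC router, DNS, reserved-for-future, broadcast


def plan_subnets(vpc_cidr, azs, tiers, new_prefix=24):
    """Carve a VPC CIDR into one equal-size subnet per (tier, AZ), in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    if not 16 <= vpc.prefixlen <= 28 or not vpc.prefixlen <= new_prefix <= 28:
        raise ValueError("VPC and subnet IPv4 CIDRs must be between /16 and /28")
    needed = len(tiers) * len(azs)
    blocks = list(islice(vpc.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} has room for only {len(blocks)} /{new_prefix} subnets")
    pairs = [(tier, az) for tier in tiers for az in azs]
    return dict(zip(pairs, blocks))


def usable_hosts(subnet):
    """Addresses left for ENIs after AWS takes its reserved five."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"],
                        ["public", "private", "isolated"])
    for (tier, az), net in plan.items():
        print(f"{tier:8} {az}  {net}  usable={usable_hosts(net)}")
```

Leaving most of a /16 unallocated, as here, keeps room for more AZs or tiers later. VPC CIDRs are awkward to change once resources exist.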
IAM evaluation
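For a request to an AWS resource (simplified):
- Authenticate the principal. Everything starts as an implicit deny.
- An explicit Deny in any applicable policy = denied; no Allow can override it.
- Organization SCP (if the account is in AWS Organizations) must allow.
- Resource policy (if any): within one account, an Allow here can grant access by itself; cross-account, both the resource policy and the caller's identity policy must allow.
- Identity policy must allow (unless a same-account resource policy already granted access).
- Permissions boundary (a cap, if set) and session policies (if any) must also allow.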
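The evaluation order above can be modeled as a small function. This is a deliberately simplified sketch, not AWS's actual evaluator:
- Policies are bare lists of statements.
- Wildcards are matched with `fnmatch`.
- Resource-policy statements are assumed to already name the caller as principal.
- Conditions, session policies, and the finer same-account rules for permissions boundaries are ignored.

```python
from fnmatch import fnmatchcase

# A "policy" here is just a list of statements such as
# {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::demo-bucket/*"}.
# Principals, conditions and session policies are deliberately left out.


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(stmt, action, resource):
    # Action names are case-insensitive; resource ARNs are matched case-sensitively.
    action_ok = any(fnmatchcase(action.lower(), p.lower()) for p in _as_list(stmt["Action"]))
    resource_ok = any(fnmatchcase(resource, p) for p in _as_list(stmt["Resource"]))
    return action_ok and resource_ok


def _has(policy, effect, action, resource):
    return any(s["Effect"] == effect and _matches(s, action, resource) for s in policy)


def evaluate(action, resource, identity, resource_policy=None, scp=None,
             boundary=None, cross_account=False):
    present = [p for p in (identity, resource_policy, scp, boundary) if p is not None]
    # 1. An explicit Deny in any policy wins outright.
    if any(_has(p, "Deny", action, resource) for p in present):
        return "Deny"
    # 2. Guardrails (SCP, permissions boundary) grant nothing, but must allow if present.
    for guard in (scp, boundary):
        if guard is not None and not _has(guard, "Allow", action, resource):
            return "Deny"
    # 3. Grants come from the identity policy and/or the resource policy.
    id_ok = _has(identity, "Allow", action, resource)
    res_ok = resource_policy is not None and _has(resource_policy, "Allow", action, resource)
    if cross_account:
        return "Allow" if id_ok and res_ok else "Deny"  # both sides must allow
    return "Allow" if id_ok or res_ok else "Deny"       # otherwise implicit deny


if __name__ == "__main__":
    identity = [{"Effect": "Allow", "Action": "s3:GetObject",
                 "Resource": "arn:aws:s3:::demo-bucket/*"}]
    obj = "arn:aws:s3:::demo-bucket/report.csv"
    print(evaluate("s3:GetObject", obj, identity))  # Allow
    deny_s3 = [{"Effect": "Allow", "Action": "*", "Resource": "*"},
               {"Effect": "Deny", "Action": "s3:*", "Resource": "*"}]
    print(evaluate("s3:GetObject", obj, identity, scp=deny_s3))  # Deny
    print(evaluate("s3:GetObject", obj, identity, cross_account=True))  # Deny (no bucket policy)
```

The bucket name `demo-bucket` is hypothetical. For real policies, the IAM policy simulator gives authoritative answers.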
Best practices:
- Roles for everything (no long-lived keys).
- Principle of least privilege.
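- IAM Access Analyzer to find unused access and resources shared outside the account or organization.
- MFA for human users; short-lived role sessions instead of standing access.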
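Least privilege in practice usually means scoping both actions and resources. Below is a sketch of a builder for an identity policy that can read one S3 prefix and nothing else. The bucket and prefix are hypothetical. The policy grammar (`Version`, `Statement`, the `s3:prefix` condition key) is the standard IAM JSON format.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Identity policy allowing list + read under one S3 prefix, nothing more."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",  # ListBucket is granted on the bucket ARN
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderThePrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",  # GetObject is granted on object ARNs
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_prefix_policy("demo-bucket", "reports/"), indent=2))
```

Note the split between the bucket ARN (for listing) and object ARNs (for reads). Putting both actions on a single `Resource` is a common reason a "least-privilege" policy silently fails.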
S3 deep notes
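- Consistency: strong read-after-write consistency for object PUT, GET, LIST and DELETE since December 2020.
- Performance: first-byte latency for small objects is roughly 100-200 ms. Each prefix supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests/sec (since 2018), so randomized key prefixes are no longer needed; spread load across multiple prefixes to go beyond that.
- Multipart upload: recommended above 100 MB, required above 5 GB (the single-PUT limit). Upload parts in parallel.
- Lifecycle (typical example): transition Standard → Standard-IA after 30d (the minimum for IA), Standard-IA → Glacier after 90d, expire at 365d.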
- Versioning + MFA Delete = ransomware protection.
- Object Lock (compliance / governance) for immutable backups.
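- Pre-signed URLs give time-limited access to callers without AWS credentials; the URL carries the signer's permissions (up to 7 days with SigV4).
- Server-side encryption: SSE-S3 (default for new objects since January 2023), SSE-KMS (key policies + CloudTrail audit trail), SSE-C (you provide the key).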
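Multipart part sizing follows directly from the limits:
- Parts must be 5 MiB to 5 GiB (the last part may be smaller).
- An upload has at most 10,000 parts, numbered from 1.

The sketch below picks a part size and computes the byte range for each part, assuming a 64 MiB preferred part size (an arbitrary choice). Each range would then go to its own `UploadPart` call between `CreateMultipartUpload` and `CompleteMultipartUpload`, ideally in parallel.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB     # every part except the last must be at least 5 MiB
MAX_PART = 5 * GIB     # no part may exceed 5 GiB
MAX_PARTS = 10_000     # at most 10,000 parts per upload


def plan_parts(object_size: int, preferred_part: int = 64 * MIB):
    """Return (part_size, part_count) that stays within the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Use the preferred size unless the object would then need more than 10,000 parts.
    part = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("object does not fit in 10,000 parts of at most 5 GiB")
    return part, math.ceil(object_size / part)


def part_ranges(object_size: int, part_size: int):
    """(part_number, start, end_exclusive) byte ranges; part numbers start at 1."""
    return [
        (n, start, min(start + part_size, object_size))
        for n, start in enumerate(range(0, object_size, part_size), start=1)
    ]


if __name__ == "__main__":
    size, count = plan_parts(1 * GIB)
    print(size // MIB, count)  # 64 16
```

Abandoned uploads keep their already-uploaded parts, and those parts are billed. This is why an abort rule for incomplete multipart uploads belongs in the lifecycle policy.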
DynamoDB deep notes
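- Partition key is hashed to pick the physical partition. Each partition serves up to 3,000 RCU and 1,000 WCU, so a hot key (one key taking a disproportionate share of traffic) gets throttled.
- Partition key + sort key = composite primary key; supports range queries (`begins_with`, `between`, `<`, `>`) within one partition.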
- GSI — alternate access pattern; eventually consistent only.
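- LSI: same partition key, alternate sort key; can only be created with the table; supports strongly consistent reads, but caps each partition key's item collection at 10 GB.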
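- On-demand: pay per request. Provisioned: you set RCU/WCU, optionally with auto-scaling.
- DAX = managed in-memory cache (write-through) with microsecond reads; strongly consistent reads bypass it.
- TTL attribute (epoch seconds) deletes expired items in the background at no write cost, typically within a few days of expiry; expired items can still be read until then, so filter on the TTL attribute.
- Streams = change data capture (item-level changes, 24h retention); a common trigger for Lambda.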
- Single-table design: one table for many entity types, distinguished by `pk` patterns (e.g. `USER#<id>`, `ORDER#<id>`). Common in mature DynamoDB use.
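Single-table design and composite-key range queries are easiest to see with a toy model. The in-memory class below is a hypothetical stand-in, not boto3. The `PK`/`SK` attribute names and the `USER#` / `ORDER#` prefixes are a common convention, assumed here. `query` mimics a Query with `begins_with` on the sort key within one partition.

```python
from collections import defaultdict


class SingleTable:
    """In-memory stand-in for one DynamoDB table with a composite (PK, SK) key."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # PK -> {SK: item}

    def put(self, pk: str, sk: str, **attrs):
        # Like PutItem: writing the same (PK, SK) replaces the whole item.
        self._partitions[pk][sk] = {"PK": pk, "SK": sk, **attrs}

    def query(self, pk: str, sk_prefix: str = ""):
        # Like Query with `PK = :pk AND begins_with(SK, :prefix)`: one partition,
        # results in ascending sort-key order.
        rows = self._partitions.get(pk, {})
        return [rows[sk] for sk in sorted(rows) if sk.startswith(sk_prefix)]


if __name__ == "__main__":
    t = SingleTable()
    t.put("USER#42", "PROFILE", name="Ada")
    t.put("USER#42", "ORDER#2024-03-01#A1", total=30)
    t.put("USER#42", "ORDER#2024-05-17#B7", total=12)
    t.put("USER#7", "PROFILE", name="Lin")
    print([o["SK"] for o in t.query("USER#42", "ORDER#")])  # both orders, oldest first
```

Putting an ISO date in the sort key is what makes "a user's orders, oldest first" a single-partition range read instead of a scan.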
Lambda deep notes
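- Cold start: ~100ms-1s typical, longer for JVM and .NET runtimes and large packages. Mitigate: provisioned concurrency, lighter runtimes (Node, Python), smaller packages, SnapStart.
- VPC-attached Lambda is fine: since the 2019 move to shared Hyperplane ENIs, attaching to a VPC no longer adds per-invocation ENI setup to cold starts.
- Concurrency: default regional account quota of 1,000 concurrent executions (can be raised); each function scales up by at most 1,000 concurrent executions every 10 seconds. Reserved concurrency guarantees and caps a function's share.
- Idempotency: async invocations retry on failure and event sources such as SQS deliver at least once, so the same event can be processed more than once; design handlers accordingly.
- Lambda + SQS: Lambda polls the queue (event source mapping) and scales up to the `MaximumConcurrency` set on the mapping.
- Layers for shared code/binaries.
- Architecture: x86_64 vs arm64 (Graviton). arm64 duration pricing is ~20% lower.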
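Because the same event can arrive more than once, handlers usually claim an idempotency key before doing side effects. This is a minimal sketch with two stand-ins:
- a hypothetical `order_id` field in the event, used as the key;
- an in-memory store in place of a DynamoDB conditional put.

A production version would also expire records and handle the in-progress state. The Powertools for AWS Lambda idempotency utility is one packaged option.

```python
class IdempotencyStore:
    """Hypothetical in-memory stand-in for a DynamoDB table written with a
    conditional put (attribute_not_exists), so only the first claim of a key wins."""

    def __init__(self):
        self._records = {}

    def claim(self, key: str) -> bool:
        if key in self._records:
            return False
        self._records[key] = None  # "in progress" until save() stores the result
        return True

    def save(self, key: str, result: dict):
        self._records[key] = result

    def release(self, key: str):
        self._records.pop(key, None)  # let a retry try again after a failure

    def get(self, key: str):
        return self._records.get(key)


def make_handler(store: IdempotencyStore, charge):
    def handler(event, context=None):
        key = f"charge#{event['order_id']}"  # hypothetical business key in the event
        if not store.claim(key):
            return store.get(key)  # duplicate delivery: reuse the first result
        try:
            charge(event["amount"])  # runs only for the first delivery (or a retry after failure)
            result = {"status": "charged", "amount": event["amount"]}
        except Exception:
            store.release(key)
            raise
        store.save(key, result)
        return result

    return handler


if __name__ == "__main__":
    charges = []
    handler = make_handler(IdempotencyStore(), charges.append)
    event = {"order_id": "o-1", "amount": 25}
    handler(event)
    handler(event)  # redelivery of the same event
    print(charges)  # [25]
```

Keying on a business identifier rather than the delivery's message ID matters: a producer that sends the same order twice gets two message IDs but one `order_id`.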
Multi-AZ vs multi-region
- Multi-AZ: same region, physically separate data centers. Default for HA; relatively cheap (extra capacity plus inter-AZ data transfer).
- Multi-region: disaster recovery, latency, compliance. Expensive (cross-region data transfer, Aurora Global Database, S3 replication, DDB Global Tables).
- RTO (how long you can be down) and RPO (how much data you can lose) drive the choice.
Common interview Qs
- Design a serverless image-processing pipeline. S3 PUT → Lambda (resize/thumbnail) → S3 → DDB (metadata) → CloudFront. Use SQS for backpressure if Lambda concurrency matters.
- EC2 instance can’t reach the internet. Check route table → IGW (public) or NAT (private), SG egress, NACL (both directions, including ephemeral ports), public or Elastic IP (public subnet), DNS resolution.
- High Lambda cold starts during traffic spikes. Provisioned concurrency, lighter runtime, smaller package, SnapStart for Java.
- RDS Multi-AZ for HA: what does it actually do? Synchronous standby in another AZ; automatic DNS failover, typically 60-120s. The standby doesn't serve reads (use read replicas, or a Multi-AZ DB cluster with readable standbys).
- DynamoDB — design table for tweets feed. PK = userId, SK = timestamp. GSI by hashtag with timestamp. Watch for hot partitions on celebrities.
- S3 cost ballooning, what to check? Noncurrent object versions, incomplete multipart uploads, missing lifecycle rules, request costs, data transfer out, KMS calls (S3 Bucket Keys reduce them).
- EKS vs ECS — when each? EKS if K8s API/ecosystem matters or multi-cloud. ECS for simpler AWS-only with Fargate.
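- How do you secure secrets for Lambda? Store them in Secrets Manager / Parameter Store; fetch at init and cache (e.g. the AWS Parameters and Secrets Lambda Extension); rotate with Secrets Manager rotation; keep raw secrets out of env vars (they are KMS-encrypted at rest, but visible to anyone who can read the function configuration).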
- Compliance: only EU users’ data must stay in EU. Region-specific deployment, S3 buckets in EU regions, DDB Global Table replicas only in EU regions, SCP denying non-EU regions (`aws:RequestedRegion`).
- CloudFront in front of API Gateway: why or why not? Edge caching for cacheable responses, DDoS protection (Shield Standard) and WAF at the edge, single CDN footprint. Skip if all responses are user-specific and uncacheable.
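For the tweets-feed question, a common fix for celebrity hot partitions is write sharding: spread one user's writes over N partition keys and fan reads out across them. This sketch makes three assumptions:
- a hypothetical `USER#<id>#<shard>` key format;
- 10 shards;
- a caller-supplied `hot` flag.

The per-shard Query results are merged newest first.

```python
import heapq
import zlib

SHARDS = 10  # hypothetical shard count for hot (celebrity) users


def write_key(user_id, tweet_id, hot):
    """Partition key for a new tweet; hot users spread writes over SHARDS keys."""
    if not hot:
        return f"USER#{user_id}"
    # crc32 is deterministic across processes (unlike Python's salted hash()).
    shard = zlib.crc32(tweet_id.encode()) % SHARDS
    return f"USER#{user_id}#{shard}"


def read_keys(user_id, hot):
    """Every partition key a reader must Query to rebuild one user's timeline."""
    if not hot:
        return [f"USER#{user_id}"]
    return [f"USER#{user_id}#{n}" for n in range(SHARDS)]


def merge_timelines(per_shard_results):
    """Merge per-shard Query results, each already sorted newest first by `ts`."""
    return list(heapq.merge(*per_shard_results, key=lambda t: t["ts"], reverse=True))


if __name__ == "__main__":
    print(write_key("7", "t123", hot=True))
    shards = [[{"ts": 5}, {"ts": 1}], [{"ts": 4}, {"ts": 2}]]
    print([t["ts"] for t in merge_timelines(shards)])  # [5, 4, 2, 1]
```

The trade-off is read fan-out: N Queries per timeline read. Sharding is therefore usually applied only to keys known to be hot.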
Cost levers (always asked)
- Right-sizing (Compute Optimizer recommendations).
- Savings Plans / Reserved capacity for steady workloads.
- Spot for interruptible workloads (up to 90% off On-Demand; 2-minute interruption notice).
- Graviton (arm64) ~20% cheaper.
- S3 lifecycle.
- Delete unattached EBS, old snapshots, idle ELBs.
- VPC Endpoints to avoid NAT cost.
- CloudFront for outbound bandwidth (often cheaper than S3 directly).
- CloudWatch Logs: set retention (the default is never expire); filter or sample noisy logs.
- Reduce inter-AZ traffic where avoidable.
Anti-patterns
- Long-lived IAM access keys committed in repos.
- Wide-open security groups (0.0.0.0/0 ingress anywhere except internet-facing load balancers on 80/443).
- One huge VPC for all environments.
- One Lambda doing everything.
- DynamoDB without thinking about access patterns.
- Public S3 bucket for “convenience”.
- No backups / restore tests.
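Wide-open security groups are easy to catch mechanically. The sketch below scans data shaped like the EC2 `DescribeSecurityGroups` response:
- `IpPermissions` holds the ingress rules.
- `IpRanges`/`CidrIp` and `Ipv6Ranges`/`CidrIpv6` hold the source ranges.
- `IpProtocol` `"-1"` means all traffic.

It flags rules open to `0.0.0.0/0` or `::/0`. Treating ports 80/443 on explicitly allow-listed groups (such as an internet-facing ALB) as acceptable is an assumption of this sketch, and the group IDs in the example are hypothetical.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}
PUBLIC_WEB_PORTS = {80, 443}  # assumed acceptable only on allow-listed groups


def open_ingress_findings(security_groups, allowed_group_ids=frozenset()):
    """Return (group_id, protocol, from_port, to_port) for internet-open ingress rules.

    `security_groups` is the "SecurityGroups" list from a DescribeSecurityGroups response.
    """
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
            if not cidrs & OPEN_CIDRS:
                continue  # not reachable from the whole internet
            all_traffic = perm["IpProtocol"] == "-1"  # "-1" = every protocol and port
            ports = set() if all_traffic else set(range(perm["FromPort"], perm["ToPort"] + 1))
            web_only = not all_traffic and ports <= PUBLIC_WEB_PORTS
            if web_only and sg["GroupId"] in allowed_group_ids:
                continue  # e.g. an internet-facing ALB on 80/443
            findings.append((sg["GroupId"], perm["IpProtocol"],
                             perm.get("FromPort"), perm.get("ToPort")))
    return findings


if __name__ == "__main__":
    groups = [
        {"GroupId": "sg-alb", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-app", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-db", "IpPermissions": [
            {"IpProtocol": "-1", "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]},
    ]
    print(open_ingress_findings(groups, allowed_group_ids={"sg-alb"}))
    # [('sg-app', 'tcp', 22, 22), ('sg-db', '-1', None, None)]
```

Running a check like this in CI against infrastructure-as-code output catches the problem before deployment rather than after.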