Cloud & AWS Fundamentals
What is Cloud Computing?
Cloud computing means using remote servers (over the internet) instead of your own hardware.
Instead of buying servers:
• You rent compute, storage, and services
• Pay only for what you use
Types of Cloud Services
1. IaaS (Infrastructure as a Service)
• You control: OS, runtime, applications, data
• AWS manages: physical hardware, networking, virtualization
Example:
• Amazon EC2
You launch virtual machines and install everything yourself.
Used when you want full control
2. PaaS (Platform as a Service)
• You control: application + data
• AWS manages: OS, runtime, scaling
Example:
• AWS Elastic Beanstalk
Used for faster development without managing infrastructure
3. SaaS (Software as a Service)
• Everything managed by provider
• You just use the software
Example:
• Amazon QuickSight
Used by end users (typical SaaS: dashboards and analytics, CRM, etc.)
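The three models differ mainly in which layers you manage and which the provider manages. A minimal sketch that encodes the split described above as a lookup table; the layer names are a simplification chosen for illustration, not an official AWS taxonomy.

```python
# Layers of a typical stack, from the bottom (hardware) to the top (data).
# Simplified, hypothetical layer names for illustration.
LAYERS = ["hardware", "virtualization", "os", "runtime", "application", "data"]

# Layers the customer manages under each model; everything else is the provider's job.
CUSTOMER_MANAGES = {
    "IaaS": {"os", "runtime", "application", "data"},  # e.g., Amazon EC2
    "PaaS": {"application", "data"},                   # e.g., AWS Elastic Beanstalk
    "SaaS": set(),                                      # e.g., Amazon QuickSight
}


def who_manages(model: str, layer: str) -> str:
    """Return 'you' or 'provider' for a given service model and stack layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return "you" if layer in CUSTOMER_MANAGES[model] else "provider"


if __name__ == "__main__":
    # Print one row per model showing who owns each layer.
    for model in CUSTOMER_MANAGES:
        row = {layer: who_manages(model, layer) for layer in LAYERS}
        print(model, row)
```

Reading down the "os" column makes the key difference visible: you patch the OS on EC2, but not on Elastic Beanstalk or QuickSight.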
Regions & Availability Zones
Region
• A separate geographic area containing multiple isolated Availability Zones
• Example: Mumbai region (ap-south-1)
Why it matters:
• Latency (closer to users = faster)
• Legal compliance (data residency requirements)
• Cost (prices vary by region)
Availability Zone (AZ)
• One or more discrete data centers inside a region
• Each AZ has independent power, cooling, and networking, so a failure in one is designed not to spread to the others
Example:
Mumbai region → 3 AZs
Key Concept
• High availability = run across multiple AZs
• If one AZ fails → the system keeps running in the remaining AZs
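To see why spreading across AZs helps, here is a minimal sketch that places replicas round-robin across three AZs of the Mumbai region and counts how many survive when any single AZ goes down. The AZ names follow AWS's naming pattern for ap-south-1; the replica count is an arbitrary example, and in practice you would rely on features such as Auto Scaling groups or Multi-AZ databases rather than placing replicas by hand.

```python
from collections import Counter


def place_round_robin(replicas: int, azs: list[str]) -> list[str]:
    """Assign each replica to an AZ in turn so they are spread evenly."""
    return [azs[i % len(azs)] for i in range(replicas)]


def survivors_after_az_failure(placement: list[str], failed_az: str) -> int:
    """Count replicas still running when one whole AZ is lost."""
    return sum(1 for az in placement if az != failed_az)


if __name__ == "__main__":
    azs = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
    placement = place_round_robin(6, azs)
    print(Counter(placement))  # two replicas per AZ
    for az in azs:
        print(f"{az} down -> {survivors_after_az_failure(placement, az)} of 6 still running")

    # Contrast: with everything in one AZ, a single AZ outage takes down the whole system.
    single_az = place_round_robin(6, azs[:1])
    print(survivors_after_az_failure(single_az, "ap-south-1a"))  # 0
```

With six replicas over three AZs, losing any one AZ still leaves four running; with a single-AZ layout, the same outage leaves none.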
IAM (Identity & Access Management)
• Core AWS service for controlling who can access which resources, and what they are allowed to do
Users
• Identities for individual people (e.g., a data engineer), with long-term credentials
Roles
• Identities that are assumed to obtain temporary credentials (commonly used by AWS services and applications)
Example:
• Glue job accessing S3 uses a role, not a user
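A role has two parts: a trust policy that says who may assume it, and permission policies that say what it can do once assumed. A minimal sketch of a trust policy that lets AWS Glue assume a role, built as a Python dict and serialized to the JSON you would attach. The `glue.amazonaws.com` service principal and the `sts:AssumeRole` action are the standard ones for this case; the variable name is illustrative.

```python
import json

# Trust policy: allows the AWS Glue service to assume this role and receive
# temporary credentials. What the role may then do (e.g., read S3) is defined
# separately in its permission policies.
glue_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(glue_trust_policy, indent=2))
```

Because the Glue job runs with the role's temporary credentials, no long-term access keys need to be stored in the job.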
Policies
• JSON documents that define which actions are allowed or denied on which resources
Example:
• Allow read access to S3
• Deny delete actions
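The two example rules can be written as a single policy document. The sketch below pairs that policy with a deliberately simplified evaluator that mirrors IAM's core rule: an explicit Deny always wins, otherwise a matching Allow grants access, and anything not allowed is implicitly denied. The bucket name `example-data-lake` is hypothetical, and the evaluator ignores conditions, principals, and other policy types, so treat it as an illustration of the logic rather than a substitute for real IAM evaluation.

```python
from fnmatch import fnmatchcase

# Hypothetical bucket name used for illustration.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Allow read access to the bucket and its objects.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-data-lake",
                "arn:aws:s3:::example-data-lake/*",
            ],
        },
        {   # Deny delete actions, even if another statement allows them.
            "Effect": "Deny",
            "Action": ["s3:DeleteObject", "s3:DeleteBucket"],
            "Resource": "*",
        },
    ],
}


def _as_list(value):
    """IAM accepts a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    # Simplified wildcard matching ('*' and '?') via fnmatch.
    # Case-sensitive for simplicity, unlike IAM action matching.
    return any(fnmatchcase(value, p) for p in _as_list(patterns))


def is_allowed(policy, action, resource):
    """Simplified IAM evaluation: explicit Deny > Allow > implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return False   # explicit deny always wins
            allowed = True     # remember the allow, keep checking for denies
    return allowed             # no matching allow -> implicit deny


if __name__ == "__main__":
    obj = "arn:aws:s3:::example-data-lake/raw/file.csv"
    print(is_allowed(POLICY, "s3:GetObject", obj))     # True
    print(is_allowed(POLICY, "s3:DeleteObject", obj))  # False (explicit deny)
    print(is_allowed(POLICY, "s3:PutObject", obj))     # False (implicit deny)
```

This default-deny behavior is what makes least privilege practical: you list only what is needed, and everything else stays blocked.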
Best Practices
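• Avoid using the root account for day-to-day work
• Use roles (temporary credentials) instead of long-term access keys
• Follow the principle of least privilege (grant only the permissions needed)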
Pricing & Billing Basics
AWS uses a pay-as-you-go model: you are billed for the resources you actually consume.
Pricing Models
1. On-Demand
• Pay per use (no commitment)
Flexible, but the most expensive per hour
2. Reserved Instances
• Commit to a 1-year or 3-year term
Much cheaper than On-Demand for steady, long-running workloads
3. Spot Instances
• Spare EC2 capacity at a steep discount
AWS can reclaim the instance at any time with a two-minute warning, so use it for fault-tolerant or flexible jobs
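A quick way to compare the models is yearly cost for a given amount of usage. The sketch uses made-up hourly rates purely to show the arithmetic (real prices depend on instance type, region, term, and payment option). The key point: a Reserved Instance is billed for every hour of the term whether you use it or not, so it only pays off above a certain utilization.

```python
HOURS_PER_YEAR = 24 * 365  # 8760

# Hypothetical example rates in USD per hour, NOT real AWS prices.
ON_DEMAND_RATE = 0.10
RESERVED_RATE = 0.06   # effective hourly rate, charged for every hour of the term
SPOT_RATE = 0.03       # Spot prices vary over time; instances can be interrupted


def yearly_cost(hours_used: float) -> dict[str, float]:
    """Cost of running one instance for `hours_used` hours in a year under each model."""
    return {
        "on_demand": ON_DEMAND_RATE * hours_used,
        "reserved": RESERVED_RATE * HOURS_PER_YEAR,  # committed: billed even when idle
        "spot": SPOT_RATE * hours_used,
    }


def reserved_break_even_utilization() -> float:
    """Fraction of the year above which Reserved is cheaper than On-Demand."""
    return RESERVED_RATE / ON_DEMAND_RATE


if __name__ == "__main__":
    for hours in (2_000, HOURS_PER_YEAR):
        costs = yearly_cost(hours)
        print(hours, {k: round(v, 2) for k, v in costs.items()})
    print(f"Reserved pays off above {reserved_break_even_utilization():.0%} utilization")
```

With these illustrative rates, a job that runs about 2,000 hours a year is cheaper On-Demand, while an always-on workload is cheaper Reserved; Spot is cheapest in both cases only if the job tolerates interruption.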
Example for Data Engineers
• Store data → S3 (low-cost storage, billed per GB-month)
• Query data → Athena (billed per amount of data scanned)
• Process data → Glue (billed per DPU-hour used by each job run)
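Each service is billed on a different unit, so a rough monthly estimate multiplies each usage figure by its rate. A minimal sketch with illustrative rates and a hypothetical workload; check current AWS pricing for your region before relying on any numbers. It ignores free tiers and per-query or per-run billing minimums.

```python
# Illustrative rates only; check current AWS pricing for your region.
S3_STANDARD_PER_GB_MONTH = 0.023   # USD per GB stored per month
ATHENA_PER_TB_SCANNED = 5.00       # USD per TB of data scanned by queries
GLUE_PER_DPU_HOUR = 0.44           # USD per DPU-hour consumed by job runs


def monthly_pipeline_cost(stored_gb: float, tb_scanned: float, dpu_hours: float) -> dict[str, float]:
    """Rough monthly cost of a store (S3) -> query (Athena) -> process (Glue) pipeline."""
    storage = stored_gb * S3_STANDARD_PER_GB_MONTH
    queries = tb_scanned * ATHENA_PER_TB_SCANNED
    processing = dpu_hours * GLUE_PER_DPU_HOUR
    return {"s3": storage, "athena": queries, "glue": processing,
            "total": storage + queries + processing}


if __name__ == "__main__":
    # Hypothetical workload: 500 GB stored, 2 TB scanned, and a daily
    # 10-DPU Glue job that runs for 15 minutes.
    dpu_hours = 10 * 0.25 * 30  # DPUs x hours per run x runs per month = 75
    cost = monthly_pipeline_cost(stored_gb=500, tb_scanned=2, dpu_hours=dpu_hours)
    print({k: round(v, 2) for k, v in cost.items()})
```

Because Athena is billed on data scanned, reducing scanned bytes (for example through partitioning or columnar formats) lowers query cost directly.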
Cost Optimization
• Use S3 lifecycle rules to move older data to cheaper storage classes (e.g., Glacier)
• Partition data so queries scan less of it (lower Athena cost)
• Stop or terminate unused EC2 instances
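Two of these tactics are easy to sketch. The first function builds an S3 lifecycle configuration in the dict shape accepted by boto3's `put_bucket_lifecycle_configuration`; the prefix, rule ID, and 90-day threshold are hypothetical choices. The second simulates partition pruning: with Hive-style `year=/month=` folders, a query filtered on partition columns only reads the matching folders, so Athena scans, and bills for, less data. The folder sizes are made up.

```python
import json


# 1) Lifecycle rule: move objects under a prefix to Glacier after N days.
def glacier_lifecycle_config(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration dict, in the shape passed as
    LifecycleConfiguration= to boto3's s3.put_bucket_lifecycle_configuration."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# 2) Partition pruning: only folders matching the filter are scanned.
def bytes_scanned(partitions: dict[str, int], **filters: str) -> int:
    """Sum sizes of partition folders whose key=value parts match every filter.

    `partitions` maps a Hive-style path such as 'sales/year=2024/month=01/'
    to its size in bytes. No filters means a full scan.
    """
    total = 0
    for path, size in partitions.items():
        # Parse 'year=2024/month=01' segments into {'year': '2024', 'month': '01'}.
        parts = dict(p.split("=", 1) for p in path.strip("/").split("/") if "=" in p)
        if all(parts.get(k) == v for k, v in filters.items()):
            total += size
    return total


if __name__ == "__main__":
    print(json.dumps(glacier_lifecycle_config("raw/", 90), indent=2))

    GB = 10**9
    sales = {f"sales/year=2024/month={m:02d}/": 10 * GB for m in range(1, 13)}
    print(bytes_scanned(sales) // GB, "GB scanned without a partition filter")  # 120
    print(bytes_scanned(sales, year="2024", month="03") // GB, "GB scanned for one month")  # 10
```

In this toy layout, filtering on one month cuts the scanned data from 120 GB to 10 GB; in Athena the savings appear only when the WHERE clause uses the partition columns and the partitions are registered in the catalog.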
Key Points (for Data Engineers)
• Store data → S3
• Secure access → IAM
• Ensure uptime → Multi-AZ
• Optimize cost → right pricing model + storage strategy