Cloud & AWS Fundamentals

What is Cloud Computing?
Cloud computing means using remote servers, accessed over the internet, instead of owning and running your own hardware.
Instead of buying servers:
    • You rent compute, storage, and services
    • You pay only for what you use

Types of Cloud Services
1. IaaS (Infrastructure as a Service)
    • You control: OS, apps, runtime
    • AWS provides: hardware, virtualization, and networking
Example:
    • Amazon EC2
You launch virtual machines and install everything yourself.
Used when you want full control

2. PaaS (Platform as a Service)
    • You control: application + data
    • AWS manages: OS, runtime, scaling
Example:
    • AWS Elastic Beanstalk
Used for faster development without managing infrastructure

3. SaaS (Software as a Service)
    • Everything managed by provider
    • You just use the software
Example:
    • Amazon QuickSight
Used for end users (dashboards, analytics, CRM, etc.)

Regions & Availability Zones
Region
    • A geographic area containing multiple isolated Availability Zones
    • Example: Mumbai region (ap-south-1)
Why it matters:
    • Latency (closer = faster)
    • Legal compliance
    • Cost differences

Availability Zone (AZ)
    • One or more data centers inside a region
    • Each AZ is isolated from the others (independent power and networking)
Example:
Mumbai region → 3 AZs
Key Concept
    • High availability = use multiple AZs
    • If one AZ fails → system still works
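
To make the multi-AZ idea concrete, here is a minimal sketch in plain Python (no AWS calls). The instance names are hypothetical, and the round-robin placement is only an illustration of spreading a workload, not how AWS itself schedules instances.

```python
from itertools import cycle


def spread_across_azs(instance_names, azs):
    """Assign each instance to an AZ in round-robin order."""
    az_cycle = cycle(azs)
    return {name: next(az_cycle) for name in instance_names}


def surviving_instances(placement, failed_az):
    """Return the instances that are NOT located in the failed AZ."""
    return [name for name, az in placement.items() if az != failed_az]


# The three AZs of the Mumbai region; instance names are made up.
azs = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
placement = spread_across_azs(["web-1", "web-2", "web-3", "web-4"], azs)

print(placement)
print(surviving_instances(placement, "ap-south-1a"))
```

Because the instances are spread over three AZs, losing one AZ removes only the instances placed there; the rest keep serving traffic.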

IAM (Identity & Access Management)
    • Core security system in AWS
Users
    • Individual accounts (e.g., a data engineer)
Roles
    • Temporary credentials, assumed by AWS services, applications, or users
Example:
    • Glue job accessing S3 uses a role, not a user
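
As a rough illustration of the Glue example, a role carries a trust policy that says who may assume it. The sketch below builds such a document as a Python dictionary and prints it as JSON; it only constructs the document and does not call AWS.

```python
import json

# Trust policy: states WHO may assume the role (here, the AWS Glue service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

What the role is allowed to do (for example, read from S3) is defined separately, in the permission policies attached to it.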
Policies
    • JSON rules defining permissions
Example:
    • Allow read access to S3
    • Deny delete actions
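
The two example rules could look roughly like the policy below. This is a sketch, assuming a hypothetical bucket named example-data-lake; the statement IDs (Sid) are also made up for readability.

```python
import json

BUCKET = "example-data-lake"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Allow read access to S3: list the bucket and fetch objects.
            "Sid": "AllowRead",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        },
        {
            # Deny delete actions on every object in the bucket.
            "Sid": "DenyDelete",
            "Effect": "Deny",
            "Action": "s3:DeleteObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

An explicit Deny always overrides an Allow, so the delete restriction holds even if another attached policy grants broader S3 access.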
Best Practices
    • Never use the root account for everyday work (protect it with MFA)
    • Use roles instead of long-lived access keys
    • Follow least privilege principle

Pricing & Billing Basics
AWS follows a pay-as-you-go pricing model
Pricing Models
1. On-Demand
    • Pay per use (no commitment)
Flexible, but the most expensive option
2. Reserved Instances
    • Commit for a 1-year or 3-year term
Cheaper for long-term workloads
3. Spot Instances
    • Very cheap (uses spare EC2 capacity)
Can be interrupted at short notice (two-minute warning)
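
To compare the three models side by side, here is a small sketch. The hourly rates are invented for illustration and are NOT real AWS prices; actual rates depend on instance type, region, and term.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

# Hypothetical hourly rates (USD) for one instance. NOT real AWS prices.
RATES = {"on_demand": 0.10, "reserved": 0.06, "spot": 0.03}


def monthly_cost(model, hours=HOURS_PER_MONTH):
    """Cost of running one instance for the given number of hours."""
    return round(RATES[model] * hours, 2)


for model in RATES:
    print(f"{model:>10}: ${monthly_cost(model):.2f} per month")
```

With these made-up numbers, the always-on workload is cheapest on Spot and most expensive on On-Demand, but only the first two models guarantee the instance keeps running.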

Example for Data Engineers
    • Store data → S3 (cheap)
    • Query data → Athena (pay for the amount of data each query scans)
    • Process data → Glue (pay for the compute time each job uses, billed in DPU-hours)
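
Athena's cost depends on how much data each query scans, so a rough estimate is easy to sketch. The rate below (5 USD per TB scanned) is an assumption for illustration; check the current pricing for your region. The table sizes are hypothetical.

```python
PRICE_PER_TB_SCANNED = 5.00  # assumed rate in USD; verify for your region


def athena_query_cost(tb_scanned, price_per_tb=PRICE_PER_TB_SCANNED):
    """Estimated cost of one query that scans `tb_scanned` terabytes."""
    return tb_scanned * price_per_tb


# Hypothetical table: 1 TB in total, of which one day's data is 0.01 TB.
full_scan = athena_query_cost(1.0)
one_day_only = athena_query_cost(0.01)

print(f"Full table scan: ${full_scan:.2f}")
print(f"Single day only: ${one_day_only:.2f}")
```

The gap between the two numbers is why reading only the data a query needs matters so much for Athena bills.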

Cost Optimization
    • Use S3 lifecycle rules (move rarely accessed data to Glacier)
    • Partition data (queries scan less data, so they cost less)
    • Shut down unused EC2
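
A lifecycle rule that moves data to Glacier might look like the sketch below. The rule ID, the raw/ prefix, and the 90-day threshold are hypothetical choices. The dictionary follows the shape that boto3's put_bucket_lifecycle_configuration accepts as its LifecycleConfiguration argument, but this example only builds and prints it.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-raw-data",      # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},  # hypothetical key prefix
            # Move objects to the Glacier storage class after 90 days.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

Once a rule like this is attached to a bucket, S3 applies the transition automatically, so older data moves to cheaper storage without a scheduled job.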


Key Points (for Data Engineers)
    • Store data → S3
    • Secure access → IAM
    • Ensure uptime → Multi-AZ
    • Optimize cost → right pricing + storage strategy


Topics