AWS Security & Governance

Why It’s Critical
In real jobs, you’re not just building pipelines—you’re handling:
   ● User data
   ● Financial data
   ● Logs and sensitive information
You must ensure:
   ● Only the right people/services access data
   ● Data is protected (encrypted)
   ● Access is controlled and auditable

1. IAM Advanced (Roles & Policies)
Core of AWS security
Roles (Most Important)
   ● Provide temporary credentials (no hardcoded access keys)
   ● Used by AWS services (Glue, Lambda, EC2)
Example:
An AWS Glue job needs to read from S3
→ Assign a role with S3 read access
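A role is defined by two JSON documents: a trust policy (who may assume the role) and a permissions policy (what the role may do). The sketch below builds both for the Glue example as plain Python dicts. The bucket name `example-raw-data-bucket` is a hypothetical placeholder, and the permissions are intentionally read-only. A real Glue job role usually also needs permissions for the Glue service and CloudWatch Logs (for example via the AWS managed policy `AWSGlueServiceRole`), which are omitted here.

```python
import json

# Hypothetical placeholder bucket name; replace with your own.
BUCKET = "example-raw-data-bucket"

# Trust policy: lets the AWS Glue service assume this role and receive
# temporary credentials, so no access keys are stored in the job.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: read-only access to a single bucket.
s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # ListBucket applies to the bucket ARN itself.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # GetObject applies to the objects inside the bucket.
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(s3_read_policy, indent=2))
```

The printed JSON can then be supplied to IAM, e.g. as the `--assume-role-policy-document` value of `aws iam create-role`.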
Policies
   ● JSON documents defining permissions
Example:
   ● Allow: read from S3
   ● Deny: delete objects
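To see how an Allow and a Deny combine, here is a deliberately simplified model of IAM evaluation for a single identity-based policy: everything is denied by default, a matching Allow grants access, and a matching Deny always wins. This is an illustration, not AWS's full evaluation logic, which also considers conditions, resource-based policies, permission boundaries, SCPs and session policies. The bucket name is hypothetical.

```python
import re

# Simplified IAM-style evaluation for ONE policy (illustration only):
#   - implicit deny by default,
#   - a matching Allow grants access,
#   - a matching Deny overrides any Allow.
# Note: real IAM action matching is case-insensitive; this sketch is not.

def _matches(patterns, value):
    """True if value matches any IAM-style pattern ('*' and '?' wildcards)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
        if re.fullmatch(regex, value):
            return True
    return False


def is_allowed(policy, action, resource):
    allowed = False
    for stmt in policy["Statement"]:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return False  # explicit deny overrides any allow
            allowed = True
    return allowed  # False here means implicit deny


# The example from the text: allow reads, deny deletes (hypothetical bucket).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"],
        },
        {
            "Effect": "Deny",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

obj = "arn:aws:s3:::example-bucket/sales/2024.csv"
print(is_allowed(policy, "s3:GetObject", obj))     # True
print(is_allowed(policy, "s3:DeleteObject", obj))  # False (explicit deny)
print(is_allowed(policy, "s3:PutObject", obj))     # False (implicit deny)
```

Returning immediately on a matching Deny is what makes "Deny wins" hold even when another statement allows a wildcard such as `s3:*`.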
Best Practices
   ● Use roles instead of long-term access keys
   ● Follow the principle of least privilege (grant only the permissions a task needs)
   ● Avoid using the root account for day-to-day work
Think: who can do what in AWS
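Least privilege can be partly checked mechanically. The hypothetical helper below (not an AWS tool) flags Allow statements that grant every action, every action of a service (e.g. `s3:*`), or every resource (`"*"`). AWS's own option for this kind of check is the policy validation in IAM Access Analyzer; this sketch only shows the idea.

```python
# Hypothetical least-privilege lint (illustration only, not an AWS tool).

def _as_list(value):
    # IAM allows a single statement/action/resource instead of a list.
    return [value] if isinstance(value, (str, dict)) else list(value)


def find_overly_broad(policy):
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue  # broad Deny statements only restrict access
        for action in _as_list(stmt.get("Action", [])):
            # "*" = all actions; "service:*" = all actions of one service
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {i}: wildcard resource")
    return findings


too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
scoped = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:GetObject",
                   "Resource": "arn:aws:s3:::example-bucket/*"}],
}

print(find_overly_broad(too_broad))
print(find_overly_broad(scoped))  # []
```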

2. Encryption with AWS Key Management Service (KMS)
What it does:
   ● Creates and manages encryption keys
Types of Encryption
At Rest
   ● Data stored in:
      ● S3
      ● Redshift
      ● EBS
In Transit
   ● Data moving over HTTPS (TLS, the successor to SSL); this is protected by TLS rather than by KMS keys
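Encryption in transit can be enforced on the S3 side with a bucket policy that denies any request not made over TLS. The sketch below builds this widely used policy with the `aws:SecureTransport` condition key; the bucket name `example-bucket` is a hypothetical placeholder.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

# Deny every S3 action on the bucket and its objects when the request
# did not arrive over TLS (aws:SecureTransport evaluates to "false").
deny_insecure_transport = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

print(json.dumps(deny_insecure_transport, indent=2))
```

Using a Deny (rather than an Allow conditioned on TLS) means the rule holds no matter what other policies grant.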
Why it matters:
   ● Protects sensitive data
   ● Often required for compliance (GDPR, etc.)
Example:
   ● S3 bucket encrypted using KMS key
   ● Only authorized roles can decrypt
Think: locking your data
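The example above (an S3 bucket encrypted with a KMS key that only authorized roles can decrypt) involves two pieces of configuration: the bucket's default encryption setting and a statement in the key's policy. The sketch builds both as dicts. The account ID `111122223333`, key ARN, role ARN and bucket name are hypothetical placeholders, and a real key policy must also contain an administrator statement, which is omitted here.

```python
import json

# Hypothetical placeholder identifiers.
BUCKET = "example-bucket"
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
READER_ROLE_ARN = "arn:aws:iam::111122223333:role/example-analytics-reader"

# 1) Bucket default encryption: new objects are encrypted with the KMS key
#    (SSE-KMS). This is the shape of S3's ServerSideEncryptionConfiguration.
bucket_encryption = {
    "Rules": [
        {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": KEY_ARN,
            },
            # S3 Bucket Keys reduce the number of requests made to KMS.
            "BucketKeyEnabled": True,
        }
    ]
}

# 2) Key policy statement: lets the reader role decrypt with this key.
#    In a key policy, "Resource": "*" means the key the policy is attached to.
key_policy_statement = {
    "Sid": "AllowReaderRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {"AWS": READER_ROLE_ARN},
    "Action": "kms:Decrypt",
    "Resource": "*",
}

print(json.dumps(bucket_encryption, indent=2))
print(json.dumps(key_policy_statement, indent=2))
```

With boto3, the first dict would be passed as `s3.put_bucket_encryption(Bucket=BUCKET, ServerSideEncryptionConfiguration=bucket_encryption)`. Reading an object still also requires `s3:GetObject` in the role's IAM policy, and writers need `kms:GenerateDataKey` on the key, so IAM and KMS act as two separate gates.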

3. AWS Lake Formation
What it is:
   ● Centralized data lake security & governance layer
Problem it solves:
S3 alone:
   ● Permissions work at the bucket/object (prefix) level, not at table or column level
   ● Becomes complex with many users
What Lake Formation adds:
   ● Table-level permissions
   ● Column-level security
   ● Centralized data access control
Example:
   ● Analyst A → only sees sales data
   ● Analyst B → cannot see salary column
Think: database-like permissions on S3 data
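To make the Analyst A / Analyst B example concrete, here is a toy in-memory model of what table-level and column-level grants mean. It is not the Lake Formation API: real grants are made in Lake Formation and enforced by integrated query engines such as Amazon Athena. The table names, principals and rows are hypothetical, and this sketch simply drops columns the caller is not granted.

```python
# Toy data and grants for illustration only (hypothetical names).
TABLES = {
    "sales": [
        {"order_id": 1, "region": "EU", "amount": 250},
        {"order_id": 2, "region": "US", "amount": 120},
    ],
    "employees": [
        {"emp_id": 7, "name": "Ana", "dept": "Finance", "salary": 90000},
    ],
}

# principal -> {table: set of allowed columns, or None for every column}
GRANTS = {
    "analyst_a": {"sales": None},                            # table-level: sales only
    "analyst_b": {"employees": {"emp_id", "name", "dept"}},  # column-level: no salary
}


def select_all(principal, table):
    """Return the rows of `table` that `principal` may see, minus hidden columns."""
    tables = GRANTS.get(principal, {})
    if table not in tables:
        raise PermissionError(f"{principal} has no grant on table {table!r}")
    allowed = tables[table]
    return [
        {col: val for col, val in row.items() if allowed is None or col in allowed}
        for row in TABLES[table]
    ]


print(select_all("analyst_b", "employees"))
# [{'emp_id': 7, 'name': 'Ana', 'dept': 'Finance'}]
```

Calling `select_all("analyst_a", "employees")` raises `PermissionError`, because Analyst A has no grant on that table at all.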

4. Data Access Control (Multi-Level Security)
Different Levels:
1. Resource-Level
   ● Access to S3 bucket or database
2. Table-Level
   ● Access to specific datasets
3. Column-Level
   ● Hide sensitive columns
4. Row-Level (advanced)
   ● Filter data per user
Tools:
   ● IAM → service-level control
   ● Lake Formation → data-level control
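The four levels can be pictured as a chain of checks applied to one query. The toy model below is hypothetical (names, data and the policy layout are invented for illustration). In Lake Formation, row- and cell-level restrictions are defined as data filters; this code only mimics the resulting behavior, and level 1 would normally be enforced by IAM together with Lake Formation.

```python
# Hypothetical, in-memory model of the four access levels for one principal.
POLICY = {
    "eu_analyst": {
        "databases": {"sales_db"},                        # 1. resource level
        "tables": {"sales_db.orders"},                    # 2. table level
        "columns": {"order_id", "region", "amount"},      # 3. column level
        "row_filter": lambda row: row["region"] == "EU",  # 4. row level
    }
}

ORDERS = [
    {"order_id": 1, "region": "EU", "amount": 120, "customer_email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 80, "customer_email": "b@example.com"},
    {"order_id": 3, "region": "EU", "amount": 45, "customer_email": "c@example.com"},
]


def read(principal, database, table, rows):
    p = POLICY.get(principal)
    if p is None or database not in p["databases"]:
        raise PermissionError("no access to database")         # level 1
    if f"{database}.{table}" not in p["tables"]:
        raise PermissionError("no access to table")            # level 2
    # Rows are filtered before columns are hidden, so a row filter
    # could also use a column the user is not allowed to see.
    visible = [r for r in rows if p["row_filter"](r)]          # level 4
    return [{c: v for c, v in r.items() if c in p["columns"]}  # level 3
            for r in visible]


print(read("eu_analyst", "sales_db", "orders", ORDERS))
# [{'order_id': 1, 'region': 'EU', 'amount': 120}, {'order_id': 3, 'region': 'EU', 'amount': 45}]
```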

Real Pipeline Security Flow:

1. Data stored → S3 (encrypted via KMS)
2. Access → controlled by IAM roles
3. Data visibility → controlled by Lake Formation
4. Queries → restricted per user


Summary
   ● IAM = authentication & authorization
   ● KMS = encryption (data protection)
   ● Lake Formation = fine-grained data governance
   ● Use least privilege + encryption everywhere

Quick Summary:
IAM controls access, KMS secures data, and Lake Formation governs who sees what.

