AWS Security & Governance
Why It’s Critical
In real jobs, you’re not just building pipelines—you’re handling:
● User data
● Financial data
● Logs and sensitive information
You must ensure:
● Only the right people/services access data
● Data is protected (encrypted)
● Access is controlled and auditable
1. IAM Advanced (Roles & Policies)
Core of AWS security
Roles (Most Important)
● Temporary permissions (no hardcoded credentials)
● Used by AWS services (Glue, Lambda, EC2)
Example:
An AWS Glue job needs to read from S3
→ Assign a role with S3 read access
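For a service to use a role, the role also needs a trust policy naming who may assume it. The sketch below builds that document for the Glue example, assuming the standard Glue service principal `glue.amazonaws.com`; the function name `glue_trust_policy` is hypothetical and only for illustration.

```python
import json


def glue_trust_policy() -> dict:
    """Build a trust policy that lets the AWS Glue service assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The service allowed to assume the role (assumed: Glue).
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would attach as the role's trust relationship.
    print(json.dumps(glue_trust_policy(), indent=2))
```

The S3 read access itself goes in a separate permissions policy attached to the same role.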
Policies
● JSON documents defining permissions
Example:
● Allow: read from S3
● Deny: delete objects
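The Allow/Deny example above could look like the policy document below. This is a minimal sketch: the bucket name `example-data-bucket` and the function name are hypothetical, and the exact actions you need depend on your workload.

```python
import json


def s3_read_no_delete_policy(bucket: str) -> dict:
    """Build a policy that allows reading from a bucket and denies deletes."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRead",
                "Effect": "Allow",
                # ListBucket applies to the bucket, GetObject to its objects.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {
                "Sid": "DenyDelete",
                "Effect": "Deny",
                "Action": "s3:DeleteObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-data-bucket" is a placeholder name.
    print(json.dumps(s3_read_no_delete_policy("example-data-bucket"), indent=2))
```

An explicit Deny always wins over an Allow, so the delete stays blocked even if another attached policy allows it.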
Best Practices
● Use roles instead of access keys
● Follow least privilege principle
● Avoid using root account
Think: who can do what in AWS
2. Encryption with AWS Key Management Service (KMS)
What it does:
● Creates and manages encryption keys
Types of Encryption
At Rest
● Data stored in S3, Redshift, or EBS
In Transit
● Data moving over the network, protected with HTTPS (TLS, formerly SSL)
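One common way to enforce encryption in transit on S3 is a bucket policy that rejects any request not made over HTTPS. The sketch below builds such a policy using the `aws:SecureTransport` condition key; the bucket name and function name are placeholders.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Build a bucket policy that denies all S3 requests sent without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-data-bucket" is a placeholder name.
    print(json.dumps(deny_insecure_transport_policy("example-data-bucket"), indent=2))
```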
Why it matters:
● Protects sensitive data
● Commonly required or expected for compliance (GDPR, etc.)
Example:
● S3 bucket encrypted using KMS key
● Only authorized roles can decrypt
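A minimal sketch of how default KMS encryption might be set on a bucket with boto3. The bucket name and key ARN passed in are placeholders you would supply, and `apply_encryption` assumes boto3 is installed and credentials are configured; it is imported lazily so the configuration builder can be used on its own.

```python
def kms_default_encryption(kms_key_arn: str) -> dict:
    """Build the default-encryption configuration for an S3 bucket."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Bucket keys reduce the number of calls made to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


def apply_encryption(bucket: str, kms_key_arn: str) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS access)."""
    import boto3  # imported here so the builder above works without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=kms_default_encryption(kms_key_arn),
    )
```

For "only authorized roles can decrypt" to hold, a role reading the data needs both S3 read access and `kms:Decrypt` on the key; without the KMS permission the read fails even if the S3 policy allows it.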
Think: locking your data
3. AWS Lake Formation
What it is:
● Centralized data lake security & governance layer
Problem it solves:
S3 alone:
● Cannot manage fine-grained permissions easily
● Becomes complex with many users
What Lake Formation adds:
● Table-level permissions
● Column-level security
● Centralized data access control
Example:
● Analyst A → only sees sales data
● Analyst B → cannot see salary column
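The "Analyst B" case can be sketched as a Lake Formation grant that gives SELECT on a table while excluding the salary column. The principal ARN, database, and table names are hypothetical; `apply_grant` assumes boto3 is installed and the caller is allowed to grant Lake Formation permissions.

```python
def select_grant_hiding_columns(
    principal_arn: str, database: str, table: str, hidden_columns: list
) -> dict:
    """Build arguments for a SELECT grant that excludes some columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Every column is visible except the ones listed here.
                "ColumnWildcard": {"ExcludedColumnNames": list(hidden_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant_args: dict) -> None:
    """Send the grant to Lake Formation (requires boto3 and AWS access)."""
    import boto3  # imported here so the builder above works without the SDK

    boto3.client("lakeformation").grant_permissions(**grant_args)
```

For example, `select_grant_hiding_columns(analyst_b_role_arn, "hr", "employees", ["salary"])` would describe a grant where queries from that role never return the salary column.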
Think: database-like permissions on S3 data
4. Data Access Control (Multi-Level Security)
Different Levels:
1. Resource-Level
● Access to S3 bucket or database
2. Table-Level
● Access to specific datasets
3. Column-Level
● Hide sensitive columns
4. Row-Level (advanced)
● Filter data per user
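To make the four levels concrete, here is a small in-memory model. It is a conceptual illustration only, not an AWS API: the `Grant` class and `query` function are invented for this sketch. A missing grant stands in for resource/table-level denial, `hidden_columns` for column-level security, and `row_filter` for row-level filtering.

```python
from dataclasses import dataclass, field
from typing import Callable


def allow_all_rows(row: dict) -> bool:
    """Default row filter: every row is visible."""
    return True


@dataclass
class Grant:
    """What one user may see in one table (toy model)."""
    table: str
    hidden_columns: set = field(default_factory=set)
    row_filter: Callable[[dict], bool] = allow_all_rows


def query(table: str, rows: list, grants: list) -> list:
    """Return only the rows and columns the grants allow for this table."""
    grant = next((g for g in grants if g.table == table), None)
    if grant is None:
        # Resource/table level: no grant means no access at all.
        raise PermissionError(f"no access to table {table!r}")
    return [
        # Column level: drop hidden columns from each visible row.
        {k: v for k, v in row.items() if k not in grant.hidden_columns}
        for row in rows
        if grant.row_filter(row)  # Row level: keep only permitted rows.
    ]


if __name__ == "__main__":
    rows = [
        {"region": "EU", "amount": 10, "salary": 100},
        {"region": "US", "amount": 20, "salary": 200},
    ]
    analyst = [Grant("sales", {"salary"}, lambda r: r["region"] == "EU")]
    print(query("sales", rows, analyst))
```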
Tools:
● IAM → service-level control
● Lake Formation → data-level control
Real Pipeline Security Flow:
1. Data stored → S3 (encrypted via KMS)
2. Access → controlled by IAM roles
3. Data visibility → controlled by Lake Formation
4. Queries → restricted per user
Summary
● IAM = authentication & authorization
● KMS = encryption (data protection)
● Lake Formation = fine-grained data governance
● Use least privilege + encryption everywhere
Quick Summary:
IAM controls access, KMS secures data, and Lake Formation governs who sees what.