Fabric Data Ingestion (Pipelines)
Data ingestion in Fabric refers to how data is collected, moved, and transformed from different sources into Fabric destinations (like Lakehouse, Warehouse, etc.). The main orchestration tool for this is Pipelines, similar to Azure Data Factory.
1. Pipelines (ADF-like)
Fabric Pipelines are very similar to Azure Data Factory pipelines.
What they are:
• A workflow orchestration tool to automate data movement and transformation
• Built using a drag-and-drop UI
• Allows combining multiple activities into a data pipeline
Key features:
• Supports ETL / ELT workflows
• Connects to multiple data sources (on-premises, cloud, SaaS)
• Reusable, modular design (e.g., parameters, and pipelines that invoke other pipelines)
• Integration with other Fabric components (Lakehouse, Warehouse, Notebooks)
Example:
A pipeline might:
1. Extract data from SQL Server
2. Transform it (for example, with a Dataflow Gen2 or a notebook)
3. Load it into a Lakehouse
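The extract → transform → load sequence above can be sketched as a chain of activities, where each step runs only if the previous one succeeded. This is a minimal plain-Python illustration, not Fabric code: the source rows, the `lakehouse_table` list, and the `extract`/`transform`/`load`/`run_pipeline` functions are all hypothetical stand-ins for the real SQL Server source, Lakehouse destination, and pipeline activities.

```python
from typing import Any, Callable

# Hypothetical in-memory stand-ins for a SQL Server source and a Lakehouse table.
SOURCE_ROWS = [
    {"id": 1, "name": " alice ", "amount": "10.5"},
    {"id": 2, "name": "BOB", "amount": "7"},
]
lakehouse_table: list[dict] = []


def extract(_: Any) -> list[dict]:
    """Extract: read raw rows from the simulated source."""
    return [dict(row) for row in SOURCE_ROWS]


def transform(rows: list[dict]) -> list[dict]:
    """Transform: trim and title-case names, cast amounts to float."""
    return [
        {"id": r["id"], "name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]


def load(rows: list[dict]) -> int:
    """Load: append rows to the simulated Lakehouse table, return the row count."""
    lakehouse_table.extend(rows)
    return len(rows)


def run_pipeline(activities: list[tuple[str, Callable[[Any], Any]]]) -> Any:
    """Run activities in order, feeding each one's output to the next.
    An exception stops the run, so later activities are skipped
    (roughly like chaining activities with 'on success' dependencies)."""
    data: Any = None
    for _name, activity in activities:
        data = activity(data)
    return data


PIPELINE = [("Extract", extract), ("Transform", transform), ("Load", load)]

if __name__ == "__main__":
    print(run_pipeline(PIPELINE), lakehouse_table)
```

In a real pipeline each step would be a separate activity (e.g., a Copy activity followed by a Dataflow or Notebook activity) connected on the canvas rather than Python functions.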
2. Dataflows Gen2
Dataflows Gen2 are the modern data transformation layer in Fabric.
What they are:
• Built on Power Query (M language)
• Used for data preparation and transformation
• Run at scale on Fabric compute
Key capabilities:
• Visual, no-code / low-code transformation, including complex logic, without writing code
• Data cleaning, shaping, filtering, and joins
• Stores output in OneLake
Think of it as:
• Pipeline = “when & how data moves”
• Dataflow = “how data is transformed”
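To make "how data is transformed" concrete, the typical dataflow steps (clean, filter, join) are sketched below in plain Python. A real Dataflow Gen2 expresses these as Power Query (M) steps built in the visual editor; this is only an equivalent illustration, and the `orders`/`customers` data and the `prepare` function are hypothetical.

```python
# Hypothetical raw inputs a dataflow might read.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0, "status": "shipped"},
    {"order_id": 2, "customer_id": 11, "amount": None, "status": "shipped"},
    {"order_id": 3, "customer_id": 10, "amount": 35.5, "status": "cancelled"},
    {"order_id": 4, "customer_id": 12, "amount": 80.0, "status": "SHIPPED"},
]
customers = [
    {"customer_id": 10, "country": "DE"},
    {"customer_id": 12, "country": "FR"},
]


def prepare(orders: list[dict], customers: list[dict]) -> list[dict]:
    # Clean: drop rows with a missing amount and normalise status casing.
    cleaned = [
        {**o, "status": o["status"].lower()}
        for o in orders
        if o["amount"] is not None
    ]
    # Filter: keep only shipped orders.
    shipped = [o for o in cleaned if o["status"] == "shipped"]
    # Join: inner join to customers on customer_id
    # (the Power Query equivalent is a "Merge queries" step).
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**o, "country": by_id[o["customer_id"]]["country"]}
        for o in shipped
        if o["customer_id"] in by_id
    ]


if __name__ == "__main__":
    for row in prepare(orders, customers):
        print(row)
```

Orders 2 (missing amount) and 3 (cancelled) are dropped, leaving orders 1 and 4 enriched with the customer's country.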
3. Copy Activity
Copy Activity is the core data movement component inside pipelines.
What it does:
• Copies data from source → destination
• Supports structured & semi-structured data
Supported sources/destinations:
• Databases (SQL Server, Azure SQL, etc.)
• Files in formats such as CSV, Parquet, and JSON
• Cloud storage and services (Azure Blob Storage, REST APIs)
Features:
• Schema mapping
• Incremental loading (typically by filtering the source on a watermark column, such as a last-modified timestamp)
• Parallel data transfer (high performance)
• Fault tolerance & retry
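Schema mapping and watermark-based incremental loading can be sketched together: copy only rows modified since the last successful run, rename columns on the way, and return the new watermark to store for next time. This is a plain-Python sketch under stated assumptions; the column names, `COLUMN_MAP`, and `incremental_copy` are hypothetical, and in Fabric the mapping and source filter would be configured on the Copy activity rather than written as code.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified column used as the watermark.
source = [
    {"CustID": 1, "CustName": "Alice", "ModifiedAt": datetime(2024, 1, 1, 9)},
    {"CustID": 2, "CustName": "Bob", "ModifiedAt": datetime(2024, 1, 2, 9)},
    {"CustID": 3, "CustName": "Cara", "ModifiedAt": datetime(2024, 1, 3, 9)},
]

# Schema mapping: source column -> destination column (hypothetical names).
COLUMN_MAP = {
    "CustID": "customer_id",
    "CustName": "customer_name",
    "ModifiedAt": "modified_at",
}


def incremental_copy(source_rows: list[dict], destination: list[dict],
                     last_watermark: datetime) -> datetime:
    """Copy rows changed after last_watermark, renaming columns via COLUMN_MAP.
    Returns the watermark to persist for the next run."""
    new_rows = [r for r in source_rows if r["ModifiedAt"] > last_watermark]
    for row in new_rows:
        destination.append({COLUMN_MAP[k]: v for k, v in row.items()})
    if not new_rows:
        return last_watermark  # nothing new: keep the old watermark
    return max(r["ModifiedAt"] for r in new_rows)


if __name__ == "__main__":
    dest: list[dict] = []
    wm = incremental_copy(source, dest, datetime(2024, 1, 1, 12))
    print(wm, dest)
```

Running it again with the returned watermark copies nothing, which is what keeps repeated scheduled runs from duplicating rows.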
Example:
Copy data from:
• On-premises SQL Server → Fabric Lakehouse (through an on-premises data gateway)
• REST API → Fabric Warehouse
4. Scheduling & Triggers
Pipelines can be automated using triggers, so they run without manual intervention.
Types of triggers:
4.1 Schedule Trigger:
• Run pipelines at fixed intervals (e.g., every hour, or daily at 2 AM)
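The two example schedules can be illustrated by computing the next run time from "now". This is a hypothetical sketch that uses naive local datetimes and ignores time zones and daylight-saving changes, which a real schedule trigger handles through its configured time zone.

```python
from datetime import datetime, time, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Next run for an 'every hour, on the hour' schedule (strictly after now)."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


def next_daily_run(now: datetime, at: time = time(2, 0)) -> datetime:
    """Next run for a 'daily at 02:00' schedule (strictly after now)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed: use tomorrow
    return candidate


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 10, 15)
    print(next_hourly_run(now), next_daily_run(now))
```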
4.2 Tumbling Window Trigger:
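• Runs over fixed-size, contiguous, non-overlapping time slices (useful for incremental loads)
• Because each slice starts where the previous one ended, no time range is skipped, so there are no data gaps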
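The tumbling-window idea can be sketched as a function that splits a time range into fixed-size, back-to-back windows. Each window's start and end would then bound the source query for one incremental run (e.g., `modified_at >= start AND modified_at < end`). The function name is hypothetical, and in this sketch a trailing partial window is not emitted because it has not fully elapsed yet.

```python
from datetime import datetime, timedelta


def tumbling_windows(start: datetime, end: datetime,
                     size: timedelta) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into contiguous, non-overlapping windows of `size`.
    Each window is half-open: [window_start, window_end)."""
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    windows = []
    window_start = start
    while window_start + size <= end:
        windows.append((window_start, window_start + size))
        window_start += size  # next window begins exactly where this one ends
    return windows


if __name__ == "__main__":
    for w in tumbling_windows(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3, 30),
                              timedelta(hours=1)):
        print(w)
```

Using half-open intervals means a row stamped exactly on a boundary falls into exactly one window.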
4.3 Event-based Trigger:
• Triggered by events such as:
  ◦ A file arriving in storage
  ◦ An event raised by an external system
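An event-based trigger boils down to "when a matching event arrives, start the associated pipeline". The plain-Python sketch below routes hypothetical file-created events to handlers by path pattern. The event shape (`type`, `path`), the `"FileCreated"` type name, and the handler functions are all assumptions for illustration; they are not the Fabric or Azure event schema.

```python
from fnmatch import fnmatch
from typing import Callable

# Hypothetical routing table: file path pattern -> handler that starts a pipeline.
handlers: list[tuple[str, Callable[[str], str]]] = []


def on_file_created(pattern: str):
    """Decorator registering a handler for files whose path matches `pattern`."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        handlers.append((pattern, func))
        return func
    return register


@on_file_created("landing/sales/*.csv")
def run_sales_pipeline(path: str) -> str:
    # Stand-in for starting a real pipeline run.
    return f"sales pipeline started for {path}"


def dispatch(event: dict) -> list[str]:
    """Invoke every handler whose pattern matches a file-created event's path."""
    if event.get("type") != "FileCreated":
        return []  # ignore other event types in this sketch
    path = event["path"]
    return [func(path) for pattern, func in handlers if fnmatch(path, pattern)]


if __name__ == "__main__":
    print(dispatch({"type": "FileCreated", "path": "landing/sales/jan.csv"}))
```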
Benefits:
• Full automation of data workflows
• Supports both scheduled batch ingestion and event-driven, near-real-time ingestion
• Reliable execution with monitoring & alerts
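Reliable execution usually relies on retrying transient failures. ADF-style pipeline activities expose a retry count and a retry interval in their settings; the sketch below mirrors that idea in plain Python. The `run_with_retry` function is hypothetical, and the injectable `sleep` parameter exists only so the behaviour can be tested without real waiting.

```python
import time
from typing import Any, Callable


def run_with_retry(activity: Callable[[], Any], retries: int = 3,
                   interval_seconds: float = 30.0,
                   sleep: Callable[[float], Any] = time.sleep) -> Any:
    """Run `activity`; on failure retry up to `retries` more times,
    waiting `interval_seconds` between attempts. Re-raises the last error."""
    attempt = 0
    while True:
        try:
            return activity()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the failure for alerting
            attempt += 1
            sleep(interval_seconds)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(run_with_retry(flaky, retries=3, interval_seconds=0.1))
```

A fixed interval keeps the sketch close to the retry-count/retry-interval model; exponential backoff is a common alternative when a downstream system needs more time to recover.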
Dataflows Gen2: difference from Pipelines
Pipelines            Dataflows Gen2
--------------------------------------
Orchestration        Transformation
Controls workflow    Shapes data
Uses activities      Uses Power Query
Summary:
• Pipelines → Orchestrate workflows (ADF-like)
• Dataflows Gen2 → Transform and prepare data
• Copy Activity → Move data between systems
• Triggers → Automate execution