Pipelines
A pipeline is a directed graph of processing nodes connected by links. Each pipeline:

- Consumes data from one or more Kafka topics
- Processes data through transformation nodes
- Outputs data to sinks (databases, Kafka topics, HTTP endpoints)
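
To make the graph model concrete, here is a minimal Python sketch of a pipeline wired from a Kafka source through a processor to a database sink. The `Node`, `Link`, and `Pipeline` classes are purely illustrative and are not Kanal's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A processing unit: a source, processor, or sink."""
    name: str
    kind: str  # "source", "processor", or "sink"


@dataclass
class Link:
    """A directed edge from one node to another."""
    from_node: str
    to_node: str


@dataclass
class Pipeline:
    """A directed graph of nodes connected by links."""
    nodes: dict[str, Node] = field(default_factory=dict)
    links: list[Link] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def connect(self, from_node: str, to_node: str) -> None:
        self.links.append(Link(from_node, to_node))


# Kafka topic -> filter -> database, expressed as a graph.
pipeline = Pipeline()
pipeline.add_node(Node("orders-consumer", kind="source"))
pipeline.add_node(Node("drop-test-orders", kind="processor"))
pipeline.add_node(Node("orders-db", kind="sink"))
pipeline.connect("orders-consumer", "drop-test-orders")
pipeline.connect("drop-test-orders", "orders-db")
```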
Nodes
A node is a single processing unit in your pipeline. Kanal provides three types of nodes:

- Sources — Ingest data into the pipeline (e.g., Kafka Consumer)
- Processors — Transform, filter, route, or enrich data
- Sinks — Output data to external systems
Every node has:

- Input ports — Where data enters the node
- Output ports — Where processed data exits
- Configuration — Node-specific settings
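
As an illustration of that anatomy, the sketch below models a processor with one input port, one output port, and a configuration dictionary. The `ProcessorNode` and `Port` names are hypothetical, used only for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Port:
    """A named connection point on a node."""
    name: str


@dataclass
class ProcessorNode:
    """Illustrative processor: one input port, one output port, plus settings."""
    name: str
    inputs: list[Port] = field(default_factory=lambda: [Port("in")])
    outputs: list[Port] = field(default_factory=lambda: [Port("out")])
    config: dict[str, Any] = field(default_factory=dict)


# A filter processor whose behavior is driven entirely by its configuration.
drop_test_orders = ProcessorNode(
    name="drop-test-orders",
    config={"condition": "record.environment != 'test'"},
)
```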
Links
Links connect nodes together, defining how data flows through your pipeline.

- A link connects an output port of one node to an input port of another
- Data flows along links as individual records
- Links carry schema information for validation
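
The following sketch shows one way a link could carry a schema and use it to validate records flowing between ports. The `Link` class and its `validate` method are assumptions for illustration, not Kanal's actual validation behavior.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A directed edge from an output port to an input port, carrying a schema."""
    from_node: str
    from_port: str
    to_node: str
    to_port: str
    schema: dict  # field name -> type name, used to check records in flight

    def validate(self, record: dict) -> bool:
        """Check that a record carries exactly the fields the schema declares."""
        return set(record) == set(self.schema)


link = Link(
    from_node="orders-consumer", from_port="out",
    to_node="drop-test-orders", to_port="in",
    schema={"order_id": "string", "amount": "double"},
)

print(link.validate({"order_id": "A-1001", "amount": 42.5}))  # True
print(link.validate({"order_id": "A-1001"}))                  # False: missing field
```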
Schemas
Kanal tracks schemas throughout your pipeline to help you build correct transformations. Schemas flow through your pipeline in a bidirectional system: downstream from sources toward sinks, and upstream from sinks back toward sources.

Schema Sources
Schemas can come from:

- Schema Registry — Avro schemas fetched automatically
- JSON inference — Schemas inferred from sample data
- Manual definition — Schemas you define explicitly
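
As a rough idea of what JSON inference involves, the sketch below maps one sample record to a flat field-to-type schema. The `infer_schema` function and the type names are assumptions for illustration; real inference would typically merge many samples and handle nesting, nulls, and type widening.

```python
import json


def infer_schema(sample: str) -> dict:
    """Infer a flat field -> type mapping from a single JSON sample record."""
    type_names = {str: "string", bool: "boolean", int: "long", float: "double"}
    record = json.loads(sample)
    return {
        field: type_names.get(type(value), "unknown")
        for field, value in record.items()
    }


sample = '{"order_id": "A-1001", "amount": 42.5, "priority": 3, "gift": false}'
print(infer_schema(sample))
# {'order_id': 'string', 'amount': 'double', 'priority': 'long', 'gift': 'boolean'}
```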