Real-Time Analytics Pipeline

Stream processing, OLAP storage, and high-throughput data ingestion

An event ingestion and analytics pipeline processing 500K+ events per minute with sub-second query latency.

Python · Apache Kafka · ClickHouse · FastAPI · Docker

Problem

The analytics team needed real-time visibility into product metrics, but the existing batch ETL pipeline had a 6-hour delay and could not support ad-hoc queries.

Solution

Built a streaming pipeline using Kafka for event ingestion, a Python stream processor for transformation and enrichment, and ClickHouse as the OLAP store for sub-second analytical queries.
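The transformation and enrichment step can be sketched as a pure function applied to each consumed event. This is a minimal illustration, not the project's actual code: the field names, the geo lookup table, and the `enrich` function are all assumptions standing in for whatever the real processor does.

```python
from datetime import datetime, timezone

# Stand-in for a real enrichment source (e.g. a geo-IP service); the
# mapping below is fabricated for illustration only.
GEO_LOOKUP = {"203.0.113.7": "AU", "198.51.100.2": "US"}

def enrich(event: dict) -> dict:
    """Validate required fields and attach derived attributes."""
    for field in ("event_id", "user_id", "event_type", "ts"):
        if field not in event:
            raise ValueError(f"missing required field: {field}")
    enriched = dict(event)
    # Normalize the client epoch timestamp to UTC ISO-8601.
    enriched["ts"] = datetime.fromtimestamp(
        event["ts"], tz=timezone.utc
    ).isoformat()
    # Attach a country code from the (stubbed) lookup.
    enriched["country"] = GEO_LOOKUP.get(event.get("ip"), "unknown")
    return enriched
```

Keeping this step a pure dict-to-dict function makes it trivial to unit-test independently of Kafka or ClickHouse.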

Architecture

Events flow from client SDKs through a Kafka topic, are consumed by a pool of stream processors that enrich and validate the data, then batch-insert into ClickHouse. A FastAPI service exposes a query interface with predefined and ad-hoc query support.
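Batching matters here because ClickHouse performs poorly under many small inserts (each insert creates a new data part). The buffering in front of the insert path might look like the sketch below, which flushes on either a row-count or an age threshold; the flush callback (in practice, a single multi-row `INSERT` via a ClickHouse client) is an assumption.

```python
import time

class BatchBuffer:
    """Accumulate rows and flush when a size or age threshold is hit.

    Illustrative sketch only: `flush_fn` stands in for whatever issues
    the actual batched INSERT against ClickHouse.
    """

    def __init__(self, flush_fn, max_rows=10_000, max_age_s=1.0):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows = []
        self.first_row_at = None

    def add(self, row):
        if self.first_row_at is None:
            self.first_row_at = time.monotonic()
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.first_row_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)  # one INSERT carrying all buffered rows
            self.rows = []
            self.first_row_at = None
```

The age threshold bounds end-to-end latency during quiet periods, while the row threshold caps memory and keeps insert sizes efficient under load.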

Key Decisions

  • Selected ClickHouse over Druid for better SQL compatibility and simpler operations
  • Used Kafka consumer groups for horizontal scaling of stream processors
  • Implemented a schema registry to enforce event contracts
  • Added a dead-letter queue for malformed events with alerting
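The validate-or-dead-letter decision from the last two bullets can be sketched as below. The required-field schema is a minimal stand-in for a real schema-registry lookup, and the output destinations are plain lists rather than Kafka producers, so the routing logic itself stays self-contained.

```python
import json

# Hypothetical contract: field name -> expected type. In the real system
# this would come from the schema registry, not a hardcoded dict.
REQUIRED = {"event_id": str, "event_type": str, "ts": (int, float)}

def route(raw: bytes, valid_out: list, dlq_out: list) -> None:
    """Parse and validate one message; malformed events go to the DLQ
    with the failure reason attached so alerting can aggregate causes."""
    try:
        event = json.loads(raw)
        for field, ftype in REQUIRED.items():
            if not isinstance(event.get(field), ftype):
                raise ValueError(f"bad or missing field: {field}")
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        dlq_out.append({"raw": raw.decode("utf-8", "replace"),
                        "reason": str(exc)})
        return
    valid_out.append(event)
```

Recording the rejection reason alongside the raw payload is what makes the dead-letter queue actionable: alerts can fire on a spike in any single reason, and the original bytes remain available for replay once the producer is fixed.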