CDP Dashboard

Reference Architecture

Real-time Customer Data Platform

From raw clickstream events to live dashboards in milliseconds. Identity resolution, sessionization, and profile aggregation streaming through a pipeline built for scale.


Event Producer
Rust, tokio
Redpanda
Kafka-compatible
Apache Flink
Identity Resolution, Profile Updater, Sessionization, Iceberg Writer
ScyllaDB
Live profiles
Apache Pinot
Star-tree OLAP
Apache Iceberg
Cold storage
Query API
GraphQL + WS
Dashboard
Leptos SSR+WASM
Run it yourself

curl -fsSL https://raw.githubusercontent.com/jasadams/arcstream/main/deploy/install.sh | bash

Data Guarantees

Exactly-once Delivery
Kafka transactions + Flink checkpoints ensure no duplicates and no data loss across the entire pipeline
Event Deduplication
RocksDB keyed state drops duplicate event IDs within a 10-minute window at the pipeline entry point
Timestamp Clamping
Client timestamps clamped to ±7 days of server time, preventing watermark corruption from untrusted sources
Checkpoint Recovery
RocksDB snapshots + Kafka offsets saved to MinIO every 60s. A pod restart resumes from the last checkpoint with at most 60s of reprocessing
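
As a rough illustration of the deduplication and timestamp-clamping guarantees above, here is a minimal Rust sketch. It is not the pipeline's actual code: the real logic runs inside Flink with RocksDB keyed state, whereas this uses an in-memory HashMap as a stand-in. The names clamp_timestamp, Deduplicator, and accept are hypothetical, and only the 10-minute window and ±7-day bound come from the text.

```rust
use std::collections::HashMap;

/// Seven days in milliseconds: the clamp tolerance stated above.
const MAX_SKEW_MS: i64 = 7 * 24 * 60 * 60 * 1000;
/// Ten minutes in milliseconds: the deduplication window stated above.
const DEDUP_WINDOW_MS: i64 = 10 * 60 * 1000;

/// Clamp a client-supplied timestamp to within +/- 7 days of server time.
/// (Hypothetical helper; timestamps are epoch milliseconds.)
fn clamp_timestamp(client_ms: i64, server_ms: i64) -> i64 {
    client_ms.clamp(server_ms - MAX_SKEW_MS, server_ms + MAX_SKEW_MS)
}

/// In-memory stand-in for keyed state: event ID -> time it was first seen.
/// (Assumption: the real pipeline keeps this in RocksDB-backed Flink state.)
struct Deduplicator {
    first_seen: HashMap<String, i64>,
}

impl Deduplicator {
    fn new() -> Self {
        Self { first_seen: HashMap::new() }
    }

    /// Returns true if the event should be forwarded, false if the same
    /// event ID was already seen within the last 10 minutes.
    fn accept(&mut self, event_id: &str, now_ms: i64) -> bool {
        // Evict IDs whose window has expired. A full scan is fine for a
        // sketch; a real state backend would use timers or TTLs instead.
        self.first_seen
            .retain(|_, seen| now_ms - *seen < DEDUP_WINDOW_MS);

        if self.first_seen.contains_key(event_id) {
            false
        } else {
            self.first_seen.insert(event_id.to_string(), now_ms);
            true
        }
    }
}
```

In this sketch an event would first pass through clamp_timestamp and then through accept at the pipeline entry point. An ID that reappears after its 10-minute window has expired is treated as a new event.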

Tech Stack

Rust
Event Producer, Query API, Dashboard
Zero-cost abstractions, memory safety, async with tokio
Java
Stream Processing
Required by Flink DataStream API for stateful processing
Apache Flink
Stateful Streaming
Event-time semantics, RocksDB state, checkpoints to MinIO
Redpanda
Event Bus
Kafka-compatible, Seastar framework, single binary, no ZooKeeper
ScyllaDB
Live Profiles
Shard-per-core, sub-ms reads, CQL compatible, disk-backed
Apache Pinot
OLAP Analytics
Star-tree pre-aggregation, sub-ms query latency at 10k QPS
Apache Iceberg
Cold Storage
Parquet on MinIO, 90+ day retention, partitioned by tenant
Leptos
Frontend Framework
Rust SSR + WASM hydration, reactive signals, zero JavaScript
GraphQL
API Layer
async-graphql with WebSocket subscriptions, broadcast channels

Design Decisions