Open Source • Apache 2.0

AI killed the database zoo.

No single storage format fits every workload. So KalDB picks the right one automatically — and tunes the query layer to match.

Stop choosing between Elasticsearch and ClickHouse. One API, the right format for every query.

90%
Cost Savings
< 1s
Query Latency
PB
Scale
100%
OpenSearch API Compatible

Your Stack Right Now

Two databases. Two ops burdens. One data problem.

Duplicate Data

The same data lives in Elasticsearch for search and ClickHouse for analytics. Double the storage, double the cost.

Two Ops Burdens

Different clusters, different configs, different failure modes. Your team manages two complex distributed systems.

Manual Tuning

Shard counts, replica settings, materialized views, index templates. Every workload change means hours of tuning — and agentic workloads change patterns constantly.

Runaway Costs

Elasticsearch licensing plus ClickHouse compute. Costs scale linearly with data while budgets don't.

The Cost of Two Databases

You run two databases because each is locked to one storage format. Here's what that costs.

2x
Data Duplication

Every event ingested twice, stored twice, indexed twice. You're paying double for the same data.

2x
Ops Complexity

Two clusters to monitor, patch, scale, and debug. Two on-call rotations. Two sets of expertise.

0
AI in the Loop

Neither system learns from your workload. Every optimization is manual. Every scaling decision is yours.

One API. The Right Format for Every Query.

KalDB replaces both Elasticsearch and ClickHouse — not by picking one format, but by picking the right one every time.

Auto Storage Selection

AI picks from open-source formats — inverted indexes (Lucene), columnar, wide column, and more. Not locked to one format.

Query + Storage Tuning

AI tunes both layers: query planning and routing on one side, storage format and indexing strategy on the other. Two optimization loops, zero manual work.

Open Data Formats

All storage uses open-source formats. Your data is never locked in. Portable, readable, yours.

How the AI works

1
Workload Detection
KalDB profiles incoming queries — search, analytics, hybrid, agentic — and classifies workload patterns in real time.
2
Storage Format Selection
AI picks the optimal open-source storage format per field and workload: Lucene inverted indexes, columnar (Parquet), wide column, and more.
3
Query Layer Optimization
Routing, caching, and query planning are tuned automatically based on observed access patterns and latency targets.
4
Continuous Re-Tuning
As agentic workloads evolve and query patterns shift, KalDB detects the change and re-optimizes both storage and query layers automatically.
// Same API. Different format chosen per query.

POST /search
{ "query": "error AND service:payments" }
// format: lucene_inverted_index
// reason: full-text match on high-cardinality field
// latency: 23ms

POST /analytics
{ "agg": "count by status_code, 5m buckets" }
// format: columnar_parquet
// reason: low-cardinality aggregation scan
// latency: 180ms

POST /search
{ "query": "user:jane", "agg": "top 10 endpoints" }
// plan: hybrid (inverted_index → columnar)
// reason: filter via index, aggregate via columnar
// latency: 95ms
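The classification in step 1 can be sketched as a simple heuristic. This is an illustration of the idea, not KalDB's actual profiler — the function name and rules are assumptions:

```python
# Minimal sketch of workload classification (illustrative only --
# KalDB's real profiler is more sophisticated; names are assumptions).

def classify_query(query: dict) -> str:
    """Classify a query as 'search', 'analytics', or 'hybrid'."""
    has_text = "query" in query   # full-text filter present
    has_agg = "agg" in query      # aggregation present
    if has_text and has_agg:
        return "hybrid"
    if has_agg:
        return "analytics"
    return "search"

print(classify_query({"query": "error AND service:payments"}))   # search
print(classify_query({"agg": "count by status_code"}))           # analytics
print(classify_query({"query": "user:jane", "agg": "top 10"}))   # hybrid
```

Each class then maps to a storage preference: search to inverted indexes, analytics to columnar, hybrid to a two-stage plan — as in the three requests above.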

"But ClickHouse has search..."

ClickHouse has full-text search like your phone has a flashlight.

Bloom Filters vs Inverted Indexes

ClickHouse uses bloom filter indexes for text — probabilistic, imprecise, and unable to rank results. Real search requires inverted indexes with term frequencies and positions.

No Relevance Scoring

There's no BM25, no TF-IDF, no way to rank results by relevance. You get matches, not answers. Every "search" is really a filter.
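For contrast, here is what relevance scoring actually involves — a minimal BM25 sketch over a toy corpus. This is the standard textbook formula, not KalDB or Lucene internals:

```python
import math

# Minimal BM25 sketch (standard formula; illustrative, not engine code).
# k1 and b are the usual free parameters.
def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        n = sum(1 for d in corpus if t in d)           # docs containing t
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)  # rare terms weigh more
        f = doc.count(t)                               # term frequency in doc
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["payment", "error", "timeout"],
        ["user", "login", "ok"],
        ["payment", "ok"]]
# "error" is rarer than "payment", so doc 0 outranks doc 2 for this query.
scores = [bm25_score(["payment", "error"], d, docs) for d in docs]
```

Computing this at query time requires per-term frequencies and document statistics — exactly what an inverted index stores and a bloom filter throws away.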

Latency Gap at Scale

Full-text queries in ClickHouse scan data at query time. At petabyte scale, that means seconds — not milliseconds. The architecture wasn't built for this.

The real point: no single fixed format is optimal for all workloads.

KalDB picks the right storage format automatically — inverted indexes when you need search, columnar when you need analytics, and adapts as your workload changes.

Built for Agentic Workloads

AI agents don't follow predictable query patterns. Your database shouldn't assume they do.

Today's Reality

Agentic workloads create unpredictable, evolving query patterns
Every time agents change behavior, you manually re-tune databases
Pattern shifts mean re-indexing, re-sharding, re-architecting
Migration cost every time the workload evolves

With KalDB

AI detects workload shifts and re-tunes automatically
Storage format adapts to new query patterns in real time
Zero migration cost — no re-architecting when patterns change
Let your agents evolve freely. KalDB keeps up.
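The shift detection can be pictured as drift in the query-type mix between time windows — a toy sketch, with the metric and threshold as assumptions rather than KalDB internals:

```python
# Toy sketch of workload-shift detection via query-mix drift
# (illustrative; threshold and metric are assumptions, not KalDB internals).
def query_mix(window):
    total = len(window)
    return {t: window.count(t) / total for t in set(window)}

def drifted(old_window, new_window, threshold=0.2):
    old, new = query_mix(old_window), query_mix(new_window)
    keys = set(old) | set(new)
    # total variation distance between the two query-type distributions
    tvd = 0.5 * sum(abs(old.get(k, 0) - new.get(k, 0)) for k in keys)
    return tvd > threshold

before = ["search"] * 90 + ["analytics"] * 10
after  = ["search"] * 40 + ["analytics"] * 60
print(drifted(before, after))   # True -> trigger re-tuning
```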

Experiment Without Risk

Storage format, partitioning strategy, embedding model, search plugin — KalDB A/B tests every decision automatically without touching production.

Change Anything

Storage format, partitioning strategy, vector embeddings, search plugins, ranking functions — every layer of the system is a tunable experiment. KalDB makes the entire pipeline configurable.

Automatic A/B Testing

Every decision — from which storage format to use to how data is partitioned — runs as an A/B test against the current configuration. KalDB measures the outcome and promotes the winner or rolls back, automatically.
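The promote-or-rollback decision reduces to a comparison against the current configuration — a minimal sketch, where the metric, threshold, and names are assumptions standing in for KalDB's actual statistics:

```python
import statistics

# Sketch of the promote-or-rollback decision (illustrative; the 5%
# threshold and function names are assumptions, not KalDB internals).
def decide(baseline_ms, candidate_ms, min_improvement=0.05):
    """Promote the candidate config only if median latency improves
    by at least `min_improvement`; otherwise roll back."""
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return "promote" if cand <= base * (1 - min_improvement) else "rollback"

print(decide([120, 118, 125, 130], [90, 95, 88, 92]))      # promote
print(decide([120, 118, 125, 130], [119, 121, 117, 124]))  # rollback
```

A real decision loop would weigh multiple metrics (latency, cost, error rate) and require statistical significance before promoting; the shape of the loop is the same.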

Zero Production Risk

KalDB's built-in multi-tenancy isolates every experiment from production workloads. AI spins up a separate tenant to test changes against real traffic patterns — your live users are never affected, and only proven improvements get promoted.

Traditional systems require a team of engineers to test every change manually. KalDB lets AI experiment continuously — across storage, partitioning, indexing, and search — measuring results and improving the entire system on its own.

Not a Black Box

Every AI decision is visible. Override anything you want.

Visible Decisions

See which storage format was chosen, which query plan was used, and why — for every single query.

Full Observability

Built-in observability into the system's own decisions. Monitor how KalDB is tuning itself over time.

Override Anything

The AI suggests, you control. Pin storage formats, force query plans, or set constraints. Your call.
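An override might look like the following — the field names and structure here are hypothetical, illustrating the kind of pins and constraints you can set rather than KalDB's documented API:

```python
# Hypothetical override declaration -- field names are assumptions for
# illustration, not KalDB's documented configuration schema.
override = {
    "index": "logs-payments",
    "pin_storage_format": "lucene_inverted_index",  # disable auto-selection
    "constraints": {
        "max_query_latency_ms": 500,  # AI must stay within this budget
        "allow_formats": ["lucene_inverted_index", "columnar_parquet"],
    },
}
```

The point of the shape: a pin removes a decision from the AI entirely, while a constraint leaves the decision to the AI but bounds the search space.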

Open Formats

Open-source storage formats mean you can always inspect your data directly. No proprietary encoding, ever.

Serverless. Zero Ops.

No clusters to manage. No capacity planning. Just data in S3 and queries that scale.

S3-Backed Storage

All data lives durably in S3 with hot data cached locally for performance. No replication to configure, no storage to manage. Cheap and infinitely scalable out of the box.
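The hot-data caching described above follows the standard read-through pattern — a toy sketch, where the fetch function stands in for an S3 GET and is not KalDB's actual cache:

```python
from functools import lru_cache

# Toy read-through cache sketch for S3-backed reads (illustrative only;
# `read_block` is an assumption standing in for an S3 GET).
FETCHES = {"count": 0}

@lru_cache(maxsize=1024)          # hot blocks stay in local memory
def read_block(object_key: str) -> bytes:
    FETCHES["count"] += 1          # stand-in for a round trip to S3
    return f"data-for-{object_key}".encode()

read_block("segment-001")   # cold: fetched from "S3"
read_block("segment-001")   # hot: served locally, no second fetch
assert FETCHES["count"] == 1
```

Durability comes from S3 holding the only authoritative copy; the cache is disposable, which is why there is no replication to configure.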

Elastic Scaling

Compute scales up and down with your workload automatically. Handle traffic spikes without over-provisioning and pay only for what you use.

Easy to Operate

No shards to rebalance, no replicas to sync, no nodes to right-size. Deploy KalDB and let it run — the system manages itself.

Before & After

What changes when you collapse the zoo

Elasticsearch + ClickHouse

  • Each database locked to a single storage format
  • Two clusters to deploy, monitor, and scale
  • Data ingested and stored twice
  • Manual shard tuning and capacity planning
  • Separate query languages and APIs
  • Elastic license uncertainty
  • Costs scale linearly with data volume
  • Locked into proprietary formats and licenses
  • Every workload change requires manual re-tuning

KalDB

  • One system for search and analytics
  • Ingest once, query any way
  • AI handles indexing and optimization
  • OpenSearch-compatible APIs from day one
  • Apache 2.0, forever
  • S3-backed storage at $0.023/GB-month
  • Open-source data formats — zero lock-in
  • Transparent AI — see every decision
  • Adapts as agentic workloads evolve
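The storage line item is easy to sanity-check. A back-of-the-envelope sketch at the $0.023/GB S3 rate listed above — the 1 PB workload size is illustrative, and this compares storage duplication only, not compute or licensing:

```python
# Back-of-the-envelope storage cost at the quoted S3 rate.
# Workload size is illustrative, not a benchmark; storage only.
S3_RATE_PER_GB = 0.023   # USD per GB-month
PB_IN_GB = 1_000_000     # decimal petabyte

single_copy = PB_IN_GB * S3_RATE_PER_GB   # one S3-backed copy
duplicated = 2 * single_copy              # same data in two databases

print(f"1 PB, one copy: ${single_copy:,.0f}/month")   # $23,000/month
print(f"1 PB, two DBs:  ${duplicated:,.0f}/month")    # $46,000/month
```

Attached block storage and replicated cluster disks typically cost several times the S3 rate per usable GB, which is where the larger savings come from.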

Production-Proven at Petabyte Scale

Battle-tested at Slack and Airbnb for years, running petabytes of data in production.

PB+
In Production
Years
Proven at Scale
90%
Cost Reduction
0
Manual Tuning

Replace Two Databases in Minutes

Drop-in replacement for OpenSearch. No migrations or rewrites.

Get Running

# Clone and start
git clone https://github.com/kaldb/kaldb
cd kaldb
docker-compose up
# Point your OpenSearch client at localhost:8080. That's it.
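A first query might look like this — a sketch assuming the quickstart above is running on localhost:8080, using the same /search shape shown earlier. The request is only built here, not sent:

```python
import json
import urllib.request

# Build an OpenSearch-style request against the quickstart instance.
# Assumes `docker-compose up` from above is serving localhost:8080;
# nothing is sent until urlopen is called.
body = {"query": "error AND service:payments"}
req = urllib.request.Request(
    "http://localhost:8080/search",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With KalDB running: urllib.request.urlopen(req) returns the hits.
print(req.full_url, req.data.decode())
```

Any OpenSearch-compatible client works the same way — point it at the KalDB endpoint instead of your existing cluster.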