Engineering Blog

Insights & Updates

Technical deep-dives, release notes, and stories from the team building the next-generation hybrid database engine.

1,000,000 jokes

Featured Tutorial Dataset

Goran C. Apr 15, 2026 4 min read

From Zero to a Million Jokes: Loading Hugging Face Datasets into CameoDB

CameoDB is running, and you've tested it with the books index from the Quickstart. Now let's load something bigger. One CLI command, one million Reddit jokes from Hugging Face, fully indexed and searchable in seconds.

Hugging Face CLI Data Loading

You've Got CameoDB Running. Now What?

If you followed the Quickstart, you already have CameoDB running and a books index loaded. That's your proof-of-concept. Now let's go from a handful of records to one million.

The dataset: SocialGrep/one-million-reddit-jokes on Hugging Face. A single CSV file with joke titles, body text, scores, subreddits, and timestamps. Perfect for demonstrating CameoDB's ability to detect schemas and ingest data directly from a URL.

Step 1: Detect the Schema

Point CameoDB's CLI at the raw CSV URL. The schema detect command reads the header row and samples data to infer field types automatically:

schema detect https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes/resolve/main/one-million-reddit-jokes.csv

CameoDB returns a full schema definition with 10 detected fields:

{
  "routing_field_name": "id",
  "fields": {
    "id":          { "field_type": "Text",    "indexed": true,  "stored": true,  "tokenizer": "raw" },
    "title":       { "field_type": "Text",    "indexed": true,  "stored": false },
    "selftext":    { "field_type": "Text",    "indexed": true,  "stored": false },
    "score":       { "field_type": "I64",     "indexed": true,  "fast": true  },
    "created_utc": { "field_type": "I64",     "indexed": true,  "fast": true  },
    "subreddit":   { "field_type": "Boolean", "indexed": true  },
    "type":        { "field_type": "Text",    "indexed": true  },
    "permalink":   { "field_type": "Text",    "indexed": true  },
    "domain":      { "field_type": "Text",    "indexed": true  },
    "url":         { "field_type": "Text",    "indexed": true  }
  }
}

Notice: score and created_utc are detected as I64 with fast fields enabled, meaning they support range queries and sorting. Text fields are fully indexed for search.
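
To make the sampling idea concrete, here is a minimal sketch of how a detector could pick a type for one column from a handful of sampled cells. It is an illustration, not CameoDB's detector: the `infer_field_type` helper, its fallback rules, and the three-variant `FieldType` enum are all assumptions.

```rust
/// Field types named after those in the detected schema above.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum FieldType {
    I64,
    Boolean,
    Text,
}

/// Hypothetical helper: pick the narrowest type that fits every sampled value.
/// Empty cells are skipped so a sparse column can still be typed.
pub fn infer_field_type(samples: &[&str]) -> FieldType {
    let values: Vec<&str> = samples
        .iter()
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
        .collect();

    if values.is_empty() {
        // Nothing to go on, so fall back to the widest type.
        return FieldType::Text;
    }
    if values.iter().all(|v| v.parse::<i64>().is_ok()) {
        return FieldType::I64;
    }
    if values
        .iter()
        .all(|v| v.eq_ignore_ascii_case("true") || v.eq_ignore_ascii_case("false"))
    {
        return FieldType::Boolean;
    }
    FieldType::Text
}
```

A single non-numeric cell is enough to push a column to Text, which is why sampling more rows generally gives a safer schema.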

Step 2: Load One Million Records

Now the single command that does everything—creates the index, applies the schema, downloads the CSV, and streams all rows in batches:

data load jokes https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes/resolve/main/one-million-reddit-jokes.csv

Schema was missing; detected and applied schema to index 'jokes'
Ingestion complete for index 'jokes': loaded=1000000 failed=0 (batch size 4000)

That's it. One million records, zero failures. CameoDB auto-detected the schema on first contact and streamed the data in batches of 4,000. The jokes index didn't exist before this command—it was created on the fly.

Step 3: Search Instantly

The index is immediately queryable. Let's find football jokes:

search jokes title:football limit 5 return title, selftext

{
  "hits": [
    { "_score": 10.50, "title": "Football", "selftext": "[removed]" },
    { "_score": 10.41, "title": "Football", "selftext": "As a woman passed her daughter's closed bedroom door..." },
    { "_score": 9.91,  "title": "Fart Football", "selftext": "An old married couple no sooner hit the pillows..." },
    // ... 2 more results
  ],
  "hits_returned": 5,
  "total_hits": 1330,
  "took_ms": 11,
  "stats": { "shards": { "total": 4, "responded": 4, "failed": 0 } }
}

1,330 football jokes found across 4 shards in 11 milliseconds. The data was distributed automatically. No configuration, no manual sharding, no external tooling.

The Takeaway

Three commands. That's the entire workflow from discovering a dataset on Hugging Face to running full-text search queries against a million records. CameoDB handles schema inference, index creation, batch ingestion, and distributed search—all from the CLI.

MCP

Goran C. Apr 10, 2026 5 min read

Native MCP Server: Give AI Agents Direct Database Access

How CameoDB's built-in Model Context Protocol server enables Claude, Cursor, and Windsurf to query your data instantly.

MCP AI Agents

Built-In, Not Bolted-On

CameoDB ships with a native Model Context Protocol (MCP) server running in the same binary—no sidecars, no middleware. The moment you start CameoDB on port 9480, your data becomes queryable by AI agents like Claude Desktop, Cursor, and Windsurf through the /mcp/sse endpoint.

What's Exposed: 6 Read-Only Tools

The MCP server exposes six tools—all read-only for security. Agents can discover, query, and validate, but never modify your data:

search_index

Full-text search on a single index with Tantivy query syntax.

search_indexes

Federated search across multiple indexes with merged results.

list_indexes

Discovery: list all indexes with schemas and queryable fields.

get_index

Schema inspector with per-field operator hints and types.

validate_query

Query linter with syntax validation and "did you mean" suggestions.

get_index_stats

Document counts, index size, and cluster metadata.

Query Syntax: Tantivy-Powered

The search_index tool supports the full Tantivy query language:

// Field targeting
title:rust

// Phrases with proximity
body:"small bike"~2

// Boolean operators (UPPERCASE required)
title:rust AND author:doe
(title:rust OR title:go) AND year:[2020 TO 2024]

// Range queries
score:>=100
date:[2024-01-01 TO 2024-12-31]

// Boosting for relevance
title:rust^3 OR body:rust

// Set operations
status: IN [active pending review]

Anti-Hallucination Rule

Every search tool includes a critical instruction: "When answering questions based on CameoDB results, you MUST use ONLY the exact data returned by this tool. Do NOT combine database results with your own prior knowledge." This ensures agents provide factual, grounded responses based solely on your data.

Field Type Awareness

The MCP server provides per-field operator hints based on data types:

text:     all operators (phrases, slop, prefix, IN, boost, range)
string:   exact match, prefix, IN, exists (no phrases/slop)
numeric:  exact, comparisons (>, <), range, boost, exists
date:     exact, comparisons, range, exists
boolean:  true/false only, exists
json:     dot notation (field.sub:value), nested exists

Setup: Two Configuration Styles

The MCP server uses Server-Sent Events (SSE) transport. Configuration depends on your AI tool:

Windsurf & Cursor (Native SSE)

Add to .windsurf/mcp.json or Cursor MCP settings:

{
  "mcpServers": {
    "cameodb": {
      "url": "http://localhost:9480/mcp/sse",
      "transport": "sse"
    }
  }
}

Claude Desktop (Curl Bridge)

Claude currently requires a curl bridge for SSE transport:

{
  "mcpServers": {
    "cameodb": {
      "command": "curl",
      "args": [
        "-N",
        "-H", "Accept: text/event-stream",
        "http://localhost:9480/mcp/sse"
      ]
    }
  }
}

Restart your AI tool after configuration. The agent will automatically discover your indexes and schemas.

Performance

Goran C. Apr 3, 2026 6 min read

Microsecond Latency: Benchmarking CameoDB's Zero-Copy Architecture

Deep dive into how Rust's ownership model and zero-copy serialization deliver sub-millisecond query performance at scale.

Rust Performance Architecture

The Zero-Copy Advantage

When you're processing millions of documents, every memory allocation matters. Traditional databases copy data through multiple layers (network buffers, serialization formats, intermediate structures), each adding latency and GC pressure. CameoDB takes a different approach: zero-copy ingestion.

Built with Rust 2024 Edition, CameoDB leverages the language's ownership model to pass data by reference without copying. When you stream a CSV from Hugging Face, parse JSON from a local file, or ingest NDJSON over HTTP, the bytes travel through a single buffer. No serde allocations. No intermediate clones. Just raw bytes flowing from source to storage.

Hybrid Storage: Two Engines, One Pipeline

CameoDB's performance comes from its hybrid architecture, combining redb (ACID-compliant KV store) with Tantivy (full-text search engine). Each write operation follows an atomic sequence:

// 1. Generate sequence ID (AtomicU64, lock-free)
seq_id = counter.fetch_add(1)

// 2. Begin redb transaction (ACID isolation)
txn = kv.begin_write()

// 3. Write to WAL (durability before application)
wal.insert(seq_id, serialized_op)

// 4. Write to data table (complete JSON document)
data.insert(id, json_blob)

// 5. Update Tantivy index (in-memory buffer)
writer.add_document(id_only_doc)

// 6. Commit redb transaction (fsync if configured)
txn.commit()

// 7. Signal supervisor for smart commit
supervisor.reset_timer()

The key insight: Tantivy stores only indexed fields. The complete JSON document lives exclusively in redb. When you search, Tantivy returns matching document IDs, then we batch-fetch the full documents from redb. This split-storage strategy means smaller indices, faster searches, and zero data duplication.

Supervised Smart Commits: The 5-Second Guarantee

Committing to disk is expensive. Doing it on every write kills throughput. Skipping it risks data loss. CameoDB's Supervised Smart Commits thread the needle with an adaptive algorithm:

Smart Commits

Trigger when operation count reaches adaptive threshold (500-8000 ops based on memory budget). Immediate commit during write bursts.

Supervised Commits

After 5 seconds of write inactivity, background supervisor commits. Guarantees durability for low-volume patterns.

Every write signals a per-index supervisor, resetting a 5-second timer. If writes stop flowing, the supervisor fires an eventual commit. If writes keep coming, smart commits trigger at adaptive thresholds. Supervisors clean up after themselves following a successful commit: no resource leaks, no lingering background tasks.
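
The two triggers can be sketched as a small decision function. This is a simplified model under stated assumptions: the `CommitPolicy` and `CommitDecision` names are hypothetical, the threshold is a fixed number rather than an adaptive one, and idle time is passed in instead of being tracked by a real timer.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum CommitDecision {
    /// Pending operations reached the threshold during a write burst.
    Smart,
    /// The idle timer expired while writes were still pending.
    Supervised,
    /// Nothing to do yet.
    Wait,
}

/// Hypothetical policy object; the text describes a 500-8000 op threshold
/// and a 5-second idle timeout.
pub struct CommitPolicy {
    pub op_threshold: usize,
    pub idle_timeout: Duration,
}

impl CommitPolicy {
    pub fn decide(&self, pending_ops: usize, idle_for: Duration) -> CommitDecision {
        if pending_ops == 0 {
            CommitDecision::Wait
        } else if pending_ops >= self.op_threshold {
            CommitDecision::Smart
        } else if idle_for >= self.idle_timeout {
            CommitDecision::Supervised
        } else {
            CommitDecision::Wait
        }
    }
}
```

Checking the op count before the idle timer mirrors the description above: bursts commit immediately, and the timer only matters once writes have gone quiet.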

Tiered Cache Sizing: Fast Startup, Steady State

Opening a 2GB database with a 32MB cache is painful. CameoDB uses a two-phase initialization strategy:

// Phase 1: Init Boost (fast WAL recovery)
cache_size = database_tier * multiplier
// Small:  32MB  → 32MB   (1×)
// Medium: 128MB → 512MB  (4×)
// Large:  256MB → 2GB    (8×)

// Phase 2: Normal Operation (steady state)
cache_size = standard_tier
// Release init boost memory for multi-shard deployments

For a 2GB database on a node with 16GB RAM, CameoDB opens with a 512MB cache for fast recovery, then drops to 256MB for steady operation. Per-shard memory is automatically divided across all active shards, with no manual tuning required.
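
The tier arithmetic from the comment table can be written out directly. The sketch below copies the steady-state sizes and multipliers from that table; the `Tier` enum and the even per-shard split in `per_shard_mb` are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tier {
    Small,
    Medium,
    Large,
}

impl Tier {
    /// Steady-state cache size in MB (phase 2).
    pub fn steady_mb(self) -> u64 {
        match self {
            Tier::Small => 32,
            Tier::Medium => 128,
            Tier::Large => 256,
        }
    }

    /// Init-boost multiplier applied during WAL recovery (phase 1).
    pub fn boost_multiplier(self) -> u64 {
        match self {
            Tier::Small => 1,
            Tier::Medium => 4,
            Tier::Large => 8,
        }
    }

    /// Cache size in MB while the init boost is active.
    pub fn init_mb(self) -> u64 {
        self.steady_mb() * self.boost_multiplier()
    }
}

/// Assumed even split of a cache budget across active shards.
pub fn per_shard_mb(total_mb: u64, active_shards: u64) -> u64 {
    total_mb / active_shards.max(1)
}
```

With these numbers, a Large tier split across 4 shards would leave 64MB per shard in steady state.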

Benchmark Results: The Numbers

So what does this architecture deliver in practice? Here are real-world benchmarks from the storage engine:

Single Operations

0.5-3ms per operation

Depends on fsync configuration

Batch Operations

0.05-0.5ms per operation

10-60x faster than single ops

Point Queries

~0.1ms (redb B-tree)

KV lookup by document ID

Search Queries

10-100ms (Tantivy)

Depends on index size, complexity

Throughput scales dramatically with batching: 2,000-15,000 ops/sec individual versus 10,000-100,000 ops/sec batched. The difference comes from amortized commit overhead and reduced mutex contention.

Async-Sync Isolation: Blocking Without Blocking

All redb and Tantivy I/O happens inside tokio::task::spawn_blocking. This is critical—storage operations are inherently blocking (disk I/O, B-tree traversals, index segment merges). By offloading to a dedicated blocking thread pool, CameoDB's async runtime stays responsive. No thread starvation. No latency spikes from blocking the event loop.

The Takeaway

Zero-copy ingestion, hybrid storage, supervised smart commits, tiered caching, and async-sync isolation aren't isolated optimizations. They're an integrated architecture where each component amplifies the others. Rust's ownership model makes zero-copy safe. The hybrid split-storage strategy enables smaller indices. Smart commits reduce I/O while supervised commits guarantee durability. The result: sub-millisecond latency at scale, without the complexity of manual tuning.

Leaderless Mesh

Featured Distributed

Goran C. Mar 28, 2026 7 min read

Leaderless Mesh: How CameoDB Scales Without Consensus

Zero master nodes, no Raft, no Paxos. CameoDB uses consistent hashing and Kademlia DHT for seamless peer discovery and automatic data distribution across the cluster.

Distributed Kademlia Scalability

The Consensus Problem

Most distributed databases rely on consensus algorithms like Raft or Paxos. These protocols ensure consistency by electing a leader, replicating writes, and requiring majority agreement before commits. The tradeoff: complexity and latency. Every write must travel through the leader, wait for quorum acknowledgment, and handle leader elections during failures. As clusters grow, consensus overhead increases, and scaling becomes painful.

CameoDB takes a different approach: a leaderless mesh. No single point of failure, no leader elections, no quorum waits. Instead, data distribution and routing are handled through consistent hashing and a Kademlia DHT. Each node knows where data belongs, and operations route directly to the owner. The result: linear scalability with minimal coordination overhead.

Consistent Hashing: Data Distribution Without Coordination

At the heart of CameoDB's distribution is a consistent hash ring. Each node is assigned virtual node tokens, and each shard maps to a specific position on the ring. When you write a document with a routing key, the system hashes the key using XXH3, finds the position on the ring, and determines which node owns that shard.

// Routing decision algorithm
hash = xxh3_64(routing_key)
shard_id = ConsistentRing.get_owner(hash)
node_id = shard_assignments[shard_id].owner
decision = if node_id == local_node
    RoutingDecision::Local
  else
    RoutingDecision::Remote { node_id, peer_addr }

This means deterministic routing without coordination. The same routing key always maps to the same shard and node, regardless of which node receives the request. Add a new node to the cluster, and the ring rebalances automatically. Remove a node, and data redistributes to neighbors. No central coordinator required.
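
A consistent hash ring with virtual nodes fits in a few dozen lines. The sketch below is a generic illustration rather than CameoDB's ring: it maps keys straight to node names (skipping the shard layer), and it uses the standard library's `DefaultHasher` as a stand-in for XXH3 so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in for XXH3. `DefaultHasher::new()` starts from fixed keys, so the
/// same input hashes the same way on every run of the same build.
fn hash64<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

pub struct Ring {
    /// Token position on the ring -> owning node.
    tokens: BTreeMap<u64, String>,
}

impl Ring {
    /// Place `vnodes_per_node` virtual tokens on the ring for each node.
    pub fn new(nodes: &[&str], vnodes_per_node: u32) -> Self {
        let mut tokens = BTreeMap::new();
        for node in nodes {
            for v in 0..vnodes_per_node {
                tokens.insert(hash64(&(node, v)), node.to_string());
            }
        }
        Ring { tokens }
    }

    /// Owner is the first token at or after the key's hash, wrapping around
    /// to the start of the ring. Returns None for an empty ring.
    pub fn owner(&self, routing_key: &str) -> Option<&str> {
        let h = hash64(&routing_key);
        self.tokens
            .range(h..)
            .next()
            .or_else(|| self.tokens.iter().next())
            .map(|(_, node)| node.as_str())
    }
}
```

Because a key's owner depends only on the nearest token, removing one node moves only the keys that node owned; every other key keeps its owner.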

Kademlia DHT: Peer Discovery Without Configuration

How do nodes find each other? CameoDB uses a Kademlia DHT built on libp2p. Each node maintains a routing table of known peers, organized by XOR distance from its own node ID. When a node joins, it contacts bootstrap nodes and discovers the network through iterative lookups.
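
XOR distance is simple enough to show in full. The sketch below uses 64-bit IDs for brevity (real Kademlia IDs are wider), and the helper names `bucket_index` and `closest_peers` are illustrative, not libp2p API.

```rust
/// XOR distance between two node IDs.
pub fn xor_distance(a: u64, b: u64) -> u64 {
    a ^ b
}

/// Routing-table bucket for a peer: the position of the highest bit in which
/// the two IDs differ. Returns None when the peer is the local node itself.
pub fn bucket_index(local: u64, peer: u64) -> Option<u32> {
    let d = xor_distance(local, peer);
    if d == 0 {
        None
    } else {
        Some(63 - d.leading_zeros())
    }
}

/// The `k` known peers closest to `target`, nearest first. An iterative
/// lookup repeatedly asks the closest known peers for even closer ones.
pub fn closest_peers(target: u64, peers: &[u64], k: usize) -> Vec<u64> {
    let mut sorted = peers.to_vec();
    sorted.sort_by_key(|p| xor_distance(*p, target));
    sorted.truncate(k);
    sorted
}
```

For example, relative to target `0b1000`, the peers `0b0001`, `0b1001`, and `0b1100` sit at distances 9, 1, and 4, so the two closest are `0b1001` and `0b1100`.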

The DHT provides three critical capabilities:

Peer Discovery

Automatically find and connect to cluster nodes without manual configuration.

Routing Metadata

Distribute shard ownership information across the cluster for decentralized routing.

Self-Healing

Detect node failures and automatically rebalance data without human intervention.

Actor-Based Remote Execution

CameoDB uses the Kameo actor framework for distributed execution. Each node runs a `NodeOrchestrator` actor that manages local shards, and these actors can communicate remotely over libp2p. When the router decides an operation belongs on a remote node, it uses Kameo's remote messaging to forward the request.

// Remote call path
orchestrator_name = format!("orchestrator-{}", target_node_id)
remote_ref = RemoteActorRef::lookup(orchestrator_name).await
result = remote_ref.ask(&ClientOp::Search { ... }).await

Remote actors are registered with stable names like `orchestrator-{node_id}` and `shard-{shard_id}`, making them discoverable across the cluster. The same `ClientOp` message type is used locally and remotely, so operation semantics are consistent regardless of where they execute.

Scatter-Gather: Fan-Out Without Bottlenecks

For queries without a routing key, CameoDB uses scatter-gather broadcast. The router asks the cluster coordinator for known peers, selects up to a fanout limit, and executes the query in parallel across local and remote nodes.

// Broadcast search algorithm
peers = coordinator.get_known_peers()
selected = peers.take(broadcast_fanout_limit)

local_result = handle_client_op(op)  // Local search
remote_results = parallel(selected, |peer| {
    remote.ask(peer, op, timeout)
})  // Fan-out to remotes

merged = top_k_merge(local_result, remote_results, limit)

Results are merged using score-aware top-K aggregation, allowing higher-scoring remote hits to displace weaker local results. Bounded concurrency ensures the system doesn't overwhelm itself with parallel requests, and per-call timeouts prevent stragglers from blocking the response.
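
The merge step on its own might look like the following. It is a minimal sketch: the `Hit` struct is a hypothetical stand-in for a search result, and a full sort is used for clarity where a bounded heap would do less work on large fan-outs.

```rust
/// Hypothetical stand-in for one search hit.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: String,
    pub score: f32,
}

/// Merge local and remote hits and keep the `limit` highest-scoring ones.
pub fn top_k_merge(local: Vec<Hit>, remotes: Vec<Vec<Hit>>, limit: usize) -> Vec<Hit> {
    let mut all = local;
    for batch in remotes {
        all.extend(batch);
    }
    // Highest score first; total_cmp gives a total order even if a NaN slips in.
    all.sort_by(|a, b| b.score.total_cmp(&a.score));
    all.truncate(limit);
    all
}
```

Since origin plays no part in the ordering, a remote hit with score 10.5 displaces a local hit with score 9.9 whenever the limit forces a choice.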

Event-Driven State Management

Cluster metadata is persisted without polling or background tasks. All state transitions occur on membership events: `PeerDiscovered` when a node joins, `PeerLost` when a node leaves. The cluster coordinator maintains a simple state machine with three states: Active (all nodes present), Degraded (some nodes missing), and Failed (below quorum).

Metadata is written to `metadata.redb` inline with state changes, providing crash-safe persistence without separate background processes. On boot, nodes load the persisted snapshot and reconcile with actual cluster state, logging discrepancies for operational visibility.
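
An event-driven state machine of this shape can be sketched as follows. The event and state names come from the text; the `Coordinator` struct, its `expected_nodes` and `min_nodes` parameters, and the exact thresholds are assumptions, and persistence to `metadata.redb` is left out.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ClusterState {
    Active,
    Degraded,
    Failed,
}

pub enum MembershipEvent {
    PeerDiscovered(String),
    PeerLost(String),
}

/// Hypothetical coordinator: state is recomputed only when an event arrives.
pub struct Coordinator {
    expected_nodes: usize, // assumed: configured cluster size
    min_nodes: usize,      // assumed: floor below which the cluster is Failed
    present: BTreeSet<String>,
}

impl Coordinator {
    pub fn new(expected_nodes: usize, min_nodes: usize) -> Self {
        Coordinator { expected_nodes, min_nodes, present: BTreeSet::new() }
    }

    /// Apply one membership event and return the resulting state.
    pub fn apply(&mut self, event: MembershipEvent) -> ClusterState {
        match event {
            MembershipEvent::PeerDiscovered(id) => {
                self.present.insert(id);
            }
            MembershipEvent::PeerLost(id) => {
                self.present.remove(&id);
            }
        }
        self.state()
    }

    pub fn state(&self) -> ClusterState {
        let n = self.present.len();
        if n < self.min_nodes {
            ClusterState::Failed
        } else if n < self.expected_nodes {
            ClusterState::Degraded
        } else {
            ClusterState::Active
        }
    }
}
```

Because `apply` is the only place state changes, a real implementation could write the new snapshot inline at that point, which matches the no-background-task design described above.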

The Tradeoffs

Leaderless architecture isn't magic. It makes different tradeoffs than consensus-based systems:

Advantages

No leader election latency
No single point of failure
Linear scalability
Simpler operational model
Better write throughput

Considerations

Eventual consistency for some operations
Requires careful key design
Network partitions handled via timeout
Read-your-writes requires routing keys

The Takeaway

CameoDB's leaderless mesh trades consensus complexity for deterministic routing and automatic distribution. Consistent hashing ensures data lands in predictable places, Kademlia DHT handles peer discovery without configuration, and actor-based remote execution provides clean distributed semantics. The result is a database that scales linearly, handles failures gracefully, and keeps operational complexity low. No Raft, no Paxos, just math and networking doing what they do best.

Full-Text Search

Featured Search

Goran C. Mar 20, 2026 6 min read

Search Everything: Full-Text Indexing with Tantivy Under the Hood

Inverted indexes, relevance scoring, phrase queries, and field-level boosting. Explore how Tantivy powers CameoDB's hybrid search architecture with index-only storage strategy.

Search Tantivy Indexing

The Inverted Index Advantage

Traditional databases struggle with text search. A LIKE query with a leading wildcard scans every row: O(n) work that destroys performance as data grows. Full-text search engines use inverted indexes, which map each term to the IDs of the documents containing it, so a term lookup costs roughly O(log n) in the size of the term dictionary instead of a scan over every document.

CameoDB integrates Tantivy, a Rust-based search library inspired by Lucene. Tantivy provides inverted indexes, term dictionaries, positional data for phrase queries, and BM25 relevance scoring. Combined with redb's ACID KV storage, CameoDB delivers hybrid search: fast point lookups via redb, rich full-text queries via Tantivy.
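
The term-to-documents mapping is easy to see in a toy version. The sketch below is a deliberately tiny inverted index built on the standard library; it has none of Tantivy's segments, positions, or compression, and the lowercase-and-split tokenizer is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
pub struct InvertedIndex {
    /// Term -> sorted set of document IDs containing it.
    postings: BTreeMap<String, BTreeSet<u32>>,
}

impl InvertedIndex {
    pub fn add_document(&mut self, doc_id: u32, text: &str) {
        for term in tokenize(text) {
            self.postings.entry(term).or_default().insert(doc_id);
        }
    }

    /// One dictionary lookup, no scan over documents.
    pub fn search(&self, term: &str) -> Vec<u32> {
        self.postings
            .get(&term.to_lowercase())
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}

/// Assumed tokenizer: split on non-alphanumeric characters and lowercase.
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}
```

Indexing the titles "Football" and "Fart Football" as documents 1 and 2 makes `search("football")` return both IDs without touching any other document.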

Index-Only Storage Strategy

Most search engines duplicate data, storing complete documents in the index for retrieval. This bloats index size and slows writes. CameoDB takes a different approach: index-only storage.

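use tantivy::schema::{Schema, STORED, STRING, TEXT};

// Tantivy stores only the id; other fields are indexed without the STORED flag
let mut builder = Schema::builder();
builder.add_text_field("id", STRING | STORED);  // Raw token, stored so hits can be resolved in redb
builder.add_text_field("title", TEXT);          // Indexed only
builder.add_text_field("content", TEXT);        // Indexed only
let schema = builder.build();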

// redb stores complete JSON documents
redb.insert("doc:123", full_json_document)

Tantivy stores only the `id` field; every other field is indexed without the STORED flag. Complete JSON documents live exclusively in redb. When you search, Tantivy returns matching document IDs and scores, then CameoDB batch-fetches the full documents from redb. This split-storage strategy means smaller indices, faster search performance, and zero data duplication.
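
The second half of that flow, turning ranked IDs into full documents, can be sketched with a plain map standing in for the redb data table. The `ScoredId` struct, the `DocStore` alias, and the `hydrate` function are hypothetical names used only for this illustration.

```rust
use std::collections::HashMap;

/// Stand-in for one search hit: just an ID and a score.
pub struct ScoredId {
    pub id: String,
    pub score: f32,
}

/// Stand-in for the KV data table: document ID -> complete JSON document.
pub type DocStore = HashMap<String, String>;

/// Resolve hits into full documents, preserving rank order.
/// IDs with no stored document are skipped.
pub fn hydrate<'a>(hits: &[ScoredId], store: &'a DocStore) -> Vec<(f32, &'a str)> {
    hits.iter()
        .filter_map(|hit| store.get(&hit.id).map(|doc| (hit.score, doc.as_str())))
        .collect()
}
```

The search side never needs the document body, so it can stay small, and the KV side never needs to know how the IDs were ranked.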

Field-Level Control and Schema Evolution

CameoDB provides fine-grained control over field indexing through schema configuration. Each field can be configured with:

INDEXED

Enables range queries, filtering, and term lookups on the field.

FAST

Enables sorting and aggregations on the field value.

STORED

Stores the field value in Tantivy for direct retrieval (rarely used).

TEXT

Tokenizes and indexes for full-text search with stemming.

Schema evolution is automatic. When new fields appear in incoming documents, CameoDB infers their types and adds them to the schema. Field fingerprints track schema changes, enabling cache invalidation and distributed synchronization without manual schema migrations.

Query Capabilities: Beyond Simple Search

Tantivy supports sophisticated query patterns that go beyond keyword matching:

// Phrase queries with proximity
query = "database systems"~2  // Within 2 words

// Range queries on numeric fields
query = price:[100 TO 500]

// Boolean combinations
query = (title:rust OR title:go) AND year:[2020 TO 2024]

// Boosting for relevance tuning
query = title:rust^3 OR body:rust

Phrase queries use positional indexes to find terms in specific order with configurable proximity slop. Range queries work on numeric and date fields. Boolean operators enable complex logic with AND, OR, NOT. Field boosting allows you to weight certain fields higher in relevance scoring.

BM25 Relevance Scoring

Tantivy uses BM25, the industry-standard relevance scoring algorithm. BM25 considers term frequency (how often terms appear in a document), inverse document frequency (how rare terms are across the corpus), and document length normalization. This means documents matching rare terms score higher, and longer documents aren't unfairly favored simply because they contain more words.

CameoDB returns scores alongside search results, enabling your application to implement custom ranking logic, result re-sorting, or confidence thresholds for filtering low-quality matches.
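
For readers who want to see the three ingredients interact, here is the textbook BM25 formula for a single query term. It is a sketch of the standard formula, not a copy of Tantivy's scorer; the constants k1 = 1.2 and b = 0.75 are the commonly used defaults and are assumed here.

```rust
/// BM25 contribution of one query term to one document's score.
pub fn bm25_term_score(
    term_freq: f64,      // occurrences of the term in this document
    doc_len: f64,        // number of tokens in this document
    avg_doc_len: f64,    // average tokens per document in the corpus
    total_docs: f64,     // N: documents in the corpus
    docs_with_term: f64, // n: documents containing the term
) -> f64 {
    const K1: f64 = 1.2; // term-frequency saturation (assumed default)
    const B: f64 = 0.75; // strength of length normalization (assumed default)

    // Rare terms get a larger inverse document frequency.
    let idf = (1.0 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln();
    // Documents longer than average are scaled down, shorter ones up.
    let length_norm = 1.0 - B + B * (doc_len / avg_doc_len);
    idf * (term_freq * (K1 + 1.0)) / (term_freq + K1 * length_norm)
}
```

Two properties follow from the formula: repeating a term helps with diminishing returns, and the same match counts for more in a short document than in a long one.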

Segment Merging and Performance

Tantivy stores indexes in immutable segments. As documents are added, new segments are created. Periodic merging combines smaller segments into larger ones, reducing the number of segments that must be searched during queries. This improves read performance while maintaining write throughput.

CameoDB's Supervised Smart Commits control when segments are committed to disk. High-volume workloads trigger smart commits at adaptive thresholds (500-8000 operations), while low-volume patterns use supervised commits after 5 seconds of inactivity. This balances performance with durability guarantees.

The Takeaway

Tantivy provides the full-text search engine that powers CameoDB's hybrid architecture. The index-only storage strategy keeps indices lean while redb provides fast document retrieval. Schema evolution, phrase queries, BM25 scoring, and segment merging deliver production-grade search capabilities without the complexity of managing a separate search service. The result: rich search functionality integrated seamlessly with ACID-compliant storage.

Hybrid Storage

Featured Storage

Goran C. Mar 12, 2026 5 min read

Redb + Tantivy: The Atomic Dual-Engine at CameoDB's Core

ACID-compliant KV storage meets full-text search in a single atomic transaction. Explore how CameoDB's hybrid architecture delivers consistency, durability, and rich query capabilities.

Storage ACID Hybrid

The Hybrid Storage Philosophy

Most databases choose one storage model: relational tables for ACID guarantees, or search engines for rich query capabilities. CameoDB combines both in a single atomic transaction, delivering ACID-compliant key-value storage alongside full-text search without sacrificing consistency or performance.

The hybrid architecture leverages two complementary storage engines: redb for durable KV storage with ACID guarantees, and Tantivy for full-text inverted indexes. Every write operation touches both engines atomically, ensuring data consistency across storage and search.

Atomic Write Sequence

Each write operation follows a strict seven-step sequence to guarantee atomicity across both storage engines:

// 1. Generate sequence ID (AtomicU64, lock-free)
seq_id = counter.fetch_add(1)

// 2. Begin redb transaction (ACID isolation)
txn = kv.begin_write()

// 3. Write to WAL (durability before application)
wal.insert(seq_id, serialized_op)

// 4. Write to data table (complete JSON document)
data.insert(id, json_blob)

// 5. Update Tantivy index (in-memory buffer)
writer.add_document(id_only_doc)

// 6. Commit redb transaction (fsync if configured)
txn.commit()

// 7. Signal supervisor for smart commit
supervisor.reset_timer()

The sequence ID ensures monotonic ordering. The WAL (write-ahead log) guarantees durability before the operation is applied. The data table stores the complete JSON document. Tantivy receives only the indexed fields. The redb transaction commits with optional fsync for maximum durability. Finally, the supervisor is signaled for smart commit management.

ACID Guarantees Across Engines

CameoDB provides full ACID guarantees across the hybrid storage system:

Atomicity

All-or-nothing writes. If any step fails, the entire operation rolls back with no partial state.

Consistency

Both stores always reflect the same logical state. Invariants are maintained across KV and search.

Isolation

Concurrent operations don't interfere. redb's MVCC ensures safe parallel reads and writes.

Durability

WAL ensures operations survive crashes. Configurable fsync provides maximum durability guarantees.

Multi-Tenant Architecture

CameoDB supports multiple isolated indices within a single storage instance. Each index operates as an independent tenant with its own data tables, search indices, and sequence counters. This enables multi-tenancy without resource contention between indices.

Data isolation is complete: per-index `data_{index}` and `wal_{index}` tables in redb, and per-index `indices/{index_name}/` directories for Tantivy. Sequence counters are independent per index, enabling parallel write scaling without lock contention.

Recovery and Consistency

On startup, CameoDB performs consistent hybrid recovery. The system reads WAL records from redb that have not yet been committed to the Tantivy search index and replays them into it. This ensures that any operations committed to redb but not yet flushed to Tantivy are recovered, maintaining consistency between the two storage engines.

The recovery process is automatic and requires no manual intervention. It handles crash scenarios, power failures, and unclean shutdowns, guaranteeing that the search index eventually reflects all committed writes.
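
Replay of this kind usually comes down to "apply everything after the last sequence ID the index is known to contain". The sketch below models that with an ordered map as the WAL; the `replay_wal` function and the idea of a stored high-water mark are assumptions for illustration, not a description of CameoDB's recovery code.

```rust
use std::collections::BTreeMap;

/// Replay WAL entries newer than `last_indexed_seq`, in sequence order.
/// Returns the new high-water mark (None if the index is still empty).
pub fn replay_wal(
    wal: &BTreeMap<u64, String>,
    last_indexed_seq: Option<u64>,
    mut apply: impl FnMut(u64, &str),
) -> Option<u64> {
    // Start just past the last entry the index already reflects.
    let start = match last_indexed_seq {
        Some(seq) => seq + 1,
        None => 0,
    };
    let mut high_water = last_indexed_seq;
    for (seq, op) in wal.range(start..) {
        apply(*seq, op.as_str());
        high_water = Some(*seq);
    }
    high_water
}
```

Iterating an ordered map keeps replay in sequence order, so a later delete can never be applied before the put it follows.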

Performance Characteristics

The hybrid architecture delivers balanced performance across operations:

Point Queries

~0.1ms (redb B-tree)

Fast KV lookup by document ID

Range Queries

1-10ms (depends on range size)

Efficient B-tree traversal

Search Queries

10-100ms (Tantivy)

Depends on index size and complexity

Batch Writes

0.05-0.5ms per operation

10-60x faster than single ops

The Takeaway

CameoDB's hybrid storage architecture combines the strengths of two specialized engines. redb provides ACID-compliant KV storage with fast point lookups and range queries. Tantivy provides rich full-text search with inverted indexes and relevance scoring. The atomic write sequence guarantees consistency across both engines. The result is a database that delivers both transactional integrity and search capabilities in a single system, without the complexity of managing separate storage and search infrastructure.

Release v0.2.2

Featured Release

Goran C. Mar 5, 2026 4 min read

CameoDB v0.2.2: What's New in the Latest Stable Release

Performance improvements, new query operators, enhanced distributed stability, and expanded platform support. Discover the highlights of CameoDB v0.2.2.

Release v0.2.2 Stable

Performance Improvements

v0.2.2 brings significant performance gains across the board. Batch write throughput improved by 40% through optimized smart commit thresholds and reduced mutex contention. Search queries are now 25% faster thanks to improved Tantivy segment merging strategies.

Batch Writes

40% faster

Optimized commit thresholds and reduced lock contention

Search Queries

25% faster

Improved segment merging and cache strategies

Point Queries

15% faster

B-tree optimization and reduced allocation overhead

Memory Usage

20% lower

Improved tiered cache release and buffer management

New Query Operators

This release adds powerful new query operators for more flexible data retrieval:

// IN operator for set membership
query = status: IN [active pending review]

// Range comparisons with inclusive/exclusive
query = score:{100 TO 500}  // Exclusive
query = score:[100 TO 500]  // Inclusive

// Date comparisons
query = created_at:>2024-01-01
query = created_at:<=2024-12-31

The IN operator enables efficient set-based filtering without multiple OR clauses. Range queries now support both inclusive [] and exclusive {} syntax for precise control. Date comparisons use natural date formats for better readability.
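
The difference between the two bracket styles is only whether the endpoints match. A small sketch using the standard library's `Bound` type shows this; the `parse_range` and `in_range` helpers are hypothetical, handle only integer ranges, and accept only matching bracket pairs.

```rust
use std::ops::{Bound, RangeBounds};

/// Parse `[lo TO hi]` (inclusive) or `{lo TO hi}` (exclusive).
/// Returns None for anything else.
pub fn parse_range(expr: &str) -> Option<(Bound<i64>, Bound<i64>)> {
    let expr = expr.trim();
    let inclusive = match (expr.chars().next()?, expr.chars().last()?) {
        ('[', ']') => true,
        ('{', '}') => false,
        _ => return None,
    };
    // Both delimiters are single-byte ASCII, so byte slicing is safe here.
    let inner = &expr[1..expr.len() - 1];
    let (lo, hi) = inner.split_once(" TO ")?;
    let lo: i64 = lo.trim().parse().ok()?;
    let hi: i64 = hi.trim().parse().ok()?;
    Some(if inclusive {
        (Bound::Included(lo), Bound::Included(hi))
    } else {
        (Bound::Excluded(lo), Bound::Excluded(hi))
    })
}

/// True when `value` falls inside the parsed range.
pub fn in_range(range: &(Bound<i64>, Bound<i64>), value: i64) -> bool {
    range.contains(&value)
}
```

With `score:[100 TO 500]` a score of exactly 100 matches; with `score:{100 TO 500}` the first matching score is 101.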

Enhanced Distributed Stability

The distributed mesh now handles network partitions and node failures more gracefully. Improved Kademlia DHT routing reduces lookup latency by 30%. Peer discovery is more robust with exponential backoff and connection retry strategies.

Cluster state reconciliation now includes conflict resolution for concurrent writes. Metadata follows a conflict-free replicated data type (CRDT) approach, ensuring eventual consistency across all nodes.

Expanded Platform Support

v0.2.2 adds official support for additional platforms:

Linux ARM64

Native ARM64 builds for Raspberry Pi, AWS Graviton, and other ARM servers.

macOS ARM64

Apple Silicon (M1/M2/M3) native binaries with Rosetta fallback for Intel apps.

FreeBSD 13+

Experimental FreeBSD support with full feature parity on x86_64.

API Enhancements

The HTTP API now includes comprehensive OpenAPI 3.0 documentation. New endpoints for index management, schema inspection, and health monitoring provide better operational visibility. Batch operations now support up to 10,000 documents per request.

Bug Fixes

This release fixes 15 reported issues, including:

Fixed memory leak in tiered cache release during multi-shard operations
Resolved race condition in supervised commit timer reset
Fixed Tantivy index corruption on crash recovery with large documents
Corrected BM25 scoring for documents with multiple term matches
Fixed schema evolution edge case with nested JSON objects
Resolved connection pool exhaustion under high concurrent load
Fixed distributed query timeout handling in scatter-gather
Corrected WAL replay order for consistent recovery

Upgrade Path

Upgrading from v0.2.0 or v0.2.1 is straightforward. Stop the server, replace the binary, and restart. The storage format is backward compatible, so no migration is required. For v0.1.x and earlier, data reindexing is required; follow the migration guide in the documentation.

Get v0.2.2

Download the latest stable release from the downloads page. Binary packages are available for Linux, macOS, and Windows. Source code is available on GitHub under the FSL-1.1-ALv2 License, with Apache 2.0 and MIT licensing for client interfaces.

download Download v0.2.2 code View on GitHub library_books Upgrade Guide

Transaction Model

Featured Security

Goran C. Feb 26, 2026 8 min read

ACID Guarantees in a Leaderless World: CameoDB's Transaction Model

How CameoDB delivers ACID guarantees across a distributed leaderless mesh without consensus algorithms. Explore atomic transactions, isolation levels, and durability in a decentralized architecture.

Security ACID Transactions

The ACID Challenge in Leaderless Systems

Traditional databases achieve ACID guarantees through consensus algorithms like Raft or Paxos. These protocols elect a leader to serialize writes, ensuring consistency across replicas. But consensus comes with costs: leader election latency, quorum requirements, and limited scalability.

CameoDB takes a different approach. Instead of global consensus, it provides ACID guarantees at the storage engine level through redb's transactional model. Each node maintains ACID compliance for local writes, while the distributed layer handles eventual consistency across the cluster. This tradeoff delivers local transactional integrity with global scalability.

Atomicity: All-or-Nothing Writes

Atomicity ensures that a transaction is either fully applied or not applied at all. CameoDB guarantees atomicity through redb's MVCC (multi-version concurrency control) transaction system. Every write operation occurs within a redb transaction, and if any step fails, the entire transaction rolls back.

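// Atomic write with rollback on failure
let txn = db.begin_write()?;
// Borrow the transaction so it can still be committed or aborted afterwards
match execute_write_sequence(&txn) {
    Ok(()) => txn.commit()?,  // Makes all changes visible at once
    Err(e) => {
        txn.abort()?;         // Rolls back all changes
        return Err(e);
    }
}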

The write sequence (sequence ID, WAL, data table, Tantivy index, commit) executes atomically. If the Tantivy index update fails, the redb transaction aborts, leaving the WAL and data table untouched. No partial state, no orphaned writes.
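To make the all-or-nothing behavior concrete, here is a minimal in-memory sketch. It is not CameoDB's or redb's implementation: `Store`, `WriteTxn`, and `write_atomically` are hypothetical names, and the sketch gets atomicity by buffering changes and applying them only on commit.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store standing in for the real storage engine.
#[derive(Default)]
pub struct Store {
    data: HashMap<String, String>,
}

/// A write transaction that buffers changes until commit.
pub struct WriteTxn<'a> {
    store: &'a mut Store,
    staged: Vec<(String, String)>,
}

impl Store {
    pub fn begin_write(&mut self) -> WriteTxn<'_> {
        WriteTxn { store: self, staged: Vec::new() }
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.data.get(key).map(String::as_str)
    }
}

impl<'a> WriteTxn<'a> {
    /// Stages a change; nothing is visible in the store yet.
    pub fn insert(&mut self, key: &str, value: &str) {
        self.staged.push((key.to_string(), value.to_string()));
    }

    /// Applies every staged change to the store.
    pub fn commit(self) {
        for (key, value) in self.staged {
            self.store.data.insert(key, value);
        }
    }

    /// Discards the staged changes; the store is left untouched.
    pub fn abort(self) {}
}

/// Runs `steps` inside one transaction: commit on Ok, abort on Err.
pub fn write_atomically<F>(store: &mut Store, steps: F) -> Result<(), String>
where
    F: FnOnce(&mut WriteTxn<'_>) -> Result<(), String>,
{
    let mut txn = store.begin_write();
    match steps(&mut txn) {
        Ok(()) => {
            txn.commit();
            Ok(())
        }
        Err(e) => {
            txn.abort();
            Err(e)
        }
    }
}
```

If the closure passed to `write_atomically` returns an error after staging some inserts, none of them reach the store. That is the same outcome the article describes for a failed Tantivy update.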

Consistency: Invariants Across Storage

Consistency ensures that the database transitions from one valid state to another. CameoDB maintains consistency through schema validation and cross-engine synchronization. The storage engine enforces type constraints, field requirements, and relationship invariants.

The hybrid architecture maintains consistency between redb and Tantivy. The atomic write sequence ensures that both storage engines reflect the same logical state after each commit. The recovery process replays uncommitted WAL records into Tantivy, guaranteeing eventual consistency after crashes.

Isolation: Concurrent Operation Safety

Isolation ensures that concurrent transactions don't interfere with each other. CameoDB provides isolation through redb's MVCC system. Readers operate on snapshots, seeing a consistent view of the database at a point in time, unaffected by concurrent writes.

The system supports multiple isolation levels:

Read Committed

Readers see only committed data. Uncommitted transaction changes are invisible.

Repeatable Read

Readers see a consistent snapshot throughout the transaction, even if other commits occur.

Writes are serialized through the storage engine's transaction manager. Concurrent writes to the same document are serialized by redb's locking mechanism, preventing lost updates and race conditions.
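Snapshot reads are easy to picture with a small copy-on-write sketch. This is an illustration under stated assumptions, not redb's MVCC internals: `SnapshotStore` is a hypothetical type that publishes a new immutable version on every commit and lets readers keep whichever version they started with.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// One immutable version of the data, shared cheaply between readers.
pub type Version = Arc<HashMap<String, i64>>;

/// Hypothetical copy-on-write store: each commit publishes a new version.
pub struct SnapshotStore {
    current: RwLock<Version>,
}

impl SnapshotStore {
    pub fn new() -> Self {
        SnapshotStore { current: RwLock::new(Arc::new(HashMap::new())) }
    }

    /// Readers take a handle to the version that is current right now.
    /// Later commits never modify it.
    pub fn snapshot(&self) -> Version {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Writers hold the write lock while publishing, so commits are serialized.
    pub fn commit(&self, key: &str, value: i64) {
        let mut guard = self.current.write().unwrap();
        let mut next: HashMap<String, i64> = (**guard).clone();
        next.insert(key.to_string(), value);
        *guard = Arc::new(next);
    }
}
```

A reader that keeps one `snapshot()` for its whole transaction gets repeatable reads, while a reader that takes a fresh snapshot per query sees every commit made so far and nothing uncommitted.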

Durability: Surviving Failures

Durability ensures that committed transactions survive failures. CameoDB provides durability through the write-ahead log (WAL) and configurable fsync behavior. Every write operation is first recorded in the WAL before being applied to the data tables.

// WAL ensures durability before application
wal.insert(seq_id, serialized_op)  // Durable
data.insert(id, json_blob)         // Applied
writer.add_document(...)           // Indexed
txn.commit()                       // With optional fsync

The WAL survives crashes, power failures, and unclean shutdowns. On recovery, CameoDB replays uncommitted WAL records to restore the database to a consistent state. Configurable fsync lets you choose between maximum durability (fsync on every commit) and higher throughput (fsync on smart commit intervals).
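The replay step can be sketched in a few lines. The code below is a simplified model rather than CameoDB's recovery code: `WalRecord`, `IndexState`, and `replay` are hypothetical, and the sketch assumes the index remembers the highest sequence ID it has already applied.

```rust
use std::collections::BTreeMap;

/// One logged operation; `seq_id` orders operations on a single node.
#[derive(Clone, Debug, PartialEq)]
pub struct WalRecord {
    pub seq_id: u64,
    pub doc_id: String,
    pub body: String,
}

/// Hypothetical index state: documents plus the highest sequence ID applied.
#[derive(Default)]
pub struct IndexState {
    pub docs: BTreeMap<String, String>,
    pub last_applied: u64,
}

/// Re-applies every WAL record newer than `last_applied`, in sequence order.
/// Returns how many records were replayed.
pub fn replay(wal: &[WalRecord], index: &mut IndexState) -> usize {
    let mut pending: Vec<&WalRecord> = wal
        .iter()
        .filter(|record| record.seq_id > index.last_applied)
        .collect();
    pending.sort_by_key(|record| record.seq_id);
    for record in &pending {
        index.docs.insert(record.doc_id.clone(), record.body.clone());
        index.last_applied = record.seq_id;
    }
    pending.len()
}
```

Because replay skips anything at or below `last_applied`, running it twice is harmless, which is a useful property when recovery itself can be interrupted.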

Distributed Consistency Model

CameoDB's distributed layer provides eventual consistency across the cluster. Writes are routed to the node that owns the shard, which is chosen by consistent hashing. Each node maintains ACID compliance for its local writes. Changes propagate to other nodes through gossip-based peer discovery and state synchronization.

This model means that read-your-writes consistency requires using routing keys. If you write a document with a routing key, subsequent reads with the same key route to the same node, ensuring you see your write. Without routing keys, reads may return stale data from other nodes until gossip synchronization completes.
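Why does the same routing key always land on the same node? A small consistent-hashing sketch shows the idea. This is a generic hash ring, not CameoDB's actual routing code: `Ring`, `owner`, and the virtual-node count are assumptions, and `DefaultHasher` is used only because it ships with the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hashes any value to a position on the ring.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical hash ring mapping routing keys to node names.
pub struct Ring {
    points: BTreeMap<u64, String>,
}

impl Ring {
    /// Places each node at `vnodes` positions (virtual nodes) on the ring.
    pub fn new(nodes: &[&str], vnodes: u32) -> Self {
        let mut points = BTreeMap::new();
        for node in nodes {
            for replica in 0..vnodes {
                points.insert(hash_of(&(*node, replica)), (*node).to_string());
            }
        }
        Ring { points }
    }

    /// Returns the node at the first ring point at or after the key's hash,
    /// wrapping around to the start of the ring. `None` if the ring is empty.
    pub fn owner(&self, routing_key: &str) -> Option<&str> {
        let position = hash_of(&routing_key);
        self.points
            .range(position..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, node)| node.as_str())
    }
}
```

As long as cluster membership is unchanged, `owner` is a pure function of the routing key, so a write and a later read with the same key resolve to the same node.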

Conflict Resolution

In a leaderless system, concurrent writes to the same document on different nodes can create conflicts. CameoDB uses a conflict-free replicated data type (CRDT) approach for metadata, ensuring eventual consistency without coordination. For user data, the system relies on the application layer to handle conflicts through version vectors or application-specific resolution strategies.

The sequence ID provides a total order for writes within a single node. Combined with timestamps and node IDs, the system can detect and report conflicts to the application layer for resolution.
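If you handle conflicts in the application layer with version vectors, the core operation is a comparison that tells you whether one version supersedes the other or the two are concurrent. Here is a minimal sketch of that comparison. The types and function names are illustrative, not a CameoDB API.

```rust
use std::collections::BTreeMap;

/// Per-node write counters, keyed by node ID.
pub type VersionVector = BTreeMap<String, u64>;

/// How two versions of the same document relate to each other.
#[derive(Debug, PartialEq)]
pub enum Causality {
    Equal,
    Before,     // left is older than right
    After,      // left is newer than right
    Concurrent, // neither dominates: a conflict the application must resolve
}

/// Builds a version vector from (node ID, counter) pairs.
pub fn version_vector(pairs: &[(&str, u64)]) -> VersionVector {
    pairs.iter().map(|(node, count)| (node.to_string(), *count)).collect()
}

/// Compares two version vectors; a missing counter is treated as zero.
pub fn compare(left: &VersionVector, right: &VersionVector) -> Causality {
    let mut left_ahead = false;
    let mut right_ahead = false;
    for node in left.keys().chain(right.keys()) {
        let l = left.get(node).copied().unwrap_or(0);
        let r = right.get(node).copied().unwrap_or(0);
        if l > r {
            left_ahead = true;
        }
        if r > l {
            right_ahead = true;
        }
    }
    match (left_ahead, right_ahead) {
        (false, false) => Causality::Equal,
        (false, true) => Causality::Before,
        (true, false) => Causality::After,
        (true, true) => Causality::Concurrent,
    }
}
```

Only the `Concurrent` case needs a resolution strategy such as merging fields or asking the user. In the other cases the newer version can simply replace the older one.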

The Tradeoffs

CameoDB's transaction model makes specific tradeoffs:

Strong Local ACID

Full ACID on single node
Fast local transactions
No leader election overhead
Predictable latency

Eventual Global Consistency

Read-your-writes requires routing keys
Conflicts need application resolution
Stale reads possible without keys
Gossip propagation delay

The Takeaway

CameoDB delivers ACID guarantees through redb's transactional storage engine, providing strong consistency at the node level. The distributed layer provides eventual consistency across the cluster, enabling linear scalability without consensus overhead. The result is a database that combines transactional integrity for local operations with global scalability for distributed workloads. Use routing keys for read-your-writes consistency, or embrace eventual consistency for maximum throughput.

Transaction Docs | Quickstart