AI Search Tool Benchmarks — fff vs codedb vs ripgrep

01 — Head to Head

fff vs codedb — pre-indexed MCP servers

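Both tools do their setup once at startup and then answer queries from memory. codedb builds trigram and inverted word indexes (written in Zig), while fff memory-maps the files and scans them with SIMD grep (written in Rust). Latencies are medians over 20 runs, measured through MCP JSON-RPC.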
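The measurement method (median of 20 calls over MCP JSON-RPC) can be sketched as follows. This is a minimal sketch, not the harness behind these numbers. It assumes a `send` callable that delivers one request to a server and returns its reply. The tool name `grep` and the argument key `pattern` are hypothetical placeholders, and `fake_send` is a stand-in server used only so the example runs.

```python
import json
import statistics
import time
from typing import Callable


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def median_latency_ms(send: Callable[[str], str], tool: str,
                      arguments: dict, runs: int = 20) -> float:
    """Median round-trip time in ms, including JSON encoding and decoding."""
    samples = []
    for request_id in range(1, runs + 1):
        start = time.perf_counter()
        reply = json.loads(send(make_tool_call(request_id, tool, arguments)))
        samples.append((time.perf_counter() - start) * 1000.0)
        if reply.get("id") != request_id:
            raise RuntimeError("reply does not match request id")
    return statistics.median(samples)


# Hypothetical stand-in for a real server connection: returns an empty result.
def fake_send(message: str) -> str:
    request = json.loads(message)
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": {"content": []}})


print(median_latency_ms(fake_send, "grep", {"pattern": "addEventListener"}))
```

Against a real server, `send` would run after the MCP `initialize` handshake. It would write the message plus a newline to the server's stdin and read lines from its stdout until the reply with the matching `id` arrives. The real tool and argument names come from the server's `tools/list` response.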

Chart: median latency per query, codedb vs fff (the same values appear in the codedb and fff columns of the table below).

Speedup: codedb is 2.5–16x faster than fff on raw lookup latency across these five queries.

Query	codedb	fff	rg	grep	codedb speedup vs rg	fff speedup vs rg
`function`	0.090ms	1.413ms	179.03ms	14,991ms	1,989x	127x
`addEventListener`	0.065ms	0.519ms	172.78ms	14,224ms	2,658x	333x
`Durable Objects`	0.129ms	0.371ms	173.25ms	13,825ms	1,343x	467x
`handleRequest`	0.090ms	0.399ms	172.93ms	14,342ms	1,921x	433x
`*.ts export`	0.148ms	0.377ms	19.88ms	50.71ms	134x	53x

02 — Content Search

All four tools compared

Log-scaled bar chart spanning sub-millisecond indexed search to 15-second GNU grep, with one bar each for codedb, fff, ripgrep, and GNU grep per query:

Query	Profile	codedb	fff	rg	grep
`function`	Very common token, thousands of matches	0.090ms	1.413ms	179.03ms	14,991ms
`addEventListener`	Specific identifier	0.065ms	0.519ms	172.78ms	14,224ms
`handleRequest`	camelCase identifier, identical match count across all tools	0.090ms	0.399ms	172.93ms	14,342ms
`Durable Objects`	Rare multi-word phrase	0.129ms	0.371ms	173.25ms	13,825ms

03 — File Finding

fff vs fd vs find

fff fuzzy-matches against the file list it keeps in memory, while fd and find traverse the filesystem on every query. codedb offers codedb_tree but no equivalent fuzzy file finder.

Query	fff	fd	find	fff speedup vs fd
`wrangler.toml`	1.31ms	20.55ms	71.97ms	16x
`config`	1.05ms	79.71ms	70.79ms	76x
`src/content/workers`	1.17ms	4.46ms	4.02ms	4x
`*.mdx`	0.49ms	22.08ms	79.64ms	45x
`wranglerconf` (typo)	1.21ms	20.05ms	64.97ms	17x
`ts`	0.80ms	21.30ms	59.48ms	27x

fff found 587 results for the typo "wranglerconf" vs fd's 2 and find's 1.

04 — MCP Server Latency

End-to-end agent experience

What AI agents actually see: JSON-RPC over stdio, including serialization overhead. fff's tools are listed first, followed by features only codedb offers.

Server	Operation	Latency	Range
fff	grep · common	0.46ms	0.42–1.01ms
fff	grep · identifier	0.62ms	0.49–0.68ms
fff	find_files · exact	1.01ms	0.89–1.24ms
fff	multi_grep · 3 patterns	0.74ms	0.61–0.95ms
codedb	symbol find	0.078ms	0.05–0.20ms
codedb	file outline	0.056ms	0.05–0.10ms
codedb	file tree	0.070ms	0.05–0.20ms
codedb	hot files	0.050ms	0.04–0.30ms

05 — How They Work

Fundamentally different architectures

These two tools solve the same problem (fast code search for AI agents) with very different engineering choices. Their architectures account for most of the numbers above.

Startup Cost — 11,500 files / 1.6 GB repo

Tool	Cold Start	Warm Start	What Happens
fff	72–92ms	72–92ms	Walk git index, mmap every file into virtual memory
codedb	7,200ms	585ms	Cold: parse every file and build trigram + word index + outlines + dep graph; warm: load the codedb.snapshot file
ripgrep	0ms	0ms	No index — walks filesystem and reads files on every query

fff's startup is 78x faster than codedb's cold start and 6x faster than its warm start (using fff's 92ms upper bound). Both are one-time costs amortized across a session; ripgrep instead pays its cost on every query.
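The amortization argument can be made concrete with a back-of-envelope break-even calculation: after how many queries has an indexed tool's startup cost been repaid, relative to ripgrep's per-query cost? This sketch uses only figures from this page: the `addEventListener` row for per-query latency, and the startup times above. It ignores everything else, such as memory use or index staleness.

```python
import math


def break_even_queries(startup_ms: float, per_query_ms: float, rg_per_query_ms: float) -> int:
    """Smallest n with startup + n * per_query < n * rg_per_query (strictly cheaper)."""
    saving_per_query = rg_per_query_ms - per_query_ms
    if saving_per_query <= 0:
        raise ValueError("tool is not faster than ripgrep per query")
    return math.floor(startup_ms / saving_per_query) + 1


RG_MS = 172.78  # ripgrep median for `addEventListener` (table in section 01)
for name, startup_ms, per_query_ms in [
    ("fff", 92.0, 0.519),            # upper end of fff's 72-92ms startup
    ("codedb cold", 7200.0, 0.065),
    ("codedb warm", 585.0, 0.065),
]:
    print(name, break_even_queries(startup_ms, per_query_ms, RG_MS))
# fff 1
# codedb cold 42
# codedb warm 4
```

Under these figures, fff is ahead of ripgrep from the first query. A warm codedb snapshot pulls ahead after 4 queries, and a cold codedb index after 42.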

fff — mmap + SIMD scan

No traditional index

fff does not build an inverted index, trigram table, or any derived data structure from file contents. On startup, it walks the git index (92ms for 11.5k files) and calls mmap() on each file, asking the OS kernel to map the file's bytes into the process's virtual address space. This is not "reading" the files: no data has been copied yet. The kernel only records the mapping, and pages are brought in lazily the first time they are accessed.

Query-time: real-time SIMD scan

When you search for addEventListener, fff scans the mmap'd pages using SIMD vector instructions (ARM NEON, x86 SSE/AVX). Each instruction compares 16 bytes (NEON, SSE) or 32 bytes (AVX2) of input against the search pattern in parallel. The first access to a page that is not yet mapped triggers a page fault. If the file data is already in the kernel page cache, the fault is cheap; otherwise the kernel reads it from disk transparently.
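Python's standard `mmap` module exposes the same OS mechanism, so the mmap side of this design is easy to try. The sketch maps a file read-only and reports the numbers of lines containing a byte pattern. The scan itself is `mmap.find`, a plain byte search implemented in C. The sketch therefore illustrates lazy mapping only and makes no claim to reproduce fff's SIMD code path. `grep_mmap` is a hypothetical helper name.

```python
import mmap
import os
import tempfile


def grep_mmap(path: str, needle: bytes) -> list[int]:
    """Return 1-based numbers of lines containing `needle` (assumed to be single-line)."""
    if os.path.getsize(path) == 0:
        return []  # mmap refuses to map an empty file
    hits: list[int] = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        line_no, scanned = 1, 0  # newlines before offset `scanned` are already counted
        pos = mm.find(needle)
        while pos != -1:
            line_no += mm[scanned:pos].count(b"\n")  # only this gap is copied out
            hits.append(line_no)
            eol = mm.find(b"\n", pos)
            if eol == -1:
                break
            # Resume after this line so a line with several matches is reported once.
            line_no, scanned = line_no + 1, eol + 1
            pos = mm.find(needle, scanned)
    return hits


with tempfile.NamedTemporaryFile("wb", suffix=".ts", delete=False) as tmp:
    tmp.write(b"const a = 1;\nel.addEventListener('click', f);\n"
              b"// addEventListener appears twice: addEventListener\n")
print(grep_mmap(tmp.name, b"addEventListener"))  # [2, 3]
os.remove(tmp.name)
```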

Multi-pattern: Aho-Corasick automaton

multi_grep builds a finite state automaton from all patterns at query time and matches them in a single pass. This is faster than running N separate greps because the automaton only traverses each byte once regardless of pattern count.
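A compact textbook Aho-Corasick matcher shows why one pass is enough. The patterns are compiled into a trie with failure links, and the text is then read once, whatever the number of patterns. This is an illustration of the algorithm, not fff's implementation.

```python
from collections import deque


def build_automaton(patterns: list[str]):
    """Compile patterns into goto, failure, and output tables."""
    goto: list[dict[str, int]] = [{}]  # state -> {char: next state}; state 0 is the root
    fail: list[int] = [0]              # state -> longest proper suffix that is also a state
    out: list[list[str]] = [[]]        # state -> patterns recognized on entering this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first, so every failure target is complete before it is used.
    queue = deque(goto[0].values())    # depth-1 states keep failure link 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]  # inherit matches that end at this point
    return goto, fail, out


def search(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Report (start, pattern) for every occurrence, reading `text` exactly once."""
    goto, fail, out = build_automaton(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(search("handleRequest(req); fetch(req)", ["fetch", "Request", "handleRequest"]))
# [(0, 'handleRequest'), (6, 'Request'), (20, 'fetch')]
```

The failure links are what make overlapping matches free. For example, `Request` is reported inside `handleRequest` without re-reading any input.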

Frecency ranking

fff maintains a persistent LMDB database tracking which files an agent opens and how often. Results are scored by a combination of match quality, file access frequency, and recency, and git-dirty files get a boost. The aim is to put the right file first, so agents need to paginate or retry less often.
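fff's exact scoring formula is not given here, so the following is a hypothetical illustration of the general idea. It blends match quality with a frecency term in which every recorded access decays exponentially with age, then adds a fixed bonus for git-dirty files. The half-life, weights, and bonus are made-up parameters. A plain list of timestamps stands in for the LMDB store.

```python
import math
from dataclasses import dataclass, field

# Hypothetical parameters; these are not fff's values.
HALF_LIFE_S = 3 * 24 * 3600   # an access loses half its weight after 3 days
FRECENCY_WEIGHT = 0.5
DIRTY_BONUS = 0.25            # boost for files with uncommitted git changes


@dataclass
class Candidate:
    path: str
    match_quality: float                                      # 0..1, from the matcher
    access_times: list[float] = field(default_factory=list)  # epoch seconds of past opens
    git_dirty: bool = False


def frecency(access_times: list[float], now: float) -> float:
    """Sum of exponentially decayed accesses, so frequency and recency both count."""
    decay = math.log(2) / HALF_LIFE_S
    return sum(math.exp(-decay * (now - t)) for t in access_times)


def score(c: Candidate, now: float) -> float:
    # log1p gives diminishing returns, so heavy use cannot swamp match quality.
    s = c.match_quality + FRECENCY_WEIGHT * math.log1p(frecency(c.access_times, now))
    return s + (DIRTY_BONUS if c.git_dirty else 0.0)


def rank(candidates: list[Candidate], now: float) -> list[str]:
    return [c.path for c in sorted(candidates, key=lambda c: score(c, now), reverse=True)]


now = 1_000_000.0
print(rank([
    Candidate("src/old.ts", 0.9),
    Candidate("src/worker.ts", 0.8, [now - 3600, now - 7200]),
    Candidate("src/edit.ts", 0.7, git_dirty=True),
], now))  # ['src/worker.ts', 'src/edit.ts', 'src/old.ts']
```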

Fuzzy fallback & auto-retry

If a grep returns 0 results, fff automatically tries: (1) dropping the first word to broaden the query, (2) fuzzy matching with Smith-Waterman alignment per line, (3) file path fallback if the query looks like a path. This means agents waste fewer roundtrips on failed searches — the tool recovers silently.
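The retry chain can be sketched as a sequence of progressively looser searches. Everything below is a hypothetical illustration. The scoring constants (match +2, mismatch -1, gap -1), the 70% fuzzy threshold, and the "contains a slash" path heuristic are assumptions, not fff's values. The fuzzy step uses a textbook Smith-Waterman local-alignment score per line.

```python
# Hypothetical parameters, chosen for illustration only.
MATCH, MISMATCH, GAP = 2, -1, -1
FUZZY_RATIO = 0.7  # fraction of a perfect alignment score required for a fuzzy hit


def smith_waterman(query: str, line: str) -> int:
    """Best local alignment score of `query` within `line` (linear gap penalty)."""
    prev = [0] * (len(line) + 1)
    best = 0
    for q in query:
        cur = [0] * (len(line) + 1)
        for j, ch in enumerate(line, start=1):
            diag = prev[j - 1] + (MATCH if q == ch else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def exact_search(query: str, lines: list[str]) -> list[str]:
    return [ln for ln in lines if query in ln]


def search_with_fallbacks(query: str, lines: list[str],
                          paths: list[str]) -> tuple[str, list[str]]:
    """Exact search first, then the three fallbacks in order; report which step answered."""
    hits = exact_search(query, lines)
    if hits:
        return "exact", hits
    words = query.split()
    if len(words) > 1:  # (1) drop the first word to broaden the query
        hits = exact_search(" ".join(words[1:]), lines)
        if hits:
            return "dropped-first-word", hits
    threshold = int(len(query) * MATCH * FUZZY_RATIO)  # (2) fuzzy alignment, per line
    hits = [ln for ln in lines if smith_waterman(query, ln) >= threshold]
    if hits:
        return "fuzzy", hits
    if "/" in query:  # (3) hypothetical "looks like a path" heuristic
        hits = [p for p in paths if query in p]
        if hits:
            return "path", hits
    return "none", []


LINES = ["async function handleRequest(req) {", "// TODO"]
PATHS = ["src/content/workers/index.md", "wrangler.toml"]
for q in ["handleRequest", "async handleRequest", "handleReqest", "src/content/workers"]:
    print(q, "->", search_with_fallbacks(q, LINES, PATHS)[0])
```

The typo `handleReqest` still scores 23 of a possible 24 against `handleRequest`: nine matches, one gap, then three more matches. That is why local alignment tolerates small typos that an exact substring search misses.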

codedb — structured index

Heavy upfront index

codedb reads every file and builds multiple derived data structures: an inverted word index (hash map from every word/identifier to its file + line locations), a trigram index (every 3-character substring mapped to containing files), and structural outlines (parsed function/class/import declarations with line numbers). This costs 7.2s cold but is persisted to a codedb.snapshot file for 585ms warm restarts.

Query-time: hash table lookup

Searching for addEventListener via codedb_word is a single hash table lookup. Finding the token takes expected O(1) time regardless of repo size, plus time proportional to the number of hits returned. The index already knows every file and line where that exact token appears, so no file content is read at query time. This is why it is 4–16x faster than fff's SIMD scan on the single-word queries above: it is not scanning anything at all.
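A toy version of the word index makes the lookup cost visible. Tokens are extracted once at index time, and a query is a single dictionary access that returns a precomputed list of (file, line) locations. The tokenizer regex and the data layout are assumptions for illustration, not codedb's format.

```python
import re
from collections import defaultdict

# Assumption: tokens are identifier-like runs; codedb's real tokenizer may differ.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_word_index(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map every token to the (path, 1-based line) pairs where it occurs."""
    index = defaultdict(list)
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for word in dict.fromkeys(TOKEN.findall(line)):  # dedupe within a line
                index[word].append((path, line_no))
    return dict(index)


def lookup(index: dict[str, list[tuple[str, int]]], word: str) -> list[tuple[str, int]]:
    """One hash lookup: expected O(1) to find the token; no file is read."""
    return index.get(word, [])


FILES = {
    "src/index.ts": "addEventListener('fetch', handle)\nfunction handle(e) {}\n",
    "src/util.ts": "export function handleRequest(r) {}\n",
}
idx = build_word_index(FILES)
print(lookup(idx, "function"))  # [('src/index.ts', 2), ('src/util.ts', 1)]
print(lookup(idx, "handle"))    # [('src/index.ts', 1), ('src/index.ts', 2)]
```

Lookups work on whole tokens, so `handle` does not match `handleRequest`. Substring queries need a different structure, which is the role of the trigram index.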

Substring search: trigram narrowing

For codedb_search (substring/regex), codedb extracts trigrams from the query (e.g., "add", "ddE", "dEv" for "addEventListener") and intersects the file sets for each trigram to get candidates. It then runs a brute-force scan on those candidate files only. This avoids scanning 99% of files and gives a 5.5x speedup over a full scan.
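The narrowing step can be sketched with plain Python sets. Every trigram maps to the set of files containing it; the query's trigrams are intersected to get candidates, and only those candidates are scanned. A file that contains the query must contain all of its trigrams, so the filter never drops a real match. It can only let through false positives, which the verification scan removes. This illustrates the technique, not codedb's data structures.

```python
def trigrams(s: str) -> set[str]:
    """All 3-character substrings of s (empty for strings shorter than 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_trigram_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each trigram to the set of file paths whose text contains it."""
    index: dict[str, set[str]] = {}
    for path, text in files.items():
        for tri in trigrams(text):
            index.setdefault(tri, set()).add(path)
    return index


def substring_search(index: dict[str, set[str]], files: dict[str, str],
                     query: str) -> tuple[list[tuple[str, int]], int]:
    """Return (matching (path, line) pairs, number of candidate files scanned)."""
    grams = trigrams(query)
    if grams:
        # A file can only contain the query if it contains every one of its trigrams.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        candidates = set(files)  # queries under 3 characters cannot be narrowed
    hits = []
    for path in sorted(candidates):  # brute-force verification, candidates only
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            if query in line:
                hits.append((path, line_no))
    return hits, len(candidates)


FILES = {
    "a.ts": "el.addEventListener('click', f)\n",
    "b.ts": "const add = 1\n",
    "c.md": "Durable Objects\n",
}
tri_index = build_trigram_index(FILES)
print(substring_search(tri_index, FILES, "addEventListener"))  # ([('a.ts', 1)], 1)
```

Here `b.ts` contains the trigram "add" but not "ddE", so it is never scanned.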

Structural parsing

codedb parses Zig, Python, TypeScript, and JavaScript files to extract function definitions, struct/class declarations, imports, and test blocks with line numbers. The codedb_outline tool returns these symbols for a file — 4–15x fewer tokens than reading the raw file. codedb_symbol finds where a symbol is defined across the codebase. fff has no structural awareness.
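For Python files, the standard library's `ast` module is enough to sketch what an outline contains. This is a stand-in under stated assumptions: codedb uses its own parsers, also covers Zig, TypeScript, and JavaScript, and its output format is not shown here. `outline` is a hypothetical helper name.

```python
import ast


def outline(source: str) -> list[tuple[int, str, str]]:
    """Return (line, kind, name) for top-level imports, functions, classes, and methods."""
    items = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            items.append((node.lineno, "function", node.name))
        elif isinstance(node, ast.ClassDef):
            items.append((node.lineno, "class", node.name))
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    items.append((child.lineno, "method", f"{node.name}.{child.name}"))
        elif isinstance(node, ast.Import):
            items.extend((node.lineno, "import", alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            items.append((node.lineno, "import", node.module or "."))
    return items


SRC = '''import json
from pathlib import Path

class Router:
    def add(self, path, handler):
        self.routes[path] = handler

async def handle_request(req):
    return json.dumps({"ok": True})
'''
for entry in outline(SRC):
    print(entry)
```

The outline is a handful of short tuples instead of the full source, which is where the token savings of an outline tool come from.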

Dependency graph

codedb records import/require statements and builds a reverse dependency graph. codedb_deps answers "which files import this file?", which is useful for impact analysis before refactoring. The graph is computed during indexing and has no equivalent in fff or ripgrep.
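A reverse dependency graph only requires inverting the import edges. This sketch assumes imports have already been extracted and resolved to repository paths, which is the hard part and is skipped here. It also follows the edges transitively to list every file a change could affect. Both functions are hypothetical illustrations, not codedb_deps itself.

```python
from collections import defaultdict, deque


def reverse_deps(imports: dict[str, list[str]]) -> dict[str, set[str]]:
    """Turn file -> imported files into file -> files that import it."""
    rdeps = defaultdict(set)
    for importer, targets in imports.items():
        for target in targets:
            rdeps[target].add(importer)
    return dict(rdeps)


def impacted_by(rdeps: dict[str, set[str]], changed: str) -> set[str]:
    """All files that import `changed` directly or through a chain of imports."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for importer in rdeps.get(queue.popleft(), set()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    seen.discard(changed)  # an import cycle could lead back to the changed file
    return seen


IMPORTS = {
    "src/index.ts": ["src/router.ts"],
    "src/router.ts": ["src/handlers.ts", "src/util.ts"],
    "src/handlers.ts": ["src/util.ts"],
}
rd = reverse_deps(IMPORTS)
print(sorted(rd["src/util.ts"]))               # ['src/handlers.ts', 'src/router.ts']
print(sorted(impacted_by(rd, "src/util.ts")))  # ['src/handlers.ts', 'src/index.ts', 'src/router.ts']
```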

Capability Matrix

Capability	fff	codedb	ripgrep
Content search (grep)	SIMD scan	trigram + brute	regex engine
Exact word lookup	—	O(1) hash	—
Fuzzy / typo-tolerant search	Smith-Waterman	—	—
Regex support	auto-detect	flag	native
Multi-pattern OR	Aho-Corasick	—	—
File finding (fuzzy)	frecency-ranked	tree only	—
Frecency ranking	LMDB-backed	—	—
Auto-retry on 0 results	3 fallbacks	—	—
Definition auto-expand	context lines	outlines	—
Symbol / structural search	—	parsed AST	—
Dependency graph	—	reverse deps	—
Atomic file edits	—	versioned	—
Git-dirty file boosting	in ranking	—	—
Persistent file watching	fsevents	2s polling	—
Language	Rust	Zig	Rust

06 — When to Use What

Picking the right tool

fff — when speed-to-first-result matters

92ms cold start means agents can begin searching almost instantly. Fuzzy matching and auto-retry reduce failed roundtrips. Frecency ranking surfaces the most relevant file first. Best for: short-lived agent sessions, exploratory search, typo-tolerant file discovery, or when you don't know the exact identifier.

codedb — when structural understanding matters

7.2s index cost pays off in sessions that need symbol lookup, dependency analysis, or file outlines. O(1) word lookup is fastest for exact identifiers. Best for: long-running agent sessions, refactoring workflows, impact analysis, or when you need to understand code structure (not just find strings).

ripgrep — when you want zero state

No index, no server, no persistent process. ripgrep pays about 175ms per full-repo query in this benchmark but needs no startup time. Best for: one-off searches, CI/CD scripts, or environments where running a daemon is impractical. It is still roughly 80x faster than GNU grep on the full-repo queries above.

Both together — for maximum coverage

They're complementary, not competing. An agent could use fff for fuzzy file discovery and content grep, and codedb for symbol definitions, outlines, and dependency analysis. The MCP protocol means both can run as separate servers simultaneously.