2026-04-05 · macOS / Apple Silicon · 11,500 files / 1.6 GB

AI search tools, benchmarked

fff, codedb, ripgrep, grep, fd, and find tested head-to-head on the Cloudflare docs repo. All pre-indexed tools use warm MCP servers. All filesystem tools use warm OS page cache.

0.09ms · codedb word index · fastest raw lookup
0.40ms · fff grep median · SIMD content search
175ms · ripgrep median · filesystem traversal
14,500ms · GNU grep median · sequential scan

01 — Head to Head

fff vs codedb — pre-indexed MCP servers

Both tools index once and serve queries from memory. codedb uses trigram + inverted word indexes (Zig). fff uses mmap + SIMD grep (Rust). Median latency over 20 runs via MCP JSON-RPC.
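
The "median over 20 runs" measurement can be sketched as a small timing harness. This is an illustrative stand-in, not the harness used for these numbers: `median_latency_ms` is a hypothetical helper, and the `query` callable stands for one full MCP request/response round trip.

```python
import statistics
import time
from typing import Callable, List


def median_latency_ms(query: Callable[[], object], runs: int = 20) -> float:
    """Call `query` `runs` times and return the median wall-clock latency in ms."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

The median is less sensitive than the mean to a single slow run, such as the first request after a server starts.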

Query                                codedb    fff       rg         grep       codedb vs rg   fff vs rg
function (common word)               0.090ms   1.413ms   179.03ms   14,991ms   1,989x         127x
addEventListener (identifier)        0.065ms   0.519ms   172.78ms   14,224ms   2,658x         333x
Durable Objects (rare multi-word)    0.129ms   0.371ms   173.25ms   13,825ms   1,343x         467x
handleRequest (camelCase)            0.090ms   0.399ms   172.93ms   14,342ms   1,921x         433x
export (*.ts filter)                 0.148ms   0.377ms   19.88ms    50.71ms    134x           53x

Speedup: codedb is 2.5–16x faster than fff on raw lookup latency.
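
The ratio columns and the 2.5–16x range follow directly from the medians. A short sketch that recomputes them; the numbers are copied from the table above, and `speedup` is a hypothetical helper name.

```python
# Median latencies in milliseconds, copied from the table above.
LATENCY_MS = {
    "function": {"codedb": 0.090, "fff": 1.413, "rg": 179.03, "grep": 14991.0},
    "addEventListener": {"codedb": 0.065, "fff": 0.519, "rg": 172.78, "grep": 14224.0},
    "Durable Objects": {"codedb": 0.129, "fff": 0.371, "rg": 173.25, "grep": 13825.0},
    "handleRequest": {"codedb": 0.090, "fff": 0.399, "rg": 172.93, "grep": 14342.0},
    "*.ts export": {"codedb": 0.148, "fff": 0.377, "rg": 19.88, "grep": 50.71},
}


def speedup(query: str, fast: str, slow: str) -> float:
    """How many times faster `fast` is than `slow` for one query."""
    return LATENCY_MS[query][slow] / LATENCY_MS[query][fast]


# Range of the codedb-over-fff advantage across all five queries.
codedb_vs_fff = [speedup(q, "codedb", "fff") for q in LATENCY_MS]
low, high = min(codedb_vs_fff), max(codedb_vs_fff)
```

Here `low` is about 2.5 (the `*.ts export` query) and `high` is about 15.7 (the `function` query).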

02 — Content Search

All four tools compared

The full range runs from sub-millisecond indexed search to 15-second GNU grep. Each query is shown for codedb, fff, ripgrep, and GNU grep.

Query              codedb    fff       rg         grep       Notes
function           0.090ms   1.413ms   179.03ms   14,991ms   Very common token, thousands of matches
addEventListener   0.065ms   0.519ms   172.78ms   14,224ms   Specific identifier
handleRequest      0.090ms   0.399ms   172.93ms   14,342ms   camelCase identifier, identical match count across all tools
Durable Objects    0.129ms   0.371ms   173.25ms   13,825ms   Rare multi-word phrase

03 — File Finding

fff vs fd vs find

fff uses fuzzy matching on its in-memory index. fd and find traverse the filesystem. codedb has codedb_tree but no equivalent fuzzy file finder.

Query                  fff      fd        find      fff vs fd
wrangler.toml          1.31ms   20.55ms   71.97ms   16x
config                 1.05ms   79.71ms   70.79ms   76x
src/content/workers    1.17ms   4.46ms    4.02ms    4x
*.mdx                  0.49ms   22.08ms   79.64ms   45x
wranglerconf (typo)    1.21ms   20.05ms   64.97ms   17x
ts                     0.80ms   21.30ms   59.48ms   27x

fff found 587 results for the typo "wranglerconf" vs fd's 2 and find's 1.


04 — MCP Server Latency

End-to-end agent experience

What AI agents actually see: JSON-RPC over stdio, including serialization overhead. fff tools are listed first, followed by codedb-only features.

fff MCP
Operation            Latency   Range
grep · common        0.46ms    0.42 – 1.01ms
grep · identifier    0.62ms    0.49 – 0.68ms
find_files · exact   1.01ms    0.89 – 1.24ms
multi_grep · 3 pat   0.74ms    0.61 – 0.95ms

codedb MCP
Operation            Latency   Range
symbol find          0.078ms   0.05 – 0.20ms
file outline         0.056ms   0.05 – 0.10ms
file tree            0.070ms   0.05 – 0.20ms
hot files            0.050ms   0.04 – 0.30ms
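
For context on what "JSON-RPC over stdio" costs: MCP sends each tool invocation as a JSON-RPC 2.0 `tools/call` message, and the stdio transport writes one JSON object per line. A minimal framing sketch follows. `encode_tool_call` and `decode_message` are hypothetical helper names, and the argument schema of each tool is an assumption, since it is defined by the server.

```python
import json
from typing import Any, Dict


def encode_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> bytes:
    """Serialize one MCP tool invocation as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the line stays intact.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_message(line: bytes) -> Dict[str, Any]:
    """Parse one line read from the peer's stdout."""
    return json.loads(line.decode("utf-8"))
```

Every latency in this section includes one such encode on the client, one decode on the server, and the reverse for the response.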

05 — How They Work

Fundamentally different architectures

These two tools solve the same problem — fast code search for AI agents — with radically different engineering choices. Understanding the architecture explains every number above.

Startup Cost — 11,500 files / 1.6 GB repo
Tool      Cold start   Warm start   What happens
fff       72–92ms      72–92ms      Walk git index, mmap every file into virtual memory
codedb    7,200ms      585ms        Parse every file, build trigram + word index + outlines + dep graph; or load snapshot
ripgrep   0ms          0ms          No index; walks filesystem and reads files on every query

Taking the 92ms upper bound, fff's startup is about 78x faster than codedb's cold start and about 6x faster than its warm start. Both are one-time costs amortized across the session. ripgrep pays its cost on every query instead.
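
The amortization can be made concrete as a break-even count: how many queries it takes before an index's startup cost is repaid by cheaper queries than ripgrep's. A rough sketch using the headline medians from this page; `break_even_queries` is a hypothetical helper and ignores everything except startup and per-query latency.

```python
import math


def break_even_queries(startup_ms: float, indexed_query_ms: float,
                       unindexed_query_ms: float) -> int:
    """Smallest number of queries after which the indexed tool is ahead overall."""
    saved_per_query = unindexed_query_ms - indexed_query_ms
    if saved_per_query <= 0:
        raise ValueError("the indexed tool must be faster per query")
    return math.ceil(startup_ms / saved_per_query)


RG_MS = 175.0  # ripgrep median per query
codedb_cold = break_even_queries(7200.0, 0.09, RG_MS)
codedb_warm = break_even_queries(585.0, 0.09, RG_MS)
fff_start = break_even_queries(92.0, 0.40, RG_MS)
```

Under these assumptions codedb's cold start is repaid after 42 queries, its warm start after 4, and fff's startup after the first query.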

fff — mmap + SIMD scan
No traditional index
fff does not build an inverted index, trigram table, or any derived data structure from file contents. On startup, it walks the git index (92ms for 11.5k files) and calls mmap() on each file, asking the OS kernel to map file bytes directly into the process's virtual address space. This is not "reading" the files — no data is copied into userspace yet. The kernel just sets up page table entries.
Query-time: real-time SIMD scan
When you search for addEventListener, fff scans through the mmap'd pages using SIMD vector instructions (ARM NEON / x86 SSE/AVX). Each vector instruction compares 16–32 bytes against the search pattern in parallel. Pages that were already touched are in the kernel page cache (RAM); untouched pages trigger a page fault that loads them from disk transparently.
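
The access pattern (map the file, then scan the mapped bytes without an explicit read) can be sketched with Python's standard `mmap` module. This only illustrates the idea: `mmap.find` is a plain byte search, not fff's SIMD kernel, and `find_all` is a hypothetical helper.

```python
import mmap
from typing import List


def find_all(path: str, needle: bytes) -> List[int]:
    """Return the byte offset of every occurrence of `needle` in the file."""
    if not needle:
        raise ValueError("needle must not be empty")
    offsets: List[int] = []
    with open(path, "rb") as handle:
        # Length 0 maps the whole file. Pages are loaded lazily on first touch.
        # Note: mapping an empty file raises ValueError.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            position = view.find(needle)
            while position != -1:
                offsets.append(position)
                position = view.find(needle, position + 1)
    return offsets
```

Repeated searches over the same file hit pages already resident in the OS page cache, which is the warm-cache condition these benchmarks assume.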
Multi-pattern: Aho-Corasick automaton
multi_grep builds a finite state automaton from all patterns at query time and matches them in a single pass. This is faster than running N separate greps because the automaton only traverses each byte once regardless of pattern count.
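
A compact sketch of the Aho-Corasick technique described above, assuming nothing about fff's actual implementation. `build_automaton` and `search` are hypothetical names. The automaton is a trie with failure links, so the text is consumed in one left-to-right pass regardless of how many patterns there are.

```python
from collections import deque
from typing import Dict, List, Tuple

Automaton = Tuple[List[Dict[str, int]], List[int], List[List[str]]]


def build_automaton(patterns: List[str]) -> Automaton:
    goto: List[Dict[str, int]] = [{}]   # trie edges per state
    fail: List[int] = [0]               # failure link per state
    out: List[List[str]] = [[]]         # patterns ending at each state
    for pattern in patterns:
        if not pattern:
            continue
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: a state's failure link points at the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text: str, automaton: Automaton) -> List[Tuple[int, str]]:
    """Return (start_offset, pattern) for every match, in a single pass."""
    goto, fail, out = automaton
    state = 0
    hits: List[Tuple[int, str]] = []
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((index - len(pattern) + 1, pattern))
    return hits
```

For example, searching "ushers" with the patterns he, she, his, and hers reports she at offset 1, then he and hers at offset 2.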
Frecency ranking
fff maintains a persistent LMDB database tracking which files an agent opens and how often. Results are scored by a combination of match quality, file access frequency, and recency. Git-dirty files get a boost. This means the first result is usually the right one — agents rarely need to paginate or retry.
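
One way such a score could be combined is sketched below. The formula, the one-week half-life, and the dirty-file boost are all invented for illustration; the source does not publish fff's actual weights, and `FileStats`, `frecency_score`, and `rank` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List

HALF_LIFE_S = 7 * 24 * 3600   # assumed: an access loses half its weight per week
DIRTY_BOOST = 0.25            # assumed: flat bonus for git-dirty files


@dataclass
class FileStats:
    path: str
    match_quality: float      # 0..1, how well the file matches the query
    access_count: int         # how often the agent opened the file
    last_access_ts: float     # seconds since epoch of the last access
    git_dirty: bool


def frecency_score(stats: FileStats, now: float) -> float:
    age = max(0.0, now - stats.last_access_ts)
    recency = 0.5 ** (age / HALF_LIFE_S)          # exponential decay
    frequency = math.log1p(stats.access_count)    # diminishing returns
    score = stats.match_quality + frequency * recency
    if stats.git_dirty:
        score += DIRTY_BOOST
    return score


def rank(candidates: List[FileStats], now: float) -> List[FileStats]:
    return sorted(candidates, key=lambda s: frecency_score(s, now), reverse=True)
```

With equal match quality, a file opened often and recently outranks a dirty but never-opened file, which in turn outranks one last opened a month ago.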
Fuzzy fallback & auto-retry
If a grep returns 0 results, fff automatically tries: (1) dropping the first word to broaden the query, (2) fuzzy matching with Smith-Waterman alignment per line, (3) file path fallback if the query looks like a path. This means agents waste fewer roundtrips on failed searches — the tool recovers silently.
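
The three-step recovery chain can be sketched as follows. The Smith-Waterman scoring values, the 70% acceptance threshold, and the "looks like a path" test are assumptions made for illustration, and `search_with_fallbacks` is a hypothetical name rather than fff's API.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int]  # (path, 1-based line number; 0 for a path-only hit)


def smith_waterman(query: str, line: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between `query` and `line`."""
    previous = [0] * (len(line) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, c in enumerate(line, start=1):
            diagonal = previous[j - 1] + (match if q == c else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def search_with_fallbacks(query: str, files: Dict[str, str]) -> Tuple[str, List[Hit]]:
    lines = [(path, n, line) for path, text in files.items()
             for n, line in enumerate(text.splitlines(), start=1)]

    def exact(needle: str) -> List[Hit]:
        return [(path, n) for path, n, line in lines if needle in line]

    hits = exact(query)
    if hits:
        return "exact", hits
    words = query.split()
    if len(words) > 1:                       # (1) drop the first word
        hits = exact(" ".join(words[1:]))
        if hits:
            return "dropped_first_word", hits
    threshold = 0.7 * 2 * len(query)         # (2) fuzzy: assumed 70% of a perfect score
    hits = [(path, n) for path, n, line in lines
            if smith_waterman(query, line) >= threshold]
    if hits:
        return "fuzzy", hits
    if "/" in query or "." in query:         # (3) path fallback
        hits = [(path, 0) for path in files if query in path]
        if hits:
            return "path", hits
    return "none", []
```

Each fallback runs only when the previous step found nothing, so a successful exact search pays no extra cost.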
codedb — structured index
Heavy upfront index
codedb reads every file and builds multiple derived data structures: an inverted word index (hash map from every word/identifier to its file + line locations), a trigram index (every 3-character substring mapped to containing files), and structural outlines (parsed function/class/import declarations with line numbers). This costs 7.2s cold but is persisted to a codedb.snapshot file for 585ms warm restarts.
Query-time: hash table lookup
Searching for addEventListener via codedb_word is a single hash table lookup: O(1) on average, regardless of repo size. The index already knows every file and line where that exact token appears, so no file content is read at query time. This is why it's 4–16x faster than fff's SIMD scan on the single-word queries above: it's not scanning anything at all.
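
A toy version of an inverted word index shows why the query side is so cheap. This is a sketch of the general data structure, not codedb's Zig implementation; the tokenizer regex and the names `build_word_index` and `lookup` are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # assumed identifier-style tokenizer
Location = Tuple[str, int]                    # (path, 1-based line number)


def build_word_index(files: Dict[str, str]) -> Dict[str, List[Location]]:
    """One pass over every file: all the cost is paid here, at index time."""
    index: Dict[str, List[Location]] = defaultdict(list)
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            for word in set(WORD.findall(line)):
                index[word].append((path, number))
    return index


def lookup(index: Dict[str, List[Location]], word: str) -> List[Location]:
    """Query time: a single dictionary probe, no file content is touched."""
    return index.get(word, [])
```

The trade-off is that only whole tokens are findable this way; substrings need a different structure.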
Substring search: trigram narrowing
For codedb_search (substring/regex), codedb extracts trigrams from the query (e.g., "add", "ddE", "dEv" for "addEventListener"), intersects the file sets for each trigram to get candidates, then does a brute-force scan only on those candidate files. This avoids scanning 99% of files, achieving 5.5x speedup over full scan.
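
The narrowing step can be sketched in a few lines. This illustrates the general trigram technique under simplifying assumptions (file-level postings, literal substring queries only); the function names are hypothetical and not codedb's API.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in files.items():
        for gram in trigrams(text):
            index[gram].add(path)
    return index


def candidate_files(query: str, files: Dict[str, str],
                    index: Dict[str, Set[str]]) -> Set[str]:
    """Files that contain every trigram of the query (a superset of true matches)."""
    grams = trigrams(query)
    if not grams:
        return set(files)  # query shorter than 3 characters: cannot narrow
    return set.intersection(*(index.get(gram, set()) for gram in grams))


def substring_search(query: str, files: Dict[str, str],
                     index: Dict[str, Set[str]]) -> List[str]:
    """Brute-force confirmation, but only over the narrowed candidate set."""
    return sorted(path for path in candidate_files(query, files, index)
                  if query in files[path])
```

The confirmation scan is still needed because a file can contain all of a query's trigrams without containing the query itself.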
Structural parsing
codedb parses Zig, Python, TypeScript, and JavaScript files to extract function definitions, struct/class declarations, imports, and test blocks with line numbers. The codedb_outline tool returns these symbols for a file — 4–15x fewer tokens than reading the raw file. codedb_symbol finds where a symbol is defined across the codebase. fff has no structural awareness.
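
To show what an outline contains, here is a stand-in that uses Python's own `ast` module on Python source. codedb ships its own parsers, so this is only an analogy; the `outline` function and its tuple layout are assumptions.

```python
import ast
from typing import List, Tuple

Symbol = Tuple[int, str, str]  # (line number, kind, name)


def outline(source: str) -> List[Symbol]:
    """Extract functions, classes, and imports with their line numbers."""
    symbols: List[Symbol] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append((node.lineno, "function", node.name))
        elif isinstance(node, ast.ClassDef):
            symbols.append((node.lineno, "class", node.name))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                symbols.append((node.lineno, "import", alias.name))
        elif isinstance(node, ast.ImportFrom):
            symbols.append((node.lineno, "import", node.module or "."))
    return sorted(symbols)
```

The token savings come from returning only these few tuples per file instead of the full file body.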
Dependency graph
codedb records import/require statements and builds a reverse dependency graph. codedb_deps answers "which files import this file?", which is useful for impact analysis before refactoring. The graph is computed during indexing and has no equivalent in the other tools benchmarked here.
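
A reverse dependency graph is an inverted import map. The sketch below makes strong simplifying assumptions: imports are found with a regex rather than a parser, and relative specifiers are assumed to include the file extension. `reverse_deps` is a hypothetical name.

```python
import posixpath
import re
from collections import defaultdict
from typing import Dict, Set

# Matches: from "x" / import "x" / require("x"), with either quote style.
IMPORT = re.compile(r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)['"]([^'"]+)['"]""")


def reverse_deps(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each imported target to the set of files that import it."""
    importers: Dict[str, Set[str]] = defaultdict(set)
    for path, text in files.items():
        for spec in IMPORT.findall(text):
            if spec.startswith("."):
                # Resolve relative specifiers against the importing file's directory.
                spec = posixpath.normpath(posixpath.join(posixpath.dirname(path), spec))
            importers[spec].add(path)
    return dict(importers)
```

Answering "which files import this file?" is then a single dictionary lookup on the target path.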
Capability Matrix
Capability                      fff               codedb            ripgrep
Content search (grep)           SIMD scan         trigram + brute   regex engine
Exact word lookup               -                 O(1) hash         -
Fuzzy / typo-tolerant search    Smith-Waterman    -                 -
Regex support                   auto-detect       flag              native
Multi-pattern OR                Aho-Corasick      -                 -
File finding (fuzzy)            frecency-ranked   tree only         -
Frecency ranking                LMDB-backed       -                 -
Auto-retry on 0 results         3 fallbacks       -                 -
Definition auto-expand          context lines     outlines          -
Symbol / structural search      -                 parsed AST        -
Dependency graph                -                 reverse deps      -
Atomic file edits               -                 versioned         -
Git-dirty file boosting         in ranking        -                 -
Persistent file watching        fsevents          2s polling        -
Language                        Rust              Zig               Rust

06 — When to Use What

Picking the right tool

fff — when speed-to-first-result matters
92ms cold start means agents can begin searching almost instantly. Fuzzy matching and auto-retry reduce failed roundtrips. Frecency ranking surfaces the most relevant file first. Best for: short-lived agent sessions, exploratory search, typo-tolerant file discovery, or when you don't know the exact identifier.
codedb — when structural understanding matters
7.2s index cost pays off in sessions that need symbol lookup, dependency analysis, or file outlines. O(1) word lookup is fastest for exact identifiers. Best for: long-running agent sessions, refactoring workflows, impact analysis, or when you need to understand code structure (not just find strings).
ripgrep — when you want zero state
No index, no server, no persistent process. Pays 175ms per query but never needs startup time. Best for: one-off searches, CI/CD scripts, or environments where running a daemon is impractical. Still roughly 80x faster than GNU grep on the unfiltered content queries (about 2.5x on the *.ts-filtered one).
Both together — for maximum coverage
They're complementary, not competing. An agent could use fff for fuzzy file discovery and content grep, and codedb for symbol definitions, outlines, and dependency analysis. The MCP protocol means both can run as separate servers simultaneously.