Documentation · v0.1.0

doora
Search code by shape,
not by text.

A high-performance structural code search engine built on Tree-sitter. Where grep sees a flat river of bytes, doora sees a grammar — functions, types, scopes, and relationships.

Rust · Tree-sitter · 7 languages · MCP server · Rayon parallel · Bloom filter index
$ cargo install doora

01 Overview

doora is a command-line tool that parses source files into Abstract Syntax Trees and executes structural pattern queries against them. It is fundamentally different from text search: it understands the grammar of your code, not just its characters.

Every text-based search tool — grep, ripgrep, ack — treats source code as a string. They cannot tell the difference between a function named authenticate and a comment that mentions authenticate. They cannot find "all functions that take exactly two arguments" or "all unwrap() calls outside test modules."

doora answers all of these queries with sub-second latency, across millions of lines of code, using Tree-sitter's incremental parsing library at its core.

Structural patterns

Find code by its syntactic shape using Tree-sitter S-expression queries.

Parallel processing

Lock-free work-stealing thread pool via Rayon. Scales to all available cores.

7 languages

Rust, Python, JavaScript, TypeScript, Go, C, and C++ with auto-detection.

Bloom filter index

Pre-reject files that cannot possibly match before invoking the parser.

Semantic rewriting

Replace structural patterns without corrupting surrounding syntax.

MCP server

Expose codebase structure to LLM agents via the Model Context Protocol.


02 Installation

doora is distributed as a single static binary. Install via Cargo, download a pre-built binary, or build from source.

From crates.io

# Requires Rust 1.78+
cargo install doora

Pre-built binaries

Download from the Releases page. Binaries are available for:

Platform | Architecture | Binary
Linux | x86_64 | doora-x86_64-unknown-linux-gnu
Linux | aarch64 | doora-aarch64-unknown-linux-gnu
macOS | x86_64 | doora-x86_64-apple-darwin
macOS | Apple Silicon | doora-aarch64-apple-darwin
Windows | x86_64 | doora-x86_64-pc-windows-msvc.exe

From source

git clone https://github.com/backpack-lab/doora
cd doora
cargo build --release
# Binary at ./target/release/doora

Shell completions

# Bash
doora --generate-completions bash >> ~/.bashrc

# Zsh
doora --generate-completions zsh >> ~/.zshrc

# Fish
doora --generate-completions fish > ~/.config/fish/completions/doora.fish

03 Quick start

Run your first structural query in under 60 seconds.

Find all function definitions

terminal
doora -q '(function_item name: (identifier) @fn_name)' -p ./src

src/auth/handler.rs:42:0 [@fn_name] "parse_token"
src/auth/handler.rs:89:0 [@fn_name] "validate_session"
src/db/pool.rs:14:0 [@fn_name] "connect"
...

Found 47 matches across 23 files in 38ms

Find a specific function by name

doora \
  -q '(function_item name: (identifier) @fn (#eq? @fn "connect"))' \
  -p ./src

Search multiple languages at once

doora \
  -q '(function_declaration name: (identifier) @fn_name)' \
  -p ./src
# auto-detects language per file

Multiple queries in one pass

doora \
  -q '(function_item name: (identifier) @fn_name)' \
  -q '(struct_item name: (type_identifier) @struct_name)' \
  -p ./src --no-color

Launch the interactive TUI

doora -q '(function_item)' -p . --tui

Build a search index for faster queries

doora index ./src --verbose
# Subsequent searches use Bloom filter pre-rejection

04 Why structural search

Text search has fundamental limitations that no amount of regex cleverness can overcome. Structural search solves them categorically.

Query | grep / ripgrep | doora
Find function definitions named auth_user | Returns ALL occurrences: function defs, variable names, comments, string literals, dead code | Returns only function definition nodes named auth_user
Find functions taking exactly 2 arguments | Cannot be expressed reliably in regex | Trivial: query the parameter list node child count
Find all unwrap() outside test modules | Cannot express scope constraints | Query for call sites with scope predicates
Find struct definitions that implement a trait | Multi-step, fragile, many false positives | Single S-expression query
Rename a function everywhere it is defined | Risk of corrupting string literals and comments | Semantic rewriting via AST: only actual definition nodes

The key insight

doora is to grep what a SQL database is to a flat text file. Both store the same data; one understands its structure.


05 How AST search works

Source code has grammar. Every language defines how tokens nest into expressions, expressions into statements, and statements into functions and modules. doora makes this grammar queryable.

From bytes to tree

Consider this Rust function:

fn authenticate(user: &str, password: &str) -> bool {
    true
}

Tree-sitter parses this into a Concrete Syntax Tree (CST):

Syntax tree structure
source_file
  └─ function_item
      ├─ name: identifier "authenticate"
      ├─ parameters: parameters
      │      ├─ parameter
      │      │      ├─ pattern: identifier "user"
      │      │      └─ type: reference_type
      │      └─ parameter
      │            ├─ pattern: identifier "password"
      │            └─ type: reference_type
      ├─ return_type: primitive_type "bool"
      └─ body: block

An S-expression query navigates this tree structure. The query (function_item name: (identifier) @fn) matches any function_item node that has a name child of type identifier, and captures that identifier as @fn.
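
To make the matching rule concrete, here is a toy model in plain Rust. It does not use the Tree-sitter API: the Node struct, capture_function_names, and sample_tree are hypothetical stand-ins that only mimic how (function_item name: (identifier) @fn) selects a node by kind and field name.

```rust
/// Hypothetical, simplified stand-in for a syntax tree node (not the Tree-sitter API).
#[derive(Debug, Clone)]
pub struct Node {
    pub kind: &'static str,
    /// Field name under which the parent holds this node, if any.
    pub field: Option<&'static str>,
    pub text: &'static str,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(
        kind: &'static str,
        field: Option<&'static str>,
        text: &'static str,
        children: Vec<Node>,
    ) -> Self {
        Node { kind, field, text, children }
    }
}

/// Emulates `(function_item name: (identifier) @fn)`: walk every node depth-first and,
/// for each `function_item`, capture the child in the `name` field if it is an `identifier`.
pub fn capture_function_names(root: &Node) -> Vec<&'static str> {
    let mut captures = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if node.kind == "function_item" {
            for child in &node.children {
                if child.field == Some("name") && child.kind == "identifier" {
                    captures.push(child.text);
                }
            }
        }
        // Push children in reverse so they are visited in source order.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    captures
}

/// Hand-built copy of the tree shown above for `fn authenticate(...) -> bool { true }`.
pub fn sample_tree() -> Node {
    let param = |name: &'static str| {
        Node::new("parameter", None, "", vec![
            Node::new("identifier", Some("pattern"), name, vec![]),
            Node::new("reference_type", Some("type"), "&str", vec![]),
        ])
    };
    Node::new("source_file", None, "", vec![
        Node::new("function_item", None, "", vec![
            Node::new("identifier", Some("name"), "authenticate", vec![]),
            Node::new("parameters", Some("parameters"), "", vec![param("user"), param("password")]),
            Node::new("primitive_type", Some("return_type"), "bool", vec![]),
            Node::new("block", Some("body"), "", vec![]),
        ]),
    ])
}
```

Calling capture_function_names(&sample_tree()) captures only "authenticate". The identifiers user and password are skipped because they sit in the pattern field of a parameter, not in the name field of a function_item.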

CST vs AST

Traditional compilers produce Abstract Syntax Trees (ASTs) that discard whitespace, comments, and punctuation. doora uses Concrete Syntax Trees (CSTs) that retain 100% fidelity to the original source — including exact byte offsets for every token. This is required for:


06 Query syntax overview

Queries use Tree-sitter's S-expression pattern syntax — a Lisp-like notation that mirrors the shape of the syntax tree.

Basic node match

(node_type)

Matches any node of the given type anywhere in the tree. Node type names are grammar-specific — see the language sections for common types.

Named capture

(function_item name: (identifier) @my_capture)

The @my_capture syntax tags a sub-node for extraction. Captured text appears in the output. Multiple captures per query are supported.

Equality predicate

(function_item
  name: (identifier) @fn
  (#eq? @fn "connect"))

Matches only when the captured node's text equals "connect" exactly.

Regex predicate

(function_item
  name: (identifier) @fn
  (#match? @fn "^(get|set|update)_"))

Matches function names starting with get_, set_, or update_.

Wildcard child

(call_expression
  function: (identifier) @callee
  arguments: (arguments . (_) .))

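The (_) wildcard matches any single named node. The . anchors pin it as both the first and the last named child, so this pattern matches calls with exactly one argument.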

Multi-query (single pass)

doora \
  -q '(function_item name: (identifier) @fn_name)' \
  -q '(struct_item name: (type_identifier) @struct_name)' \
  -p ./src

Multiple -q flags are compiled into a single-pass automaton. The AST is traversed exactly once regardless of query count.


07 CLI — search command

The default command. When no subcommand is given, doora runs a structural search.

usage
doora [search] [OPTIONS] --query <S-EXPR>

Arguments

Flag | Type | Default | Description
-q, --query <S-EXPR> | String (repeatable) | required | S-expression query pattern. Pass multiple times for multi-query single-pass search.
-p, --path <DIR> | PathBuf | "." | Root directory to search. Must be an existing directory.
-l, --lang <LANG> | String | "auto" | Language to parse. One of: rust, python, js, ts, go, c, cpp, auto.
--no-color | bool | false | Disable ANSI color output. Also honored via the NO_COLOR environment variable.
-Q, --quiet | bool | false | Suppress per-match result lines. Only the summary line is printed.
--stats | bool | false | Print detailed performance diagnostics to stderr after the search completes.
--no-update-index | bool | false | Disable automatic incremental index updates during search.

Output format

stdout — one line per match
src/auth/handler.rs:42:0 [@fn_name] "parse_token"

Fields: filepath:line:col   [@capture_name]   "matched_text"
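
Because the per-match line has a fixed shape, it is straightforward to consume from other tools. The sketch below parses one line into its fields. ParsedMatch and parse_match_line are illustrative names, not part of doora, and the parser assumes the matched text is printed verbatim between the outer quotes (escaping rules are not specified here).

```rust
/// One parsed result line. Field names are illustrative, not doora's internal types.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedMatch {
    pub path: String,
    pub line: usize,
    pub col: usize,
    pub capture: String,
    pub text: String,
}

/// Parses `filepath:line:col [@capture_name] "matched_text"`.
/// Returns None for lines that do not have this shape (for example the summary line).
pub fn parse_match_line(line: &str) -> Option<ParsedMatch> {
    // Everything before "[@" is the location; tolerate variable spacing.
    let (location, rest) = line.split_once("[@")?;
    let location = location.trim_end();
    let (capture, quoted) = rest.split_once(']')?;
    let text = quoted.trim().strip_prefix('"')?.strip_suffix('"')?;

    // Read line and column from the right, since the path itself may contain ':'.
    let mut parts = location.rsplitn(3, ':');
    let col: usize = parts.next()?.parse().ok()?;
    let line_no: usize = parts.next()?.parse().ok()?;
    let path = parts.next()?;

    Some(ParsedMatch {
        path: path.to_string(),
        line: line_no,
        col,
        capture: capture.to_string(),
        text: text.to_string(),
    })
}
```

Splitting the location from the right keeps paths with embedded colons intact, at the cost of assuming the last two colon-separated fields are always numeric.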

stderr — summary line
Found 47 matches across 23 files in 38ms

Stats output

stderr — with --stats flag
--- search statistics ---
files walked: 47
files parsed: 46
files skipped: 1
matches found: 12
sieve rejected: 18
match rate: 26.09% (files with matches / files parsed)
wall time: 38ms
throughput: 1236.84 files/sec

08 CLI — index command

Explicitly builds or updates the Bloom filter index for a directory. Once indexed, search queries with string literals run significantly faster.

usage
doora index <PATH> [OPTIONS]

Flag | Type | Default | Description
<PATH> | PathBuf | required | Root directory to index.
--lang <LANG> | String | "auto" | Language filter. Defaults to auto-detect all supported extensions.
--verbose | bool | false | Print one line per file: indexed:, fresh:, or removed: prefix.

example output (verbose)
indexed: src/auth/handler.rs
indexed: src/db/pool.rs
fresh: src/main.rs
removed: src/old/legacy.rs

indexed 44 files, skipped 2 fresh, removed 1 stale entries
index written to .doora-index

The index is stored at <PATH>/.doora-index as a binary file serialized with bincode. It is a manifest of per-file Bloom filter entries, each containing:

Incremental updates

Both the explicit index command and the search command perform incremental updates. Files whose mtime and size match the stored entry are skipped. New and modified files are re-indexed. Deleted files are removed from the manifest automatically.
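
The staleness rule above can be sketched with the standard library alone. Fingerprint, Action, fingerprint, and classify are hypothetical names for illustration; doora's real IndexEntry stores more than these two fields, and treating every metadata error as a deletion is an assumption of this sketch.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical per-file record holding only the two staleness fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    pub mtime: SystemTime,
    pub size: u64,
}

/// Reads the current mtime and size of a file.
pub fn fingerprint(path: &Path) -> io::Result<Fingerprint> {
    let meta = fs::metadata(path)?;
    Ok(Fingerprint { mtime: meta.modified()?, size: meta.len() })
}

/// What an incremental update should do with one path.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Fresh,   // mtime and size match the stored entry: skip
    Reindex, // new or modified file
    Remove,  // file no longer readable: drop it from the manifest
}

pub fn classify(path: &Path, stored: Option<&Fingerprint>) -> Action {
    match (fingerprint(path), stored) {
        (Err(_), _) => Action::Remove,
        (Ok(now), Some(old)) if now == *old => Action::Fresh,
        (Ok(_), _) => Action::Reindex,
    }
}
```

Comparing mtime and size avoids reading file contents during the freshness check. The trade-off is that an edit preserving both values would go unnoticed.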


09 Query guide — basics

A complete reference for writing effective S-expression queries.

Node types

Every language has a set of named node types defined by its Tree-sitter grammar. The node type is the first element of an S-expression. Common node types vary by language:

Language | Functions | Classes / Types | Variables
Rust | function_item | struct_item, impl_item | let_declaration
Python | function_definition | class_definition | assignment
JavaScript | function_declaration | class_declaration | variable_declaration
TypeScript | function_declaration | interface_declaration, type_alias_declaration | lexical_declaration
Go | function_declaration | type_declaration | short_var_declaration
C / C++ | function_definition | struct_specifier, class_specifier | declaration

Field names

Named fields constrain which child to match. The syntax is field_name: (child_type):

(function_item
  name: (identifier) @fn
  parameters: (parameters) @params
  return_type: (_) @ret)

Nested patterns

Patterns can nest to arbitrary depth:

(impl_item
  type: (type_identifier) @impl_type
  body: (declaration_list
    (function_item name: (identifier) @method_name)))

10 Query guide — predicates

Predicates filter captures based on their text content. They are evaluated using pre-compiled regex objects, cached at query compile time.

Predicate | Description | Example
#eq? | Exact equality match | (#eq? @fn "connect")
#match? | Regex match; unanchored unless the pattern uses ^ or $ | (#match? @fn "^handle_")
#not-eq? | Negative equality | (#not-eq? @fn "main")
#any-of? | Match any string in a list | (#any-of? @fn "get" "set" "del")

Performance note

Regex predicates are compiled exactly once at query compile time and wrapped in Arc<Regex>. They are passed by reference to all parallel worker threads — zero per-file regex compilation overhead.
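
The sharing pattern can be sketched without external crates. The example below models the three string predicates as a plain enum and omits #match?, which would need the regex crate. Predicate and filter_in_parallel are hypothetical names, and std::thread stands in for the Rayon workers.

```rust
use std::sync::Arc;
use std::thread;

/// String predicates from the table above, minus #match? (which needs the regex crate).
#[derive(Debug)]
pub enum Predicate {
    Eq(String),
    NotEq(String),
    AnyOf(Vec<String>),
}

impl Predicate {
    /// Returns true when the captured text satisfies the predicate.
    pub fn accepts(&self, captured: &str) -> bool {
        match self {
            Predicate::Eq(s) => captured == s.as_str(),
            Predicate::NotEq(s) => captured != s.as_str(),
            Predicate::AnyOf(options) => options.iter().any(|o| o.as_str() == captured),
        }
    }
}

/// Filters each batch on its own thread. The predicate is built once and every
/// thread reads the same instance through a cloned Arc handle.
pub fn filter_in_parallel(pred: Arc<Predicate>, batches: Vec<Vec<String>>) -> Vec<Vec<String>> {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let pred = Arc::clone(&pred);
            thread::spawn(move || {
                batch
                    .into_iter()
                    .filter(|name| pred.accepts(name))
                    .collect::<Vec<String>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Cloning an Arc only bumps a reference count, so adding workers does not repeat the cost of building the predicate.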


11 Query examples by language

A practical library of common patterns for each supported language.

Rust

rust: All function definitions
(function_item name: (identifier) @fn_name)
Matches every fn declaration: public, private, async, const, unsafe.

rust: All unwrap() call sites
(call_expression
  function: (field_expression
    field: (field_identifier) @m (#eq? @m "unwrap")))
Finds every .unwrap() call regardless of the receiver type.

rust: Struct definitions
(struct_item name: (type_identifier) @struct_name)

rust: Trait implementations for a specific type
(impl_item
  trait: (type_identifier) @trait
  type: (type_identifier) @impl_type
  (#eq? @impl_type "MyStruct"))

rust: Functions matching a naming pattern
(function_item
  name: (identifier) @fn
  (#match? @fn "^(get|set|update)_"))
Finds all functions whose names start with get_, set_, or update_.

Python

python: Function definitions
(function_definition name: (identifier) @fn_name)

python: Decorated functions
(decorated_definition
  (decorator) @decorator
  definition: (function_definition name: (identifier) @fn_name))
Captures both the decorator and function name. In the tree-sitter-python grammar, decorators are plain children of decorated_definition rather than a named field, so they are matched without a field prefix.

python: Class definitions
(class_definition name: (identifier) @class_name)

JavaScript

js: Function declarations
(function_declaration name: (identifier) @fn_name)
Matches function foo() {} syntax only. Arrow functions use a different node type.

js: Class declarations
(class_declaration name: (identifier) @class_name)

js: Method definitions
(method_definition name: (property_identifier) @method_name)

TypeScript

ts: Interface declarations
(interface_declaration name: (type_identifier) @interface_name)

ts: Type aliases
(type_alias_declaration name: (type_identifier) @type_name)

Go

go: Function declarations (not methods)
(function_declaration name: (identifier) @fn_name)

go: Struct type declarations
(type_declaration
  (type_spec name: (type_identifier) @type_name))

C / C++

c: Function definitions
(function_definition
  declarator: (function_declarator
    declarator: (identifier) @fn_name))
C and C++ function names are nested inside a declarator chain.

cpp: Class declarations
(class_specifier name: (type_identifier) @class_name)

c: Typedef names
(type_definition declarator: (type_identifier) @type_name)

12 Language auto-detection

When --lang auto is used (the default), doora detects the grammar per file from its extension and walks all supported extensions simultaneously.

Language flag | Extensions walked | Grammar
rust | .rs | tree-sitter-rust
python | .py, .pyi | tree-sitter-python
js | .js, .mjs, .cjs | tree-sitter-javascript
ts | .ts, .tsx, .mts, .cts | tree-sitter-typescript (tsx variant)
go | .go | tree-sitter-go
c | .c, .h | tree-sitter-c
cpp | .cpp, .cc, .hpp, .hxx, .cxx, .h | tree-sitter-cpp

.h file disambiguation

In auto mode, .h files are parsed with the C grammar. To parse them as C++, use --lang cpp explicitly.

In auto mode, the query is compiled against every supported language at startup. Languages for which the query fails to compile (because a node type doesn't exist in that grammar) are silently skipped. Only languages where the query compiles successfully are searched. This means a Rust-only query like (function_item) @fn will search only .rs files even in auto mode.
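
The extension table can be read as a single match expression. The sketch below is an illustration only: Lang and detect_lang are hypothetical names (the real logic lives in parser.rs::detect_language), and case-sensitive extension matching is an assumption.

```rust
use std::path::Path;

/// Hypothetical language tag mirroring the seven supported grammars.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    Rust,
    Python,
    JavaScript,
    TypeScript,
    Go,
    C,
    Cpp,
}

/// Maps a path to a grammar by extension, following the table above.
/// `.h` resolves to C, matching the documented auto-mode behavior.
pub fn detect_lang(path: &Path) -> Option<Lang> {
    let ext = path.extension()?.to_str()?;
    Some(match ext {
        "rs" => Lang::Rust,
        "py" | "pyi" => Lang::Python,
        "js" | "mjs" | "cjs" => Lang::JavaScript,
        "ts" | "tsx" | "mts" | "cts" => Lang::TypeScript,
        "go" => Lang::Go,
        "c" | "h" => Lang::C,
        "cpp" | "cc" | "hpp" | "hxx" | "cxx" => Lang::Cpp,
        _ => return None, // unsupported extension: the file is not walked
    })
}
```

For example, detect_lang(Path::new("include/api.h")) yields Some(Lang::C), while a README.md path yields None.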


13 Bloom filter index

The index is a pre-parse rejection sieve. Files that mathematically cannot contain the search term are skipped entirely, before tree-sitter is invoked.

How it works

01
Trigram extraction
Every file's source bytes are broken into all consecutive 3-byte windows. "hello" → [hel, ell, llo]. Unique trigrams are inserted into a per-file Bloom filter.
02
Bloom filter construction
A 4096-bit (512-byte) bit array with two FNV-1a hash functions. Each trigram sets 2 bits. The filter is serialized to the index manifest.
03
Query trigram extraction
At search time, string literals in predicates (#eq?, #match?) are decomposed into trigrams. "authenticate" → [aut, uth, the, hen, ent, nti, tic, ica, cat, ate].
04
Pre-parse rejection
Before invoking tree-sitter on a file, check if all query trigrams are present in the file's filter. If any is absent, skip the file entirely. Under 0.003ms per rejection.
05
Zero false negatives guaranteed
A file containing the search term ALWAYS passes the filter. False positives (files that pass but don't match) result in unnecessary parsing — not missed results.
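
The five steps above can be sketched end to end in a few dozen lines. This is a minimal illustration, not doora's source: the names Bloom, unique_trigrams, and should_parse are hypothetical, and deriving the second hash by seeding FNV-1a with a different starting value is an assumption, since the exact variants are not specified here.

```rust
use std::collections::HashSet;

const BITS: usize = 4096;

/// All unique 3-byte windows of the input. Inputs shorter than 3 bytes yield none.
pub fn unique_trigrams(bytes: &[u8]) -> HashSet<[u8; 3]> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// FNV-1a, 32-bit. `basis` is the starting value; varying it is how this
/// sketch obtains two different hash functions (an assumption).
fn fnv1a(data: &[u8], basis: u32) -> u32 {
    let mut hash = basis;
    for &b in data {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(16_777_619); // 32-bit FNV prime
    }
    hash
}

/// Fixed-size 4096-bit filter stored as 512 bytes.
pub struct Bloom {
    bits: [u8; BITS / 8],
}

impl Bloom {
    pub fn new() -> Self {
        Bloom { bits: [0; BITS / 8] }
    }

    /// The two bit positions a trigram maps to.
    fn positions(trigram: &[u8; 3]) -> [usize; 2] {
        [
            fnv1a(trigram, 2_166_136_261) as usize % BITS, // standard FNV offset basis
            fnv1a(trigram, 0x9E37_79B9) as usize % BITS,   // arbitrary second seed (assumption)
        ]
    }

    pub fn insert(&mut self, trigram: &[u8; 3]) {
        for p in Self::positions(trigram) {
            self.bits[p / 8] |= 1 << (p % 8);
        }
    }

    pub fn probably_contains(&self, trigram: &[u8; 3]) -> bool {
        Self::positions(trigram)
            .iter()
            .all(|&p| (self.bits[p / 8] & (1 << (p % 8))) != 0)
    }

    /// Builds the per-file filter from raw source bytes.
    pub fn from_source(source: &[u8]) -> Self {
        let mut bloom = Bloom::new();
        for t in unique_trigrams(source) {
            bloom.insert(&t);
        }
        bloom
    }
}

/// Returns false only when some trigram of `needle` is definitely absent from the file.
/// A needle with no trigrams can never be rejected.
pub fn should_parse(file_filter: &Bloom, needle: &str) -> bool {
    unique_trigrams(needle.as_bytes())
        .iter()
        .all(|t| file_filter.probably_contains(t))
}
```

Because insert and probably_contains compute the same two positions, every trigram that was inserted is always reported as present, which is where the zero false negative property comes from.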

Bloom filter parameters

Parameter | Value | Rationale
Bit array size | 4096 bits (512 bytes) | By the standard estimate (1 - e^(-kn/m))^k, about 1% false positives per probed trigram at ~215 unique trigrams per file, rising to about 6% at ~570
Hash functions | 2 (FNV-1a variants) | Sweet spot for our trigram density and bit array size
Hash algorithm | FNV-1a 32-bit | Deterministic across process runs (unlike SipHash with random seeds)
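
To see how these parameters trade off, the textbook Bloom filter estimate p = (1 - e^(-kn/m))^k can be evaluated directly. The helper below is a small sketch for plugging in your own numbers. The function name is hypothetical, and the formula assumes uniformly distributed hashes.

```rust
/// Standard Bloom filter estimate of the per-probe false positive rate:
/// p = (1 - e^(-k * n / m))^k
/// where m = bits in the filter, k = hash functions, n = inserted items.
pub fn false_positive_rate(bits: f64, hashes: f64, items: f64) -> f64 {
    (1.0 - (-hashes * items / bits).exp()).powf(hashes)
}
```

With m = 4096 and k = 2, the estimate is roughly 0.01 at 215 inserted trigrams and roughly 0.059 at 570. A query that probes several trigrams must pass every probe, so the chance that a non-matching file slips through is lower than the single-probe figure.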

14 Semantic rewriting

doora can surgically replace structural patterns without corrupting surrounding syntax. This is fundamentally different from sed — it operates on the AST, not on text.

preview a rename (dry run)
doora -q '(function_item name: (identifier) @fn (#eq? @fn "old_name"))' \
  --rewrite 'new_name' \
  -p ./src
apply rewrites in place
doora -q '(function_item name: (identifier) @fn (#eq? @fn "old_name"))' \
  --rewrite 'new_name' \
  --in-place \
  -p ./src

A unified diff is always printed before --in-place writes, unless --yes is passed. Rewrites are applied in reverse byte order to avoid offset invalidation when multiple matches occur in the same file.
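
The reverse-order rule is easy to demonstrate on a plain string. In the sketch below, apply_rewrites is a hypothetical helper, and the byte ranges stand in for the captured nodes' offsets. It assumes the ranges do not overlap and fall on UTF-8 character boundaries.

```rust
use std::ops::Range;

/// Replaces each byte range with `replacement`, working from the end of the source
/// toward the start so that earlier offsets stay valid after every splice.
pub fn apply_rewrites(source: &str, mut spans: Vec<Range<usize>>, replacement: &str) -> String {
    let mut out = source.to_string();
    // Descending start offset: the last match in the file is rewritten first.
    spans.sort_by(|a, b| b.start.cmp(&a.start));
    for span in spans {
        out.replace_range(span, replacement);
    }
    out
}
```

If the spans were applied front to back instead, the first replacement would shift every later offset by the length difference between the old and new text, and the remaining splices would land in the wrong place.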

Important

The rewrite template replaces the CAPTURED node's text, not the full match. Capture your target precisely to avoid unintended changes.


15 Interactive TUI

Launch with --tui for a split-pane terminal explorer with live streaming results.

launch the TUI
doora -q '(function_item)' -p . --tui
TUI layout
┌─ Files ──────────┐┌─ AST ──────────────────────────────────────┐
│ src/auth/        ││ source_file                                │
│   handler.rs   3 ││   function_item [42:0 → 58:1]              │
│   token.rs     1 ││     name: identifier "parse_token" ← match │
│ src/db/          ││     parameters: parameters                 │
│   pool.rs      1 ││       parameter: identifier "input"        │
└──────────────────┘└────────────────────────────────────────────┘
┌─ Code ─────────────────────────────────────────────────────────┐
│ 42 │ pub fn parse_token(input: &str, ctx: &Context) -> Token { │
└────────────────────────────────────────────────────────────────┘
[/] query   [j/k] navigate   [Enter] expand node   [q] quit

The TUI uses a fully decoupled async event loop (Tokio). Results stream in live as background threads find matches — the terminal never freezes, even on multi-gigabyte repositories.

Key | Action
/ | Focus the query input bar
j / k or arrows | Navigate the file list
Enter | Expand / collapse AST nodes
Tab | Switch focus between panes
q or Esc | Quit

16 MCP server mode

doora exposes a Model Context Protocol server so LLM coding agents can query your codebase structurally, reducing hallucinations and wasted context tokens.

Why LLMs need structural search

LLM coding agents have limited context windows. When an agent tries to understand a large codebase by reading files, it burns context tokens on irrelevant content and still has no structural understanding of the architecture.

With the doora MCP server, an agent can ask: "What is the exact type signature of the function handling user authentication?" and receive a precise, structured answer in milliseconds — consuming a fraction of the context tokens that reading source files would require.

Setup

build the index first
doora index --persist -p .
start the MCP server
doora serve --mcp

Configure in .mcp.json

Add to your project's MCP configuration file:

{
  "mcpServers": {
    "doora": {
      "command": "doora",
      "args": ["serve", "--mcp"],
      "cwd": "/path/to/your/repo"
    }
  }
}

Available MCP tools

Tool | Input | Description
search_ast | query: string | Run a live S-expression structural search and return matching results with file paths and positions.
lookup_symbol | name: string | Query the persisted SQLite index for a symbol by name. Returns type signatures, locations, and relationships.

17 Architecture — pipeline overview

The search pipeline is a linear sequence from CLI arguments to printed results. Each stage is isolated and independently testable.

  1. CLI parse: Cli struct
  2. Validate: path, lang, query
  3. Compile queries: Arc<MultiCompiledQuery>
  4. Walk files: WalkBuilder + ignore
  5. par_bridge: Rayon parallel
  6. Sieve: Bloom filter check
  7. parse_file: Tree-sitter CST
  8. extract_matches: DFS traversal
  9. sort + dedup: MatchResult[]
  10. print: stdout
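
Parallel workers finish in no particular order, so the last two stages are what make the output deterministic. The sketch below shows one way to do that with derived ordering. MatchRow and finalize are hypothetical stand-ins for doora's MatchResult handling, and sorting by path, then line, then column is an assumption.

```rust
/// Hypothetical stand-in for a match record. Derived Ord compares fields in
/// declaration order: path, then line, then col, then capture, then text.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct MatchRow {
    pub path: String,
    pub line: usize,
    pub col: usize,
    pub capture: String,
    pub text: String,
}

impl MatchRow {
    pub fn new(path: &str, line: usize, col: usize, capture: &str, text: &str) -> Self {
        MatchRow {
            path: path.to_string(),
            line,
            col,
            capture: capture.to_string(),
            text: text.to_string(),
        }
    }
}

/// Sorts the collected matches, drops exact duplicates, and renders each one
/// in the `filepath:line:col [@capture] "text"` shape used by the search output.
pub fn finalize(mut rows: Vec<MatchRow>) -> Vec<String> {
    rows.sort();
    rows.dedup(); // duplicates are adjacent after sorting
    rows.iter()
        .map(|r| format!("{}:{}:{} [@{}] \"{}\"", r.path, r.line, r.col, r.capture, r.text))
        .collect()
}
```

Keeping line and col as integers matters here: sorting them as strings would place line 100 before line 42.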

Key architectural decisions


18 Architecture — module reference

The codebase is organized into focused, independently testable modules.

types
src/types.rs
Core data types: MatchResult, SearchConfig, Language, LangMode, AppError. All shared types live here.
parser
src/parser.rs
Thread-local parser pool, parse_file with mmap support, get_language, detect_language, FileSource enum.
query
src/query.rs
CompiledQuery with BitSet kind IDs and cached regex predicates. MultiCompiledQuery. Single-pass extract_multi_matches.
walker
src/walker.rs
Directory traversal via the ignore crate. Respects .gitignore. build_walker for single language, build_auto_walker for all extensions.
output
src/output.rs
ANSI color output, ColorMode, resolve_color_mode (respects NO_COLOR). Writer injection for testability.
trigram
src/trigram.rs
Byte-level trigram extraction. extract_unique_trigrams, extract_query_trigrams. No heap allocation per trigram.
bloom
src/bloom.rs
4096-bit Bloom filter with two FNV-1a hash functions. Fixed-size [u8; 512] bit array. insert, probably_contains, serialization.
index
src/index.rs
IndexEntry, IndexManifest. Atomic save via rename. bincode serialization. Version mismatch detection.
indexer
src/indexer.rs
Parallel index build/update pipeline. Staleness detection via mtime + size. Incremental re-indexing of modified files only.
sieve
src/sieve.rs
Bloom filter pre-rejection sieve. QueryTrigramSet, FileIndexStatus, should_parse_file. Multi-query OR semantics.
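
The "atomic save via rename" technique mentioned for the index module can be sketched with the standard library. atomic_write is a hypothetical helper, and the ".tmp" suffix is an assumption. The pattern relies on the temporary file living in the same directory as the target so that the rename stays on one filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to a sibling temporary file, flushes it to disk, then renames it
/// over `target`. Readers see either the old file or the new one, never a partial write.
pub fn atomic_write(target: &Path, bytes: &[u8]) -> io::Result<()> {
    // Build "<file name>.tmp" next to the target (the suffix is an arbitrary choice).
    let mut tmp_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = target.with_file_name(tmp_name);

    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // make sure the data reaches disk before the rename
    }

    // Replaces the target if it already exists.
    fs::rename(&tmp, target)
}
```

If the process dies before the rename, the previous index file is left untouched and only a stray temporary file remains.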

19 Architecture — concurrency model

The search pipeline is fully parallel using Rayon's work-stealing thread pool. Every design decision was made to maximize throughput while eliminating contention.

Work-stealing thread pool

The directory walker's Iterator is converted to a Rayon parallel iterator via par_bridge(). Rayon distributes file processing across all available cores using a work-stealing scheduler — when a thread exhausts its local queue, it steals work from the end of another thread's queue.

Thread-local parser pool

Tree-sitter Parser initialization is expensive — it allocates C-level state via FFI. Creating and dropping a parser per file would be catastrophic. Instead, each Rayon worker thread owns exactly one Parser via thread_local!:

thread_local! {
    static PARSER: RefCell<Parser> = RefCell::new(Parser::new());
}

The parser is reconfigured for each file via set_language() before parsing — a cheap operation that resets internal state without a new allocation.

Shared state

The following objects are shared across all threads:

Lock design

Mutex locks are held only for the append/increment operation — never during parse or query work. The expensive CPU work (parsing, DFS traversal) happens entirely outside any lock.
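
The locking discipline can be shown with scoped threads from the standard library. This is a sketch under stated assumptions: count_fn_keywords is a made-up stand-in for the parse and query work, process_all is a hypothetical driver, and files are assigned to workers by simple striding rather than Rayon's work stealing.

```rust
use std::sync::Mutex;
use std::thread;

/// Stand-in for the expensive per-file step (parse + query).
fn count_fn_keywords(source: &str) -> usize {
    source.matches("fn ").count()
}

/// Processes every file and returns (per-file results sorted by index, total count).
pub fn process_all(files: &[&str], workers: usize) -> (Vec<(usize, usize)>, usize) {
    let results = Mutex::new(Vec::new());
    let total = Mutex::new(0usize);
    let workers = workers.max(1);

    thread::scope(|s| {
        for w in 0..workers {
            let (results, total) = (&results, &total);
            s.spawn(move || {
                // Worker w handles files w, w + workers, w + 2 * workers, ...
                for (idx, src) in files.iter().enumerate().skip(w).step_by(workers) {
                    let n = count_fn_keywords(src); // CPU work, no lock held
                    results.lock().unwrap().push((idx, n)); // lock held only for the append
                    *total.lock().unwrap() += n; // lock held only for the increment
                }
            });
        }
    });

    let mut results = results.into_inner().unwrap();
    results.sort();
    (results, total.into_inner().unwrap())
}
```

Each guard is a temporary that is dropped at the end of its statement, so no thread ever waits on a lock while another thread is doing per-file work.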


20 Architecture — memory model

doora maintains a flat RAM profile regardless of repository size by enforcing an ephemeral tree lifecycle.

Ephemeral lifecycle

When parsing a file with text search, only a small buffer needs to be in memory. With AST-based search, parsing a file produces a large C-allocated structure with potentially tens of thousands of nodes. If all trees were retained simultaneously, a 10,000-file repository would require tens of gigabytes of RAM.

The solution is the ephemeral lifecycle. In the parallel closure, the drop order is explicit and mandatory:

let (tree, source) = parse_file(entry.path(), &lang)?;
let matches = extract_multi_matches(&tree, source.as_bytes(), &multi, path);
drop(tree);    // C-allocated CST freed immediately
drop(source);  // source bytes freed immediately
// Only the small Vec<MatchResult> persists

Memory-mapped files

For files of 1MB or larger, parse_file uses memmap2 instead of fs::read_to_string. The OS pages in only the bytes actually accessed, backed by the file's page cache rather than a heap allocation:

// Small files: heap allocation
FileSource::Heap(String)

// Large files (≥ 1MB): memory-mapped
FileSource::Mapped(memmap2::Mmap)

Maximum RAM usage

RAM usage is bounded by:


21 Performance

doora is engineered for sub-second query latency on repositories with millions of lines of code.

Benchmark results

Benchmark | Result | Notes
Single file parse + query (100 functions) | ~180µs | On Apple M2, release build
10,000-file Rust repository | <1000ms | Target; actual ~380ms on modern 8-core
Parallel search, 100 files, 20 functions each | ~45ms | Rayon, 8 cores
Query compilation (single, Rust grammar) | ~12µs | Includes BitSet and regex pre-compilation
Bloom filter rejection check | <0.003ms | Per file, two FNV-1a hash evaluations

Optimization stack

The following optimizations are layered from outermost to innermost:

Profiling

To profile on your hardware:

# Install flamegraph
cargo install flamegraph

# Run profiler against the Rust compiler source
make profile

# Or manually
cargo flamegraph --bin doora -- \
  -q '(function_item name: (identifier) @fn_name)' \
  -p /path/to/large/repo --lang rust --no-color --quiet

See PROFILING.md for detailed instructions and interpretation guidance.


22 Building from source

Requires Rust 1.78 or later (MSRV). All grammar crates are compiled from source as part of the Cargo build.

# Debug build
cargo build

# Release build (recommended for performance testing)
cargo build --release

# Check only (fast compilation check)
cargo check --all-features

Dependencies

Crate | Version | Purpose
tree-sitter | 0.22 | Incremental parsing library core
tree-sitter-rust/python/javascript/typescript/go/c/cpp | 0.21 | Language grammars
rayon | 1.10 | Work-stealing parallel iterator
ignore | 0.4 | Gitignore-aware directory walker
clap | 4 | CLI argument parsing
clap_complete | 4 | Shell completion generation
thiserror | 1 | Error type derivation
regex | 1 | Pre-compiled regex for #match? predicates
bincode | 2 | Binary serialization for the index
serde | 1 | Serialization derive macros
memmap2 | 0.9 | Memory-mapped file reading

23 Testing

The test suite covers unit tests, integration tests, and statistical Bloom filter correctness tests.

# Run all tests
cargo test

# Run tests for a specific module
cargo test --lib parser
cargo test --lib query
cargo test --lib walker
cargo test --lib bloom

# Run integration tests
cargo test --test integration_test -- --nocapture

# Run Bloom filter statistical tests
cargo test --test bloom_statistics -- --nocapture

# Run binary tests (CLI validation, stats)
cargo test --bin doora

Test inventory

Module | Test count | Coverage focus
types.rs | 7 | MatchResult derives, AppError display, sort order
parser.rs | 20+ | Thread-local pool, mmap vs heap, all 7 languages
query.rs | 25+ | Query compilation, BitSet filtering, regex predicates, Arc sharing
walker.rs | 30+ | Extension filtering, gitignore, excluded dirs, binary detection
output.rs | 31 | Color output, format correctness, writer injection
bloom.rs | 15+ | Zero false negatives, bit operations, FNV-1a determinism
trigram.rs | 20+ | Sliding window, UTF-8 byte handling, deduplication
sieve.rs | 18+ | Multi-query OR semantics, staleness detection, end-to-end
integration_test.rs | 56+ | End-to-end pipeline across all 7 languages
bloom_statistics.rs | 8 | Zero false negatives across 1000 files, FPR measurement

Fixture files

Integration tests use fixture files in tests/fixtures/:


24 Benchmarks

Criterion benchmarks in benches/search_benchmarks.rs. Run automatically in CI with regression detection.

# Run all benchmarks
cargo bench --bench search_benchmarks

# Run as tests (fast, no timing)
cargo bench --bench search_benchmarks -- --test

# Save baseline for comparison
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

Benchmark suite

Benchmark group | What it measures
compile_single_query_rust | Query compilation time including BitSet + regex pre-compilation
compile_five_queries_rust | Multi-query compilation overhead
single_file/functions/N | Parse + query throughput for files with 10, 100, 500, 1000 functions
heap_read vs mmap_read | File reading strategy comparison
one_query vs five_queries | Single-pass multi-query overhead
parallel_100_files_20fn_each | Full parallel pipeline with result accumulation
compile_universal_query_all_languages | Auto mode query compilation across all 7 grammars

CI fails on regressions exceeding 10% for parallel benchmarks and 2% for single-file benchmarks. Results are uploaded as GitHub Actions artifacts with HTML reports from Criterion.


25 Contributing

Contributions are welcome. The project is tracked issue-by-issue with milestones for each subsystem.

Before opening a PR

# Format
cargo fmt

# Lint — zero warnings enforced
cargo clippy --all-features -- -D warnings

# Full test suite
cargo test --all-features

All three must pass cleanly. The CI pipeline enforces #![deny(warnings)] and #![warn(clippy::pedantic)] across the entire codebase.

Code style

Adding a new language

  1. Add tree-sitter-<lang> to Cargo.toml
  2. Add a variant to Language enum in types.rs
  3. Add extension mapping in walker.rs::extensions_for_language
  4. Add grammar arm in parser.rs::get_language
  5. Add detection arm in parser.rs::detect_language
  6. Add variant in parser.rs::get_all_languages
  7. Add lang string to main.rs::resolve_lang and validate
  8. Add fixture file to tests/fixtures/
  9. Add integration tests following the pattern of existing language tests

License

MIT