265 changes: 120 additions & 145 deletions docs/semcode-mcp.md
@@ -3,182 +3,157 @@
All semcode functions are git aware and default to lookups on the current
commit. You can also pass a specific commit you're interested in, or a branch name.

**Note on Regex Patterns**: All regex patterns in semcode are **case-insensitive by default**. This applies to all pattern matching including function names, commit messages, symbols, and lore email searches. You don't need to use the `(?i)` flag.
**Regex**: all patterns are case-insensitive; no `(?i)` needed. Applies to
function names, commit messages, symbols, and lore email searches.

**Branch Support**: Most query tools support a `branch` parameter as an alternative to `git_sha`. When you specify a branch name (e.g., "main", "develop"), it will be resolved to the current tip commit of that branch. Branch takes precedence over git_sha if both are provided.
## Common parameters

- **git_sha**: commit to search (default: current)
- **branch**: branch name, resolved to its tip (e.g., "main"); takes
precedence over git_sha if both are given
- **page**: pagination (1-based); pages are 50 lines of the tool's
rendered text output, not 50 result records. Omit for full results.
- **since_date / until_date**: e.g., "yesterday", "2 weeks ago",
"2024-01-15"
- **\*_patterns**: arrays of regex. `author_patterns`, `subject_patterns`,
`from_patterns`, `body_patterns`, `recipients_patterns`,
`symbols_patterns`, `path_patterns` are OR'd within an array.
`regex_patterns` and `symbol_patterns` are AND'd within an array.
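
The OR/AND rules and the line-based paging can be sketched as follows. This is a minimal illustration, not semcode's implementation: it uses Python's `re` module (semcode's regex engine may differ in syntax details), and the helper names `matches_any`, `matches_all`, and `page_of` are hypothetical. How semcode treats an empty pattern array is not stated above, so that case is left out.

```python
import re

PAGE_LINES = 50  # page size stated in the parameter list above


def _compile_ci(patterns):
    # All patterns are case-insensitive by default, so no (?i) is needed.
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def matches_any(patterns, text):
    """OR semantics (e.g. author_patterns): one matching pattern is enough."""
    return any(rx.search(text) for rx in _compile_ci(patterns))


def matches_all(patterns, text):
    """AND semantics (e.g. regex_patterns): every pattern must match."""
    return all(rx.search(text) for rx in _compile_ci(patterns))


def page_of(rendered_text, page):
    """Return the lines of a 1-based page: 50 lines of rendered output."""
    if page < 1:
        raise ValueError("page is 1-based")
    lines = rendered_text.splitlines()
    start = (page - 1) * PAGE_LINES
    return lines[start:start + PAGE_LINES]
```

Because pages slice the rendered text rather than result records, a single verbose result can span several pages.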

**Conventions**: boolean parameters default to `false`; `limit: 0`
means unlimited, except where the tool declares an explicit max --
in that case the max wins and `limit: 0` is rejected.
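
One way to read the `limit` convention is sketched below. The function `resolve_limit` is hypothetical, and clamping a too-large request down to the max is an assumption: the text says only that "the max wins", so the server might instead reject such a request.

```python
def resolve_limit(requested, default, maximum=None):
    """Return the effective result cap, or None for unlimited.

    requested: the caller's `limit`, or None if omitted
    default:   the tool's documented default
    maximum:   the tool's explicit max, or None if it declares none
    """
    if requested is None:
        requested = default
    if requested < 0:
        raise ValueError("limit must be >= 0")
    if maximum is not None:
        if requested == 0:
            # Tools with an explicit max do not accept "unlimited".
            raise ValueError("limit: 0 is rejected when the tool declares a max")
        # Assumption: larger requests are clamped rather than rejected.
        return min(requested, maximum)
    return None if requested == 0 else requested
```

For example, with `vgrep_functions` (default 10, max 100), `resolve_limit(500, 10, 100)` yields 100 under this reading, while `resolve_limit(0, 100)` for `grep_functions` yields `None` (unlimited).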

## Code lookup

In the call-graph tools below (`find_callers`, `find_calls`,
`find_callchain`), both sides of a call edge include functions and
function-like macros.

**find_function**: search for functions and macros
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
- name: function/macro name, or a regex
- also displays details on callers and callees
**find_type**: search for types and typedefs
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
- name: type/typedef name or regex
**find_callers**: find all functions that call a function or macro
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
- name: function to search
**find_calls**: find all functions called by a function or macro
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
- name: function to search
**find_callchain**: search complete function/macro call chain (forward and reverse)
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
**find_callers**: find callers (functions or macros) of the named entity
- name: function or macro to search
**find_calls**: find callees (functions or macros) of the named entity
- name: function or macro to search
**find_callchain**: complete call chain (forward and reverse)
- name: function or macro to search
- up_levels: number of caller levels to show (default: 2, 0 = unlimited)
- down_levels: number of callee levels to show (default: 3, 0 = unlimited)
- calls_limit: max calls to show per level (default: 15, 0 = unlimited)
**diff_functions**: extract functions and types from a unified diff
- diff_content: the string to analyze
- Use this to determine which symbols are involved in a given diff
- diff_content: unified diff text (e.g., output of `git diff`)
- use this to determine which symbols are involved in a given diff

## Code search

**grep_functions**: search function/macro bodies for a regex
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
- pattern: the regex to search for
- verbose: boolean, if true show full function bodies (default: false)
- verbose: if true, show full function bodies
- path_pattern: optional regex to filter results by path
- limit: max number of results to return (default: 100, 0 = unlimited)
- this only searches inside functions or macros, there's no need to escape
your pattern to limit the search.
- limit: max number of results (default: 100)
- the search is already scoped to function and macro bodies; no
need to anchor the pattern to constrain the search (regex
metacharacters are NOT auto-escaped)
**vgrep_functions**: vector embedding search on functions/macros/types
- git_sha: indicates which commit to search (default: current)
- branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
- query_text: text describing the kind of functions to find (e.g., "memory allocation", "string comparison")
- query_text: text describing the kind of functions to find
- path_pattern: optional regex to filter results by path
- limit: max number of results to return (default: 10, max: 100)
- Embedding searches are only useful when you want to search for broad
concepts that a regex won't find well.
- The database might not have embeddings indexed
- limit: max number of results (default: 10, max: 100)
- only useful for broad concepts that a regex won't find well
- the database might not have embeddings indexed
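
Since `grep_functions` does not auto-escape regex metacharacters, a literal such as `kfree(` has to be escaped by the caller. The sketch below uses a hypothetical helper, `escape_literal`, that backslash-escapes only the common metacharacters; it assumes semcode's regex engine accepts those escapes, which holds for most engines but is not confirmed by the text above. The `path_pattern` value is a made-up example.

```python
import re

# Characters that carry special meaning in most regex dialects.
_META = set("\\.^$|?*+()[]{}")


def escape_literal(text):
    """Backslash-escape regex metacharacters so text matches literally."""
    return "".join("\\" + ch if ch in _META else ch for ch in text)


# Arguments for a grep_functions call that looks for the literal "kfree(skb)".
args = {
    "pattern": escape_literal("kfree(skb)"),
    "path_pattern": "drivers/net/",  # hypothetical path filter
    "limit": 20,
}

# Sanity check with Python's own engine: the escaped pattern matches literally.
assert re.search(args["pattern"], "    kfree(skb);")
```

The same helper is useful when passing a commit subject to `regex_patterns`, because subjects often contain brackets or parentheses.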

## Commit search

Note: commit tools use **`git_ref`** (not `git_sha` from the common
parameters) and **`symbol_patterns`** (singular; AND'd -- distinct
from lore's plural `symbols_patterns`, which is OR'd).

Commit selection in `find_commit`: `git_ref` and `git_range` are
mutually exclusive. `reachable_sha` is a filter that may accompany
either, or stand alone (with no `git_ref` or `git_range`) to mean
"all indexed commits reachable from this sha".

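The selection rules above can be written as a small client-side check before calling `find_commit`. This is a sketch under the rules as stated in this section; `commit_selection` is a hypothetical helper, and the behavior when none of the three parameters is given is not defined here, so it is reported as "unspecified" rather than guessed.

```python
def commit_selection(git_ref=None, git_range=None, reachable_sha=None):
    """Classify a find_commit selection, or raise if it is invalid."""
    if git_ref is not None and git_range is not None:
        raise ValueError("git_ref and git_range are mutually exclusive")
    if git_range is not None:
        base = "range"
    elif git_ref is not None:
        base = "single"
    elif reachable_sha is not None:
        # Standing alone: all indexed commits reachable from this sha.
        return "all-reachable"
    else:
        return "unspecified"
    # reachable_sha may accompany either git_ref or git_range as a filter.
    return base + "+reachable" if reachable_sha is not None else base


print(commit_selection(reachable_sha="HEAD"))
print(commit_selection(git_range="HEAD~10..HEAD", reachable_sha="HEAD"))
```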
**find_commit**: search for changes, potentially in a range of commits
- This can return a large body of results. Use pagination to manage context
- git_ref: single commit ref to lookup (sha, short sha, branch, HEAD etc)
- git_range: optional git range to search multiple commits: HEAD~10..HEAD etc
cannot be combined with git_ref
- author_patterns: optional array of regex to filter by author name/email (OR logic)
- subject_patterns: optional array of regex to filter by subject line (OR logic)
- regex_patterns: optional array of regex patterns to filter commits.
- All patterns are AND'd together
- Applied against the combination of commit message and unified diff
- symbol_patterns: optional array of regex of symbols to search for
- Use this to quickly find commits changing a function or type (w/regex)
- path_patterns: optional regex to filter commits based on which files they
change. Multiple regex can be passed and will be OR'd together
- page: optional page number for pagination (1-based). Each page contains
50 lines, results indicate current page and total pages. Default: full results
- reachable_sha: optional git sha, filter results to only those reachable from the
sha provided. Mutually exclusive with git_range
- verbose: show full diff in addition to metadata (default: false)
- can return a large body of results; use pagination to manage context
- git_ref: single commit ref (sha, short sha, branch, HEAD, etc.)
- git_range: optional range for multiple commits, e.g., HEAD~10..HEAD
- reachable_sha: optional git sha; filter to results reachable from it
- regex_patterns (AND'd): applied against commit message + unified diff
- symbol_patterns (AND'd): find commits changing a function or type
- author_patterns, subject_patterns, path_patterns (each OR'd)
- verbose: show full diff in addition to metadata
**vcommit_similar_commits**: search commits based on vector embeddings
- git_range: optional git range to search multiple commits: HEAD~10..HEAD etc
- query_text: search text
- author_patterns: optional array of regex to filter by author name/email (OR logic)
- subject_patterns: optional array of regex to filter by subject line (OR logic)
- regex_patterns: array of regex AND'd together to limit search results
- symbol_patterns: array of regex AND'd together to limit search results based
on symbols changed in the commit
- path_patterns: optional regex to filter commits based on which files they
change. Multiple regex can be passed and will be OR'd together
- limit: max results to return (default 10, max 50)
- reachable_sha: optional git sha, filter results to only those reachable from the
sha provided. Mutually exclusive with git_range
- page: optional page number for pagination (1-based). Each page contains
50 lines, results indicate current page and total pages. Default: full results
- git_range: optional range, e.g., HEAD~10..HEAD
- reachable_sha: optional git sha, reachable-from filter (combinable
with git_range)
- regex_patterns (AND'd), symbol_patterns (AND'd)
- author_patterns, subject_patterns, path_patterns (each OR'd)
- limit: max results (default 10, max 50)

## Lore (kernel mailing list archive)

Lore tools use **`symbols_patterns`** (plural; OR'd within the array --
distinct from commit tools' singular `symbol_patterns`, which is AND'd).
All `*_patterns` arrays below are OR'd within the array.

**lore_search**: search lore.kernel.org email archives
- from_patterns: optional array of regex to filter by sender (OR logic)
- subject_patterns: optional array of regex to filter by subject (OR logic)
- body_patterns: optional array of regex to filter by message body (OR logic)
- symbols_patterns: optional array of regex to filter by symbols in patches (OR logic)
- recipients_patterns: optional array of regex to filter by recipients (OR logic)
- message_id: optional exact message ID for direct lookup
- verbose: show full message body (default: false)
- show_thread: show full email thread for each match (default: false)
- show_replies: show replies/subthreads under each match (default: false, mutually exclusive with show_thread)
- limit: max number of results (default: 100, 0 = unlimited)
- since_date: filter emails from this date onwards (e.g., "yesterday", "2 weeks ago", "2024-01-15")
- until_date: filter emails up to this date
- mbox: output in MBOX format with full headers and body (default: false)
- page: optional page number for pagination (1-based). Each page contains
50 lines, results indicate current page and total pages. Default: full results
- verbose: show full message body
- show_thread: show full email thread for each match
- show_replies: show replies/subthreads under each match
(mutually exclusive with show_thread)
- mbox: output in MBOX format with full headers and body
- limit: max number of results (default: 100)
- accepts: from_patterns, subject_patterns, body_patterns,
symbols_patterns, recipients_patterns
**dig**: find lore.kernel.org emails related to a git commit
- commit: git commit reference (SHA, short SHA, HEAD, branch name, etc.)
- verbose: show full message body (default: false)
- show_all: show all duplicate results, not just most recent (default: false)
- show_thread: show full thread for each result (use with show_all, default: false)
- show_replies: show replies/subthreads under each result (use with show_all, mutually exclusive with show_thread)
- since_date: filter emails from this date onwards
- until_date: filter emails up to this date
- page: optional page number for pagination (1-based). Each page contains
50 lines, results indicate current page and total pages. Default: full results
- commit (required): git commit reference (SHA, short SHA, HEAD,
branch name, etc.)
- verbose: show full message body
- show_all: show all duplicate results, not just most recent
- show_thread: show full thread for each result (use with show_all)
- show_replies: show replies/subthreads (use with show_all, mutually
exclusive with show_thread)
**vlore_similar_emails**: semantic vector search over lore.kernel.org emails
- query_text: text describing the kind of emails to find (e.g., "memory leak fix", "performance optimization")
- from_patterns: optional array of regex to filter by sender (OR logic)
- subject_patterns: optional array of regex to filter by subject (OR logic)
- body_patterns: optional array of regex to filter by message body (OR logic)
- symbols_patterns: optional array of regex to filter by symbols in patches (OR logic)
- recipients_patterns: optional array of regex to filter by recipients (OR logic)
- limit: max number of results to return (default: 20, max: 100)
- since_date: filter emails from this date onwards
- until_date: filter emails up to this date
- page: optional page number for pagination (1-based). Each page contains
50 lines, results indicate current page and total pages. Default: full results
- The database might not have lore embeddings indexed
**list_branches**: list all indexed branches with their status
- No parameters required
- Shows branch names, indexed commit SHAs, and freshness status
- **up-to-date**: indexed commit matches current branch tip
- **outdated**: branch has new commits since indexing (re-index to update)
- Useful for tracking multiple stable branches (e.g., linux-5.10.y, 6.1.y, 6.12.y)
and knowing when they need re-indexing after new releases
**compare_branches**: compare two branches and show their relationship
- branch1: first branch name (e.g., "main")
- branch2: second branch name (e.g., "feature-branch")
- Shows merge base, ahead/behind status, and indexing status for both branches
**indexing_status**: check the status of background indexing operation
- No parameters required
- Shows current indexing progress, errors, and timing
- query_text: text describing the kind of emails to find
- limit: max number of results (default: 20, max: 100)
- accepts: from_patterns, subject_patterns, body_patterns,
symbols_patterns, recipients_patterns
- the database might not have lore embeddings indexed

## Branch / status

**list_branches**: list indexed branches with indexed SHA and
freshness (up-to-date vs. outdated against current tip). No
parameters.
**compare_branches**: compare two branches; shows merge base,
ahead/behind status, and indexing status for both
- branch1, branch2: branch names
**indexing_status**: show background indexing progress, errors,
and timing. No parameters.

## Lazy Loading

To reduce the initial context size consumed by the MCP server (saving ~96% of initial tokens), you can start the server in **lazy mode** using the `--lazy` flag.

In lazy mode, the server initially exposes only 3 meta-tools:

**list_categories**: List available tool categories
- No parameters required
- Returns a list of categories (e.g., `code_lookup`, `code_search`) and their descriptions
- Use this first to discover what semcode can do

**get_tools**: Get full schemas for tools in a category
- category: The name of the category to inspect (from `list_categories`)
- Returns the full tool definitions for all tools in that category
- Use this to learn how to call specific tools

**call_tool**: Execute a specific tool
- tool_name: Name of the tool to execute (e.g., `find_function`)
- arguments: Object containing the arguments for the tool
- Use this to run tools after you've discovered them

**Workflow**:
1. Call `list_categories` to see available functionality
2. Call `get_tools` for a relevant category (e.g., `code_lookup`)
3. Call `call_tool` to execute the desired tool (e.g., `find_function`)
Start the server with `--lazy` to cut initial context by ~96%. The
server then exposes only three meta-tools (`list_categories`,
`get_tools`, `call_tool`); call them in that order to discover
and invoke full tools on demand.
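
The three-step lazy workflow might look like this on the wire. The sketch builds MCP `tools/call` requests as JSON-RPC 2.0 messages; the transport (stdio or otherwise) is not shown, `tool_call` is a hypothetical helper, and the looked-up function name `schedule` is a placeholder. The argument names `category`, `tool_name`, and `arguments` are assumed to match the meta-tool schemas.

```python
import json


def tool_call(request_id, name, arguments=None):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


requests = [
    # 1. Discover the available categories.
    tool_call(1, "list_categories"),
    # 2. Fetch the full schemas for one category.
    tool_call(2, "get_tools", {"category": "code_lookup"}),
    # 3. Invoke a discovered tool through the call_tool wrapper.
    tool_call(3, "call_tool", {
        "tool_name": "find_function",
        "arguments": {"name": "schedule"},  # placeholder function name
    }),
]

for request in requests:
    print(json.dumps(request))
```

In lazy mode the real tool name travels inside `arguments` of `call_tool`, so the outer `params.name` is always one of the three meta-tools.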

## Recipes

### Searching for commits reachable from HEAD (or any other git sha)
### Locating a backported commit reachable from HEAD (or any other sha)

If a repository heavily cherry-picks patches, it might have a backported commit
under a different git sha. This means the most effective way to find the
backported commit is searching by commit subject:
Repositories that heavily cherry-pick patches store the backport
under a different git sha than the upstream commit. Search by
commit subject to find it, then narrow to commits reachable from
the branch tip with `reachable_sha`:

```
semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown"
semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown" --reachable HEAD
find_commit(regex_patterns=["bnxt_en: Fix memory corruption when FW resources change during ifdown"])
find_commit(regex_patterns=["bnxt_en: Fix memory corruption when FW resources change during ifdown"],
reachable_sha="HEAD")
```

❌ WRONG: reachable_sha=HEAD + git_range=HEAD~5000..HEAD
❌ WRONG: git_range=HEAD~5000..HEAD
✅ CORRECT: reachable_sha=HEAD only (no git_range)