Optimize the MCP per-tool descriptions #39
Open
chucklever wants to merge 4 commits into
The MCP instructions are loaded into the LLM context at every session start. Repeating the git_sha/branch and pagination parameter blocks on every tool entry wastes roughly a third of the tokens without adding information that the preamble has not already established. Factor the repeated conventions (git_sha/branch, pagination, date filters, *_patterns array semantics) into a Common parameters section and reduce each tool entry to the parameters unique to that tool. Fold the reachable_sha caveat from the Recipes section into find_commit where it applies.
Apply additional token-efficiency reductions to the MCP tool guide beyond the initial compaction pass. The lazy-loading section previously duplicated information that the server's meta-tool schemas expose at runtime; collapse it to a single pointer. Factor repeated '(default: false)' markers on boolean parameters into one conventions line rather than restating the default on every tool. Hoist the reachable_sha / git_range mutual-exclusion rule to the Commit search section header so it is not duplicated across two tool entries. Trim examples and filler phrasings that the parameter names and prior sentences already convey. Net reduction is roughly 15 to 20 percent of the file, with no load-bearing details removed.
Four ambiguities in the tool guide could steer agents into wrong or rejected tool calls.
First, the singular 'symbol_patterns' used by commit tools and the plural 'symbols_patterns' used by lore tools have opposite grouping semantics (AND vs OR) but near-identical names. The common-parameters summary stated the rule once, but per-tool bullets did not repeat which pattern arrays are AND'd or OR'd, so an agent scanning a single tool entry could reasonably assume uniform behavior. Mark the grouping inline on each pattern array in the per-tool descriptions.
Second, find_commit takes 'git_ref' rather than the common 'git_sha'. The deviation was undocumented; an agent following the common parameters section would pass 'git_sha' and receive a validation error. Note the rename explicitly.
Third, the relationship between git_ref, git_range, and reachable_sha in find_commit was underspecified. git_range and git_ref are mutually exclusive as commit selectors; reachable_sha is a filter that may accompany either, or stand alone to mean 'all indexed commits reachable from this sha'. State the three-way relationship explicitly.
Fourth, dig's 'commit' parameter is required in the JSON schema but the doc did not mark it as such. Flag it so agents supply it.
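The three-way relationship between git_ref, git_range, and reachable_sha can be sketched as a small client-side check. This is a minimal illustration, not the server's validation code: the function name check_find_commit_selectors and the returned strings are hypothetical, and the case where no selector is supplied is left open because the commit message does not describe it.

```python
from typing import Optional


def check_find_commit_selectors(
    git_ref: Optional[str] = None,
    git_range: Optional[str] = None,
    reachable_sha: Optional[str] = None,
) -> str:
    """Describe what a find_commit call would select (hypothetical helper).

    Raises ValueError when the two commit selectors are combined.
    """
    # git_ref and git_range are both commit selectors: only one may be given.
    if git_ref is not None and git_range is not None:
        raise ValueError("git_ref and git_range are mutually exclusive")

    if git_ref is not None:
        selected = f"commit {git_ref}"
    elif git_range is not None:
        selected = f"commits in {git_range}"
    elif reachable_sha is not None:
        # reachable_sha standing alone selects everything reachable from it.
        return f"all indexed commits reachable from {reachable_sha}"
    else:
        # Assumption: the source does not say how the server treats this case.
        return "no selector supplied"

    # reachable_sha is a filter that may accompany either selector.
    if reachable_sha is not None:
        selected += f", filtered to those reachable from {reachable_sha}"
    return selected
```

An agent-side wrapper could run a check like this before issuing the call, so that a conflicting git_ref and git_range pair is caught locally instead of costing a round-trip.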
A second review pass turned up several smaller doc issues that do not
cause validation errors on their own, but do send agents on needless
round-trips. The page-size convention ('50 lines per page') did not
say what 'lines' measures -- it is output lines of rendered text, not
result records. The grep_functions hint 'no need to escape your
pattern' reads ambiguously, as if regex metacharacters were escaped
automatically; the intent is that the search is already scoped to
function and macro bodies, so anchors are unnecessary. The
'limit: 0 = unlimited' convention collides silently with tools that
declare an explicit max (vgrep_functions, vcommit_similar_commits,
vlore_similar_emails); note that an explicit max wins. The
find_callers, find_calls, and find_callchain descriptions referred
variously to 'functions' and 'functions or macros' on the two sides
of an edge; make clear that both sides of a call edge include
function-like macros. The diff_functions parameter description
'the string to analyze' becomes 'unified diff text (e.g., output of
git diff)'. Rewrite the commit-reachable recipe as an MCP tool call
rather than the query tool's CLI syntax, since this file is loaded
as MCP server instructions.
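To illustrate what "an MCP tool call" means for the commit-reachable recipe, here is a hedged sketch of the JSON-RPC request an MCP client would send. The tools/call method and the name/arguments layout come from the MCP specification; the helper build_tool_call, the request id, and the sha value are hypothetical, and the actual recipe text in the instructions file is not shown in this pull request.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request as a plain dict (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# reachable_sha on its own means "all indexed commits reachable from this sha".
# The sha below is a placeholder, not a real commit.
request = build_tool_call(1, "find_commit", {"reachable_sha": "0123abcd"})
print(json.dumps(request, indent=2))
```

Expressing the recipe in this shape keeps it in the same vocabulary as the rest of the instructions, which an agent reads as tool names and argument objects rather than as CLI flags.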
The semcode MCP instructions are loaded into the LLM context at every session start. Tighten up the instructions to improve token efficiency and remove contradictions and other minor issues. A semcode rebuild is needed after applying these changes.