vale-ai-tells

A Vale package for detecting linguistic patterns commonly associated with AI-generated prose. Based on 2024-2025 research into vocabulary fingerprints and structural tells.

This package targets technical documentation, where clarity and directness matter more than style. It is less useful for creative writing, marketing copy, or other contexts where some of these patterns are intentional choices.

Note

The author created this package to help clean up AI-assisted technical documentation, not to disguise AI-generated content as human-written.

Installation

Add the package to your .vale.ini:

StylesPath = styles
MinAlertLevel = suggestion

Packages = https://github.com/tbhb/vale-ai-tells/releases/download/v1.6.3/ai-tells.zip, \
  https://github.com/tbhb/vale-ai-tells/releases/download/v1.6.3/ai-tells-commits.zip

[*.md]
BasedOnStyles = ai-tells

Then run:

vale sync

Linting commit messages

AI-generated commit messages carry the same fingerprints as AI-generated prose, plus a few tells of their own: self-referential preambles like "This commit adds...," trailing justification clauses like "...ensuring consistency," buzzword adjective combos like "comprehensive tests" and "robust error handling," and gitmoji patterns.

The ai-tells-commits style provides 6 rules purpose-built for commit messages, separate from the prose rules so you can opt in without pulling them into your docs.

Commit message rules

| Rule | Description |
| --- | --- |
| CommitSelfReference | Self-narrating preambles: "This commit adds...," "This PR introduces...," "In this change...," "These changes ensure...," etc. |
| CommitTrailingJustification | Trailing clauses that restate the obvious: "...ensuring consistency," "...improving readability," "...which allows for," "for better maintainability," etc. |
| CommitBuzzwords | Vague adjective+noun combos: "comprehensive tests," "robust error handling," "proper validation," "various fixes," "relevant components," "necessary changes," etc. |
| CommitHedging | Inappropriate uncertainty for changes already made: "This should fix...," "This may help...," "seems to resolve...," etc. |
| CommitEmoji | Systematic gitmoji prefixes (✨🐛♻️📝⚡✅🔧🔥🚀 etc.). Emoji commit adoption has jumped from ~25% to ~75% of organizations, driven almost entirely by AI tools. |
| CommitOverexplanation | Filler that pads without informing: "As part of this change...," "The purpose of this commit...," "Summary of changes," "The following changes were made," etc. |
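
As a rough illustration of the kind of phrase matching these rules describe, here is a minimal Python sketch. The two regular expressions and the `commit_tells` helper are hypothetical and written only for this example. The package's actual rules are Vale rule files, which likely cover more phrases and differ in detail.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the package's
# rule definitions, which live in Vale rule files.
SELF_REFERENCE = re.compile(
    r"^\s*(this (commit|pr|change)|in this (commit|change)|these changes)\b",
    re.IGNORECASE,
)
TRAILING_JUSTIFICATION = re.compile(
    r",?\s*(ensuring|improving|enhancing)\s+\w+[.!]?\s*$",
    re.IGNORECASE,
)


def commit_tells(subject: str) -> list[str]:
    """Return labels for the illustrative patterns found in a commit subject."""
    found = []
    if SELF_REFERENCE.search(subject):
        found.append("self-reference")
    if TRAILING_JUSTIFICATION.search(subject):
        found.append("trailing-justification")
    return found


print(commit_tells("This commit adds retry logic, ensuring consistency"))
print(commit_tells("Add retry logic to the upload client"))
```

The first message trips both patterns; the second, which states the change directly, trips neither.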

Setup

Add a [formats] section and a named section for the commit message file to your .vale.ini:

[formats]
COMMIT_EDITMSG = md

[{COMMIT_EDITMSG,.git/COMMIT_EDITMSG}]
BasedOnStyles = ai-tells, ai-tells-commits

The glob matches the commit message file both when pre-commit passes its path and when you invoke Vale directly. Use both styles together: ai-tells catches general vocabulary and structural tells, and ai-tells-commits catches commit-specific patterns.

Add the commit-msg hook to your .pre-commit-config.yaml:

  - repo: https://github.com/errata-ai/vale
    rev: 27593b0e0e7eb8f0c2b7fae0d93fa1cfaabceb2f # v3.13.0
    hooks:
      - id: vale
      - id: vale
        name: vale (commit message)
        stages: [commit-msg]
        args: [--ext=.md]

Install the hook:

prek install --hook-type commit-msg

Example

A blocked commit:

$ git commit -m "This commit leverages a comprehensive solution to seamlessly enhance the functionality"

vale (commit message)....................................................Failed
- hook id: vale
- exit code: 1

 .git/COMMIT_EDITMSG
 1:1   error  AI commit tell: 'This commit'. Commit messages shouldn't     ai-tells-commits.CommitSelfReference
              narrate themselves—just state what you did and why.
 1:13  error  AI vocabulary: 'leverages'. Replace with a more specific     ai-tells.OverusedVocabulary
              or common word.
 1:24  error  AI commit tell: 'comprehensive solution'. This vague         ai-tells-commits.CommitBuzzwords
              buzzword combo is a hallmark of AI-generated commits.
 1:48  error  AI vocabulary: 'seamlessly'. Replace with a more specific    ai-tells.OverusedVocabulary
              or common word.

Suppressing noisy rules

Some prose rules matter less for commit messages. If they generate noise, suppress them in your .vale.ini:

[{COMMIT_EDITMSG,.git/COMMIT_EDITMSG}]
BasedOnStyles = ai-tells, ai-tells-commits
ai-tells.SycophancyMarkers = NO
ai-tells.ClosingPleasantries = NO

Rules included

This package contains 44 rule files covering different categories of AI tells. Rules default to error level unless the description says otherwise.

| Rule | Description |
| --- | --- |
| AbsoluteAssertions | AI overconfidence: "the only way to," "the only real solution," "make no mistake," "there is no denying," "above all else," etc. Verify the claim or soften it. |
| AIAdjectiveNounPairs | AI adjective immediately preceding a noun: "holistic approach," "seamless integration," "transformative impact," etc. Currently at warning level. |
| AICompoundPhrases | Compound phrases: "rich tapestry," "intricate interplay," "paradigm shift," "double-edged sword," etc. |
| AnthropomorphicJustification | Treating abstractions like employees: "earns its keep," "does the heavy lifting," "pulls its weight," "pays for itself," "speaks for itself," etc. |
| AffirmativeFormulas | Revelation patterns: "Here's the thing," "And that's the beauty of it," "Let that sink in," etc. |
| ClosingPleasantries | Sign-off language: "I hope this helps," "Feel free to ask," "Don't hesitate to reach out," etc. |
| ConclusionMarkers | Formulaic conclusions: "In conclusion," "Ultimately," "At the end of the day," etc. |
| ContrastiveFormulas | Rhetorical contrasts: "It's not just X; it's Y," "These aren't X. They're Y," "This doesn't mean X. It means Y," "The real question isn't X; it's Y," "Not only X but also Y," etc. |
| DefensiveHedges | Preemptive concessions: "This may seem X, but..." "Admittedly, X, but..." "At first glance," etc. |
| DespiteChallenges | The "despite challenges" dismissal formula: "despite these challenges," "while challenges remain," "challenges notwithstanding," etc. |
| EmDashUsage | Em-dashes, which AI uses excessively |
| EmphaticCopula | Italicized copula verbs and determiners for manufactured profundity |
| FalseBalance | Evasive "both sides" language: "both sides present valid points," "nuanced approach," etc. |
| FalseExclusivity | False insider drama: "nobody talks about," "what most people miss," "the dirty secret," "the elephant in the room," etc. |
| FillerPhrases | Padding and performative sincerity: "a wide range of," "in order to," "honestly," etc. |
| FormalRegister | Overly formal vocabulary: "utilize," "facilitate," "commence," etc. |
| FormalTransitions | Formal transitions: "Moreover," "Furthermore," "What's more," "Case in point," etc. |
| HedgingPhrases | Compulsive hedging: "It's important to note that," "That being said," "Generally speaking," "As you might expect," etc. |
| ListIntroductions | Announcements of upcoming lists or summaries: "Below you'll find," "Here's a breakdown of," "Here's everything you need to know," "The following sections will," etc. |
| Metacommentary | Throat-clearing and self-commentary that narrates the text rather than adding content |
| MicDrop | Short dramatic sentences for manufactured emphasis in technical prose: "It matters." "Full stop." "And it shows." Contrastive fragments: "Dense, not cramped." Preference fragments: "Clarity over cleverness." Imperative mic-drops: "Trust the process." Categorical declarations: "Density is a feature." |
| MicDropHeadings | Tagline-style headings: "Clarity, not cleverness," "Simple, then fast," "Speed over correctness," "X first, Y second," etc. |
| NarrativePivots | Unearned dramatic pivots: "something shifted," "everything changed," "that changed everything," "it was a wake-up call," etc. |
| OpeningCliches | AI-style openings: "In today's rapidly evolving landscape," "Without further ado," "Whether you're," etc. |
| OrganicConsequence | False inevitability: "emerges naturally," "a natural consequence," "follows naturally from," etc. |
| OverusedVocabulary | Words with documented AI overuse: "delve," "comprehensive," "unprecedented," "sophisticated," "salient," "efficacy," "paramount," "cognizant," "camaraderie," "palpable," "fleeting," "amidst," etc. Verb forms (leverage, harness, etc.) moved to OverusedVocabularyVerbs. |
| OverusedVocabularyVerbs | Verb forms of AI vocabulary fingerprints: "leverage," "navigate," "showcase," "harness," "embark," "foster," "spearhead." Sequence-based for precision: noun forms such as "financial leverage" do not trigger. |
| ParallelStaccato | Back-to-back minimal sentences with parallel structure: "Engineers build. Managers ship." "Content carries the personality. Chrome doesn't." Solo two-word staccato: "Complexity scales." |
| ParticipialPadding | Present participle (-ing) phrases appended for shallow analysis: "highlighting its importance," "reflecting broader trends," "underscoring its role," "solidifying its position," etc. The #1 discriminating feature in the PNAS study (527% of human rate). |
| PromotionalPuffery | Ad-copy and travel-brochure language: "nestled in," "vibrant community," "a beacon of," "renowned for its," "has emerged as a," "left an indelible mark," etc. |
| RestatementMarkers | Redundant restatements: "In other words," "Simply put," "To be more specific," etc. |
| RhetoricalDevices | Rhetorical question patterns: "Ask yourself:", "The test:", "When doing X, ask:" etc. |
| RhetoricalSelfAnswer | Self-posed rhetorical questions answered for dramatic effect: "The result/catch/worst part?" followed by an immediate answer. |
| SelfReference | Self-referential cross-references: "as mentioned above," "as noted earlier," "as we'll explore," etc. |
| SequencingMarkers | Formulaic ordinal sequencing: "Firstly," "Secondly," "Thirdly," "The first takeaway," "The second benefit," etc. |
| ServesAsDodge | Inflated copula replacements: "serves as a," "stands as the," "represents a pivotal," "boasts a vibrant," etc. Use "is" or "are" instead. |
| StackedAnaphora | Stacked repetition for emphasis: "No X. No Y. No Z." "It's X. It's Y. It's Z." etc. |
| StructureAnnouncements | Narrating upcoming structure: "key takeaway," "quick recap," "to recap," "quick summary," "to put it plainly," "to put this in perspective," etc. |
| SycophancyMarkers | Flattering phrases: "Great question," "I'm happy to help," "You make an excellent point," etc. |
| UnpackExplore | Explainer announcements: AI's habit of announcing what it is about to explain rather than just explaining it. Phrases beginning with "Let me" or "Let us" followed by unpack, break down, dive in, walk through, examine, explore, etc. |
| UrgencyInflation | False urgency and importance assertions: "cannot be overstated," "more important than ever," "has never been more critical," "the stakes have never been higher," "at a critical juncture," "in an increasingly connected world," etc. |
| VagueAttributions | Claims attributed to unnamed authorities: "experts argue," "studies show that," "research suggests," "a growing body of evidence," etc. |
| VerbTricolon | Exactly-three parallel verb lists: "build, test, and deploy," "define, validate, and transform," etc. |
| VerbTricolonDensity | Multiple verb tricolons in one paragraph; LLM prose clusters exactly-three enumerations. |

What to write instead

Quick substitution reference for the most common patterns:

| Instead of | Write |
| --- | --- |
| delve into | look at, cover, examine |
| leverage (verb) | use, apply, build on |
| utilize | use |
| seamlessly | (delete) |
| comprehensive | (delete, or name what's included) |
| in order to | to |
| Moreover / Furthermore | Also, And, or start a new sentence |
| em-dash | comma, period, or parentheses |
| It's important to note that | (delete; just state the point) |
| I hope this helps | (delete) |

Using with AI agents

Each error message gives AI agents and humans specific, actionable guidance for fixing the issue. Messages include:

  • A short prefix for quick identification: AI hedge:, AI filler:, and similar labels
  • The matched text
  • A concrete action: delete, rewrite, replace, or use a simpler word

Example workflow with an AI coding assistant:

You: Run `vale docs/` and fix any warnings or errors you find.

Agent: Running vale... Found 4 issues:

1. docs/intro.md:5 - AI opening: 'In today's rapidly evolving'.
   Start with your actual point instead of this generic lead-in.
2. docs/intro.md:12 - AI vocabulary: 'delve'.
   Replace with a more specific or common word.
3. docs/intro.md:12 - AI punctuation: em-dash detected.
   Use a comma, period, or parentheses instead.
4. docs/guide.md:8 - AI filler: 'in order to'.
   Delete this phrase—it adds no meaning.

Fixing these now...

[Agent edits the files, replacing generic phrases with specific content]

Running vale again... No issues found.
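
For scripted workflows, the same alerts can be read programmatically instead of from the terminal report. The sketch below assumes Vale's `--output=JSON` report maps each file path to a list of alerts that carry `Check` and `Line` fields. The sample data is invented, and the rule name attached to each alert is a guess based on the rule table, not captured output. The `alerts_by_rule` helper is hypothetical.

```python
import json
from collections import defaultdict


def alerts_by_rule(vale_json: str) -> dict[str, list[str]]:
    """Group alert locations ("path:line") by the rule that raised them."""
    report = json.loads(vale_json)
    grouped = defaultdict(list)
    for path, alerts in report.items():
        for alert in alerts:
            grouped[alert["Check"]].append(f"{path}:{alert['Line']}")
    return dict(grouped)


# Invented sample shaped like a Vale JSON report (assumed structure).
sample = json.dumps({
    "docs/intro.md": [
        {"Check": "ai-tells.OpeningCliches", "Line": 5},
        {"Check": "ai-tells.OverusedVocabulary", "Line": 12},
        {"Check": "ai-tells.EmDashUsage", "Line": 12},
    ],
    "docs/guide.md": [
        {"Check": "ai-tells.FillerPhrases", "Line": 8},
    ],
})

for rule, locations in sorted(alerts_by_rule(sample).items()):
    print(rule, ", ".join(locations))
```

In practice the input string would come from a command such as `vale --output=JSON docs/` rather than from a hard-coded sample.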

Customization

Disable specific rules:

[*.md]
BasedOnStyles = ai-tells
ai-tells.FormalTransitions = NO
ai-tells.EmDashUsage = NO

Change severity levels:

[*.md]
BasedOnStyles = ai-tells
ai-tells.HedgingPhrases = warning

Early prevention with AI agent instructions

If you use an AI coding assistant, add instructions to your project's CLAUDE.md, AGENTS.md, or similar file to prevent Vale violations before they happen:

## Writing style

When writing or editing prose:

- Avoid AI vocabulary fingerprints: "delve," "tapestry," "multifaceted,"
  "leverage," "foster," "underscores," "comprehensive," "robust"
- Don't open with generic phrases like "In today's rapidly evolving..."
- Skip hedging ("It's important to note...") and filler ("in order to")
- Use commas or periods instead of em-dashes
- Cut sycophantic openers: "Great question!" "Absolutely!"
- Prefer simple words: "use" not "utilize," "help" not "facilitate"
- Start paragraphs with your actual point, not rhetorical wind-up

Limitations

This package catches lexical and phrasal patterns. It can't measure or detect:

  • Sentence-length uniformity (low burstiness)
  • Perplexity scores
  • Paragraph-length patterns
  • Patterns that require semantic analysis
  • Model-specific stylometric signatures

Known patterns not covered

AI writing research documents these patterns, but they need analysis beyond Vale's token-matching capabilities:

  • Sentence-length uniformity: AI produces sentences of near-uniform length, roughly 27 words, while human writing varies widely. Requires statistical analysis across the document.
  • Paragraph-length uniformity: AI paragraphs tend toward uniform size, typically 3-5 sentences and 60-100 words each. Requires document-level measurement.
  • Dead metaphor repetition: AI latches onto a single metaphor and repeats it 5-10 times throughout a piece. Requires tracking metaphor usage across the document.
  • One-point dilution: A single argument restated 10 ways across thousands of words — circular repetition disguised as comprehensiveness. Requires semantic analysis.
  • Elegant variation: AI's repetition-penalty pushes it to substitute synonyms unnaturally, cycling through "protagonist," "key player," "eponymous character" instead of reusing a name. Requires NLP-level analysis.
  • Content duplication: Repeating entire sections or paragraphs verbatim within the same piece. Requires document-level diff analysis.
  • Unnecessary inline definitions: AI habitually inserts appositive definitions like "X, a [definition], does Y" even when the audience already knows the term. Too many false positives for token matching.
  • Invented concept labels: AI appends abstract problem-nouns like "paradox," "trap," "creep," and "divide" to domain words and treats them as established terms. Too many legitimate uses for token matching.

For fuller detection, combine this package with statistical analysis tools.
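
Sentence-length uniformity is one of the patterns listed above that a short script can approximate. This is a minimal sketch and not part of the package: the function names are hypothetical, the sentence splitter is naive (it breaks on abbreviations such as "e.g."), and the source implies no particular threshold for what counts as too uniform.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split after ., !, or ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text: str) -> float:
    """Coefficient of variation of sentence length (0.0 means uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = (
    "The cache stores parsed rules. The linter reads every file. "
    "The report lists each alert."
)
varied = (
    "It failed. The linter reads every file in the tree and reports "
    "each alert it finds. Why?"
)

print(sentence_lengths(uniform), length_variation(uniform))
print(sentence_lengths(varied), round(length_variation(varied), 2))
```

A value near zero means every sentence has about the same number of words. Higher values indicate the mix of short and long sentences that is more typical of human writing.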

Supplementing with AI agent instructions

Vale can't detect structural patterns like sentence uniformity or paragraph rhythm. If you use an AI coding assistant, add instructions to your project's CLAUDE.md, AGENTS.md, or similar file to cover what Vale misses:

## Writing style

When writing or editing prose, vary your structure:

- Mix sentence lengths: follow long explanations with short punchy statements
- Vary paragraph lengths—not every paragraph needs 3-4 sentences
- Avoid the "topic sentence, three supporting points, conclusion" formula
- Don't start consecutive paragraphs or sentences with the same word
- Skip the "In conclusion" wrapper—just end when you're done
- Let some points stand alone without hedging or qualifications
- Be willing to be direct, even blunt, rather than diplomatically balanced

This covers structural patterns that lexical analysis can't catch.
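
One of the structural instructions above, avoiding consecutive sentences that start with the same word, can also be checked mechanically. The following is a minimal sketch under stated assumptions: `repeated_openers` is a hypothetical helper, not something the package ships, and it uses the same naive sentence splitting on `.`, `!`, and `?`.

```python
import re


def repeated_openers(text: str) -> list[tuple[int, str]]:
    """Return (sentence index, word) where a sentence opens like the previous one."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    hits = []
    previous = None
    for index, sentence in enumerate(sentences):
        # Compare first words case-insensitively, ignoring leading quotes.
        first = sentence.split()[0].strip("\"'(").lower()
        if first == previous:
            hits.append((index, first))
        previous = first
    return hits


draft = "This rule flags filler. This rule also flags hedges. Vale reports both."
print(repeated_openers(draft))
```

Each hit gives the zero-based index of the offending sentence and the repeated opening word, which is enough to point an editor or an agent at the spot to rewrite.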

Sources

Based on academic research, community-maintained catalogs of AI writing patterns, practitioner analysis, and commit message research.

AI disclosure

Claude wrote the majority of rule definitions, documentation, and test cases in this repository. ChatGPT and Gemini generated text samples for cross-model validation. A human designed the rule categories, severity assignments, quality criteria, and the research-to-rule pipeline. A human validated every AI-generated rule against test documents containing known patterns.

The CITATION.cff lists the human author. AI tools are not listed as authors, consistent with Committee on Publication Ethics (COPE) guidance on AI and authorship.

Citation

If you use this package in research or want to cite it, see CITATION.cff for the citation metadata.

License

MIT
