The existing leakage detection in PyRIT relies on SelfAskTrueFalseScorer with the leakage.yaml prompt, which requires an LLM call for every evaluation. This works but is slow and expensive at scale.
For CI pipelines and high-volume red team evaluations, a fast deterministic scorer that uses regex to detect common credential patterns (AWS keys, GitHub tokens, JWTs, private keys, connection strings, etc.) would be useful.
Proposed: a CredentialLeakScorer that extends TrueFalseScorer, uses compiled regex patterns, and returns True if any credential format is detected in the model output. No LLM call needed. Supports custom pattern dictionaries for organization-specific secret formats.
This complements the LLM-based scorer — use the regex scorer for speed in CI, fall back to the LLM scorer for nuanced detection of indirect leaks.
I have a working implementation ready and will open a PR.
The existing leakage detection in PyRIT relies on SelfAskTrueFalseScorer with the leakage.yaml prompt, which requires an LLM call for every evaluation. This works but is slow and expensive at scale.
For CI pipelines and high-volume red team evaluations, a fast deterministic scorer that uses regex to detect common credential patterns (AWS keys, GitHub tokens, JWTs, private keys, connection strings, etc.) would be useful.
Proposed: a CredentialLeakScorer that extends TrueFalseScorer, uses compiled regex patterns, and returns True if any credential format is detected in the model output. No LLM call needed. Supports custom pattern dictionaries for organization-specific secret formats.
This complements the LLM-based scorer — use the regex scorer for speed in CI, fall back to the LLM scorer for nuanced detection of indirect leaks.
I have a working implementation ready and will open a PR.