docs: Update README and AGENTS.md for new features

aureliuscanon · aureliuscanon · commit f91540cd4b7f · 2026-03-04T22:09:40.000-05:00
diff --git a/AGENTS.md b/AGENTS.md
@@ -1,4 +1,4 @@
-# Agents Guide – Logmaster
+# Agents Guide – privlog
 
 This document is for **coding agents** that work on this repo.
 It explains what the project does and where the important code lives.
@@ -7,52 +7,51 @@ It explains what the project does and where the important code lives.
 
 ## 1. Purpose of this project
 
-- **`logmaster`** is a Python CLI tool built with Typer.
-- Its purpose is to analyze log files, identify patterns, and provide formatted output.
-- It uses a `rules/` directory to define custom analysis rules for Semgrep.
+- **`privlog`** is a Python CLI tool built with Typer for finding and preventing sensitive data leaks.
+- It uses a hybrid approach, combining pattern-based Semgrep rules with a high-precision, language-aware AST-based scanner.
 
 ---
 
 ## 2. Key files and modules
 
 - `pyproject.toml`
-  - **Purpose:** Defines project metadata, dependencies (`typer`, `pyyaml`), and entry points.
-  - **Responsibilities:** Manages the package and its dependencies using setuptools. Includes configuration for package data to ensure rule files are included.
+  - **Purpose:** Defines project metadata, dependencies (`typer`, `pyyaml`, `semgrep`), and the `privlog` entry point.
+  - **Responsibilities:** Manages the package and its dependencies.
 
 - `README.md`
-  - **Purpose:** Provides a high-level overview of the project for human users.
+  - **Purpose:** Provides a high-level overview for human users.
 
 - `logmaster/`
-  - The main Python package directory.
-
-- `logmaster/__init__.py`
-  - **Purpose:** Makes the `logmaster` directory a Python package.
+  - The main Python package directory. (Note: the project is named `privlog`, but the package directory is still `logmaster`.)
 
 - `logmaster/cli.py`
   - **Purpose:** The main entry point for the CLI application.
-  - **Responsibilities:** Defines the CLI commands and arguments using Typer. It orchestrates calls to the runner and formatter.
+  - **Responsibilities:** Defines commands and arguments using Typer. Implements the `--warnings`/`-w` flag and filters findings based on severity (`ERROR` vs. `WARNING`).
 
 - `logmaster/runner.py`
-  - **Purpose:** The main analysis engine, orchestrating checks from multiple sources.
-  - **Responsibilities:** Runs both the Semgrep-based pattern checks (`_run_semgrep`) and the high-precision `ast_checks`. It then merges the findings from both sources into a single, unified list for the formatter and CLI. It also contains the data classes for the results (`Finding`, `RunResult`).
+  - **Purpose:** The main analysis engine.
+  - **Responsibilities:** Runs both Semgrep and AST checks, converts all findings into a common `Finding` object, and determines the final exit code based *only* on the presence of `ERROR`-level findings.
 
 - `logmaster/formatter.py`
-  - **Purpose:** Handles the presentation of the analysis results.
-  - **Responsibilities:** Takes a list of `Finding` objects from the runner and prints them to the console in a compact, `Flake8`-like format (`path:line:col CODE message`).
+  - **Purpose:** Handles the presentation of results.
+  - **Responsibilities:** Prints findings in a `Flake8`-like format, color-coded by severity.
 
 - `logmaster/ast_checks.py`
-  - **Purpose:** A high-precision Python linter using the built-in `ast` module.
-  - **Responsibilities:** Parses Python source code into an Abstract Syntax Tree to perform complex, language-aware checks that are difficult with pattern-matching alone. It specializes in detecting sensitive variables inside f-strings that are not wrapped in known sanitizing functions (e.g., `get_salted_identifier`).
-
-- `logmaster/rules/`
-  - **Purpose:** Stores custom analysis rules.
+  - **Purpose:** A high-precision Python linter built on the `ast` module. It performs the tool's language-aware detection.
+  - **Responsibilities:**
+    1.  **Severity System**: Divides sensitive variable names into `HIGH_CONFIDENCE_SENSITIVE_NAMES` (`ERROR`) and `WARNING_SENSITIVE_NAMES` (`WARNING`).
+    2.  **Multi-Format Detection**: Understands and inspects arguments within f-strings, `.format()` calls, and `%`-style formatting.
+    3.  **`print()` Check**: Scans `print()` statements for sensitive variables, applying the same severity logic as logging calls.
+    4.  **Heuristic Analysis**: Flags patterns that are risky but not definitively wrong (the `extra` parameter, `json.dumps()`, `.to_dict()`) as `WARNING`s.
+  - **Finding Codes**:
+    - `LM2101`: A direct sensitive identifier was found in a logging call. Severity can be `ERROR` or `WARNING`.
+    - `LM2201`: A logging call uses the `extra` parameter, which could hide sensitive data. Severity is `WARNING`.
+    - `LM2202`: `json.dumps()` is used in a logging call. Severity is `WARNING`.
+    - `LM2203`: `.to_dict()` is used in a logging call. Severity is `WARNING`.
+    - `LM2301`: A direct sensitive identifier was found in a `print()` call. Severity can be `ERROR` or `WARNING`.
+    - `LM2302`: `json.dumps()` is used in a `print()` call. Severity is `WARNING`.
+    - `LM2303`: `.to_dict()` is used in a `print()` call. Severity is `WARNING`.
 
 - `logmaster/rules/logmaster.yml`
-  - **Purpose:** The core Semgrep ruleset for LogMaster, based on production-proven patterns.
-  - **Responsibilities:** Defines specific, categorized patterns to detect common logging anti-patterns. The rules are grouped by ID prefixes:
-    - `LM11xx`: High-signal PII leaks (e.g., raw emails, user IDs, IP addresses).
-    - `LM12xx`: High-confidence secret leakage, focusing on raw authentication headers (`Authorization`, `Cookie`). Complex variable name checks are handled by the AST module.
-    - `LM13xx`: Raw payload and header dumping.
-    - `LM14xx`: Unsafe exception logging that may leak sensitive data.
-    - `LM15xx`: Unbounded logging of vendor API responses.
-  Each rule has a unique ID, severity, and a clear message.
+  - **Purpose:** The core Semgrep ruleset, which complements the AST checker by finding broader, less precise patterns.
+  - **Responsibilities:** Defines rules for detecting PII, secrets, and unsafe logging patterns like payload dumping.
diff --git a/README.md b/README.md
@@ -1,3 +1,46 @@
-# Logmaster
+# privlog
 
-A CLI for mastering logs.
+A privacy-aware linter for Python projects, designed to catch accidental leaks of sensitive data in logs and `print` statements before they reach production.
+
+`privlog` is built to be a developer's first line of defense, integrating directly into your local workflow and CI/CD pipelines to enforce logging hygiene.
+
+## Features
+
+- **High-Precision AST Analysis**: Goes beyond simple regex to parse Python code, understanding variable names inside f-strings, `.format()` calls, and `%`-style formatting.
+- **Severity System**: Differentiates between definite leaks (`ERROR`) and suspicious patterns that require manual review (`WARNING`), preventing false positives from breaking your build.
+- **Built-in Heuristics**: Flags risky patterns like logging entire dictionaries (`extra=...`) or `json.dumps()` output.
+- **`print()` Statement Detection**: Catches sensitive data in leftover `print()` statements, a common source of leaks.
+- **CI/CD Friendly**: Exits with a non-zero code only on `ERROR` findings, allowing warnings to be reviewed without blocking development.
+- **Extensible**: Powered by a combination of custom AST checks and a Semgrep rule engine.
+
+## Usage
+
+First, from the root of this repository, install the tool into your project's virtual environment:
+```sh
+pip install -e .
+```
+
+To run the checks, use the `privlog` command.
+
+### Default (Errors Only)
+
+By default, `privlog` only reports high-confidence `ERROR`s. If any are found, it will exit with a non-zero code, failing your build.
+
+```sh
+privlog /path/to/your/project
+```
+
+If only warnings are found, the command will pass and provide a helpful message:
+```
+✅ privlog passed. No errors found.
+  (Warnings were found. Run with -w to show them)
+```
+
+### Show Warnings
+
+To see both `ERROR`s and `WARNING`s, use the `-w` or `--warnings` flag.
+
+```sh
+privlog -w /path/to/your/project
+```
+This will display all findings, color-coded by severity, but will still only fail the build if `ERROR`s are present.
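
An illustrative sketch (not part of this commit) of the exit-code and `-w` behavior the README describes: warnings are shown only on request, and only `ERROR` findings fail the run. The `Finding` field names, the helpers `exit_code` and `visible`, and the choice of `1` as the non-zero code are assumptions; the diff names the `Finding` class but does not show its definition.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Field names are assumed for this example.
    path: str
    line: int
    col: int
    code: str
    severity: str  # "ERROR" or "WARNING"
    message: str


def exit_code(findings):
    """Return 1 if any ERROR-level finding exists, otherwise 0."""
    return 1 if any(f.severity == "ERROR" for f in findings) else 0


def visible(findings, show_warnings=False):
    """Select the findings to print, mirroring the -w/--warnings flag."""
    if show_warnings:
        return list(findings)
    return [f for f in findings if f.severity == "ERROR"]


if __name__ == "__main__":
    results = [
        Finding("app.py", 10, 4, "LM2201", "WARNING", "extra= in logging call"),
        Finding("app.py", 22, 8, "LM2101", "ERROR", "sensitive identifier logged"),
    ]
    for f in visible(results, show_warnings=True):
        print(f"{f.path}:{f.line}:{f.col} {f.code} {f.message}")
    raise SystemExit(exit_code(results))
```

Keeping the exit-code decision separate from the display filter means that passing `-w` can never change whether a CI job fails, which matches the behavior stated above.
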