This document is for **coding agents** that work on this repo.
It explains what the project does and where the important code lives.
## 1. Purpose of this project

- **`privlog`** is a Python CLI tool built with Typer for finding and preventing sensitive data leaks.
  - It uses a hybrid approach, combining pattern-based Semgrep rules with a high-precision, language-aware AST-based scanner.
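
Because of the hybrid design, findings from two very different sources have to end up in one list. The following is a minimal sketch of that merge step, not the code in `logmaster/runner.py`. It assumes Semgrep was run with `--json` and that its report has a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, `start.col`, `extra.message`, and `extra.severity`. The `Finding` fields and the `from_semgrep` and `merge_findings` helpers are hypothetical names chosen for illustration. In practice the report would come from `json.loads` on Semgrep's stdout.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Finding:
    # Hypothetical common shape for findings from either source.
    path: str
    line: int
    col: int
    code: str
    message: str
    severity: str  # e.g. "ERROR" or "WARNING"


def from_semgrep(report: Dict[str, Any]) -> List[Finding]:
    """Convert a parsed `semgrep --json` report into Finding objects."""
    findings = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        findings.append(
            Finding(
                path=result["path"],
                line=result["start"]["line"],
                col=result["start"]["col"],
                code=result["check_id"],
                message=extra.get("message", ""),
                # Defaulting to WARNING when severity is missing is an assumption.
                severity=extra.get("severity", "WARNING"),
            )
        )
    return findings


def merge_findings(semgrep: List[Finding], ast_based: List[Finding]) -> List[Finding]:
    """Combine both sources, drop exact duplicates, and sort by location."""
    unique = set(semgrep) | set(ast_based)
    return sorted(unique, key=lambda f: (f.path, f.line, f.col, f.code))
```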

---

## 2. Key files and modules

- `pyproject.toml`
  - **Purpose:** Defines project metadata, dependencies (`typer`, `pyyaml`, `semgrep`), and the `privlog` entry point.
  - **Responsibilities:** Manages the package and its dependencies.

- `README.md`
  - **Purpose:** Provides a high-level overview for human users.

- `logmaster/`
  - The main Python package directory. (Note: The project is named `privlog`, but the package directory is still `logmaster`.)

- `logmaster/cli.py`
  - **Purpose:** The main entry point for the CLI application.
  - **Responsibilities:** Defines commands and arguments using Typer. Implements the `--warnings`/`-w` flag and filters findings based on severity (`ERROR` vs. `WARNING`).

- `logmaster/runner.py`
  - **Purpose:** The main analysis engine.
  - **Responsibilities:** Runs both the Semgrep and AST checks, converts all findings into a common `Finding` object, and determines the final exit code based *only* on the presence of `ERROR`-level findings.

- `logmaster/formatter.py`
  - **Purpose:** Handles the presentation of results.
  - **Responsibilities:** Prints findings in a `Flake8`-like format, with color-coding for severities.

- `logmaster/ast_checks.py`
  - **Purpose:** A high-precision Python linter using the `ast` module. It is the core of the tool's intelligence.
  - **Responsibilities:**
    1. **Severity System**: Divides sensitive variable names into `HIGH_CONFIDENCE_SENSITIVE_NAMES` (`ERROR`) and `WARNING_SENSITIVE_NAMES` (`WARNING`).
    2. **Multi-Format Detection**: Understands and inspects arguments within f-strings, `.format()` calls, and `%`-style formatting.
    3. **`print()` Check**: Scans `print()` statements for sensitive variables, applying the same severity logic as logging calls.
    4. **Heuristic Analysis**: Flags risky but not definitively incorrect patterns as `WARNING`s.
  - **Finding Codes**:
    - `LM2101`: A direct sensitive identifier was found in a logging call. Severity can be `ERROR` or `WARNING`.
    - `LM2201`: A logging call uses the `extra` parameter, which could hide sensitive data. Severity is `WARNING`.
    - `LM2202`: `json.dumps()` is used in a logging call. Severity is `WARNING`.
    - `LM2203`: `.to_dict()` is used in a logging call. Severity is `WARNING`.
    - `LM2301`: A direct sensitive identifier was found in a `print()` call. Severity can be `ERROR` or `WARNING`.
    - `LM2302`: `json.dumps()` is used in a `print()` call. Severity is `WARNING`.
    - `LM2303`: `.to_dict()` is used in a `print()` call. Severity is `WARNING`.

- `logmaster/rules/logmaster.yml`
  - **Purpose:** The core Semgrep ruleset, which complements the AST checker by finding broader, less precise patterns.
  - **Responsibilities:** Defines rules for detecting PII, secrets, and unsafe logging patterns like payload dumping.
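
The name-based check described for `logmaster/ast_checks.py` can be approximated with the standard-library `ast` module. The approach is to find logging and `print()` calls and walk their positional arguments. That single walk covers f-strings, `.format()` calls, and `%`-formatting, because all three leave the referenced names as sub-nodes of the argument. Each identifier is then graded against the two name sets. This is only a sketch under stated assumptions. The set contents, the `LOG_METHODS` list, and the treatment of `get_salted_identifier(...)` as a sanitizing wrapper are illustrative, and `AstFinding`, `_names_in`, `_call_code`, and `check_source` are hypothetical names. The real module may work differently.

```python
import ast
from typing import Iterator, List, NamedTuple, Optional

# The two set names come from ast_checks.py; their contents here are illustrative only.
HIGH_CONFIDENCE_SENSITIVE_NAMES = {"password", "token", "api_key", "secret"}
WARNING_SENSITIVE_NAMES = {"email", "user_id", "ip_address"}

# Assumed: logger methods worth inspecting and functions whose output counts as safe.
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}
SANITIZERS = {"get_salted_identifier"}


class AstFinding(NamedTuple):
    path: str
    line: int
    col: int
    code: str
    severity: str
    name: str


def _names_in(node: ast.AST) -> Iterator[str]:
    """Yield identifiers under `node`, skipping anything wrapped in a sanitizer call."""
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in SANITIZERS
    ):
        return
    if isinstance(node, ast.Name):
        yield node.id
    elif isinstance(node, ast.Attribute):
        yield node.attr  # `user.password` is reported as "password"
    for child in ast.iter_child_nodes(node):
        yield from _names_in(child)


def _call_code(call: ast.Call) -> Optional[str]:
    """Return LM2101 for logging calls, LM2301 for print(), otherwise None."""
    func = call.func
    if isinstance(func, ast.Attribute) and func.attr in LOG_METHODS:
        return "LM2101"
    if isinstance(func, ast.Name) and func.id == "print":
        return "LM2301"
    return None


def check_source(source: str, path: str = "<string>") -> List[AstFinding]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        code = _call_code(node)
        if code is None:
            continue
        # f-strings, .format() and %-formatting all end up as sub-nodes of the argument.
        for arg in node.args:
            for name in _names_in(arg):
                lowered = name.lower()
                if lowered in HIGH_CONFIDENCE_SENSITIVE_NAMES:
                    severity = "ERROR"
                elif lowered in WARNING_SENSITIVE_NAMES:
                    severity = "WARNING"
                else:
                    continue
                findings.append(
                    AstFinding(path, node.lineno, node.col_offset + 1, code, severity, name)
                )
    return findings
```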

---

A privacy-aware linter for Python projects, designed to catch accidental leaks of sensitive data in logs and `print` statements before they reach production.

`privlog` is built to be a developer's first line of defense, integrating directly into your local workflow and CI/CD pipelines to enforce logging hygiene.

## Features

- **High-Precision AST Analysis**: Goes beyond simple regex to parse Python code, understanding variable names inside f-strings, `.format()` calls, and more.
- **Severity System**: Differentiates between definite leaks (`ERROR`) and suspicious patterns that require manual review (`WARNING`), preventing false positives from breaking your build.
- **Built-in Heuristics**: Flags risky patterns like logging entire dictionaries (`extra=...`) or `json.dumps()` output.
- **`print()` Statement Detection**: Catches sensitive data in leftover `print()` statements, a common source of leaks.
- **CI/CD Friendly**: Exits with a non-zero code only on `ERROR` findings, allowing warnings to be reviewed without blocking development.
- **Extensible**: Powered by a combination of custom AST checks and a Semgrep rule engine.
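
The built-in heuristics are structural rather than name-based, so they can be sketched separately. The version below uses only `ast` and reports the `WARNING`-level codes listed for `ast_checks.py`: `LM2201` for `extra=` in a logging call, `LM2202`/`LM2302` for `json.dumps()`, and `LM2203`/`LM2303` for `.to_dict()`. How privlog actually matches these patterns is an assumption. For example, this sketch recognizes `json.dumps` only when it is called through the `json` module name. `Heuristic` and `heuristic_warnings` are hypothetical names.

```python
import ast
from typing import List, NamedTuple

LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}  # assumed


class Heuristic(NamedTuple):
    line: int
    code: str
    severity: str = "WARNING"  # every heuristic finding is a warning


def _is_json_dumps(node: ast.AST) -> bool:
    # Only matches the `json.dumps(...)` spelling, not `from json import dumps`.
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "dumps"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "json"
    )


def _is_to_dict(node: ast.AST) -> bool:
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "to_dict"
    )


def heuristic_warnings(source: str) -> List[Heuristic]:
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr in LOG_METHODS:
            prefix = "LM22"
            if any(kw.arg == "extra" for kw in node.keywords):
                found.append(Heuristic(node.lineno, "LM2201"))
        elif isinstance(func, ast.Name) and func.id == "print":
            prefix = "LM23"
        else:
            continue
        # Search every sub-expression of the positional arguments.
        inner = [sub for arg in node.args for sub in ast.walk(arg)]
        if any(_is_json_dumps(sub) for sub in inner):
            found.append(Heuristic(node.lineno, prefix + "02"))
        if any(_is_to_dict(sub) for sub in inner):
            found.append(Heuristic(node.lineno, prefix + "03"))
    return found
```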

## Usage

First, from the root of this repository, install the tool into your project's virtual environment:

```sh
pip install -e .
```

To run the checks, use the `privlog` command.

### Default (Errors Only)

By default, `privlog` only reports high-confidence `ERROR`s. If any are found, it exits with a non-zero code, failing your build.

```sh
privlog /path/to/your/project
```

If only warnings are found, the command passes and prints a short hint:

```
✅ privlog passed. No errors found.
(Warnings were found. Run with -w to show them)
```
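
The gating rule fits in a small pure function: show only errors by default, fail only on errors, and hint when warnings were hidden. In this sketch, `summarize` and the minimal `Finding` tuple are hypothetical. The exit code `1` stands in for "non-zero", and printing the pass message under `-w` as well is a guess. The documentation above splits this behavior between `logmaster/cli.py` (the `-w` filter) and `logmaster/runner.py` (the exit code).

```python
from typing import List, NamedTuple, Optional, Tuple


class Finding(NamedTuple):
    # Hypothetical minimal finding: only the code and severity matter here.
    code: str
    severity: str  # "ERROR" or "WARNING"


PASS_MESSAGE = "✅ privlog passed. No errors found."
HINT_MESSAGE = "(Warnings were found. Run with -w to show them)"


def summarize(
    findings: List[Finding], show_warnings: bool = False
) -> Tuple[List[Finding], int, Optional[str]]:
    """Return (findings to display, exit code, optional closing message)."""
    errors = [f for f in findings if f.severity == "ERROR"]
    warnings = [f for f in findings if f.severity == "WARNING"]

    # Warnings are displayed only with -w/--warnings.
    shown = errors + warnings if show_warnings else errors

    # Only ERROR findings influence the exit code; 1 is an assumed non-zero value.
    exit_code = 1 if errors else 0

    message = None
    if not errors:
        message = PASS_MESSAGE
        if warnings and not show_warnings:
            message += "\n" + HINT_MESSAGE
    return shown, exit_code, message
```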

### Show Warnings

To see both `ERROR`s and `WARNING`s, use the `-w` or `--warnings` flag.

```sh
privlog -w /path/to/your/project
```

This will display all findings, color-coded by severity, but will still only fail the build if `ERROR`s are present.
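
Color-coding by severity can be done with plain ANSI escape sequences. The sketch below renders each finding in the compact `path:line:col CODE message` layout and colors the whole line red for `ERROR` and yellow for `WARNING`. The specific colors, the `Finding` fields, and the TTY check are assumptions, not necessarily what `logmaster/formatter.py` does. `format_finding` and `print_findings` are hypothetical names. Skipping escape codes when output is not a terminal keeps CI logs and piped output readable.

```python
import sys
from typing import Iterable, NamedTuple, TextIO

COLORS = {"ERROR": "\033[31m", "WARNING": "\033[33m"}  # red, yellow (assumed palette)
RESET = "\033[0m"


class Finding(NamedTuple):
    path: str
    line: int
    col: int
    code: str
    message: str
    severity: str


def format_finding(finding: Finding, color: bool = True) -> str:
    """Render one finding as `path:line:col CODE message`, optionally colored."""
    text = f"{finding.path}:{finding.line}:{finding.col} {finding.code} {finding.message}"
    if color and finding.severity in COLORS:
        return f"{COLORS[finding.severity]}{text}{RESET}"
    return text


def print_findings(findings: Iterable[Finding], stream: TextIO = sys.stdout) -> None:
    # Only emit escape codes when writing to a terminal, so piped output stays clean.
    use_color = stream.isatty()
    for finding in findings:
        print(format_finding(finding, color=use_color), file=stream)
```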