-
This document is for **coding agents** that work on this repo.
-
It explains what the project does and where the important code lives.
+
This guide is for developers who want to contribute to the `privlog` project. It explains the project's architecture and where key logic lives.
---
## 1. Purpose of this project
-
- **`privlog`** is a Python CLI tool built with Typer for finding and preventing sensitive data leaks.
-
- It uses a hybrid approach, combining pattern-based Semgrep rules with a high-precision, language-aware AST-based scanner.
+
- **`privlog`** is a privacy-aware linter for Python with a Typer-based CLI. Its analysis is powered by a hybrid engine that combines pattern-based Semgrep rules with a high-precision, language-aware AST-based scanner.
---
## 2. Key files and modules
- `pyproject.toml`
-
- **Purpose:** Defines project metadata, dependencies (`typer`, `pyyaml`, `semgrep`), and the `privlog` entry point.
-
- **Responsibilities:** Manages the package and its dependencies.
+
- **Purpose:** Defines project metadata, dependencies, and the `privlog` entry point. It also holds user-defined configuration under the `[tool.privlog]` section.
- `README.md`
-
- **Purpose:** Provides a high-level overview for human users.
+
- **Purpose:** Provides a high-level overview and usage instructions for users.
- `privlog/`
- The main Python package directory.
- `privlog/cli.py`
- **Purpose:** The main entry point for the CLI application.
-
- **Responsibilities:** Defines commands and arguments using Typer. Implements the `--warnings`/`-w` flag and filters findings based on severity (`ERROR` vs. `WARNING`).
+
- **Responsibilities:** Defines commands and arguments using Typer. Implements the `--warnings`/`-w` flag and filters findings based on severity.
- `privlog/runner.py`
- **Purpose:** The main analysis engine.
-
- **Responsibilities:** Runs both Semgrep and AST checks, converts all findings into a common `Finding` object, and determines the final exit code based *only* on the presence of `ERROR`-level findings.
+
- **Responsibilities:**
+
1. Loads user configuration from `pyproject.toml` via the `_load_config` function.
+
2. Runs the Semgrep scanner.
+
3. Runs the AST checker, passing the loaded configuration to it.
+
4. Merges findings from both sources.
+
5. Determines the final exit code based *only* on the presence of `ERROR`-level findings.
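
The five runner steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual `runner.py`: the `Finding` field names, the three callables passed to `run_analysis`, the sort order, and the concrete exit code `1` are all assumptions made for the example (the docs only say the exit code is non-zero when `ERROR`-level findings exist).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    # Field names are assumptions; the real `Finding` object may differ.
    code: str
    severity: str  # "ERROR" or "WARNING"
    path: str
    line: int


def run_analysis(
    target: str,
    load_config: Callable[[str], Dict],
    semgrep_scan: Callable[[str], List[Finding]],
    ast_scan: Callable[[str, Dict], List[Finding]],
) -> Tuple[List[Finding], int]:
    """Hypothetical pipeline: load config, run both scanners, merge, pick an exit code."""
    config = load_config(target)                    # step 1: user configuration
    findings = list(semgrep_scan(target))           # step 2: Semgrep findings
    findings += ast_scan(target, config)            # step 3: AST findings (config passed in)
    findings.sort(key=lambda f: (f.path, f.line))   # step 4: one merged, ordered list
    has_error = any(f.severity == "ERROR" for f in findings)
    return findings, (1 if has_error else 0)        # step 5: only ERRORs fail the run
```

Passing the scanners in as callables keeps the sketch self-contained; warnings still appear in the merged list but never affect the exit code.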
- `privlog/formatter.py`
- **Purpose:** Handles the presentation of results.
@@ -41,17 +43,15 @@ It explains what the project does and where the important code lives.
- **Responsibilities:**
1. **Severity System**: Divides sensitive variable names into `HIGH_CONFIDENCE_SENSITIVE_NAMES` (`ERROR`) and `WARNING_SENSITIVE_NAMES` (`WARNING`).
2. **Multi-Format Detection**: Understands and inspects arguments within f-strings, `.format()` calls, and `%`-style formatting.
-
3. **`print()` Check**: Scans `print()` statements for sensitive variables, applying the same severity logic as logging calls.
-
4. **Heuristic Analysis**: Flags risky but not definitively incorrect patterns as `WARNING`s.
+
3. **`print()` Check**: Scans `print()` statements for sensitive variables.
+
4. **Heuristic Analysis**: Flags risky patterns like logging with `extra=...` or `json.dumps()`.
+
5. **Custom Wrapper Analysis**: Receives the `PrivlogConfig` object, matches function calls against the names in the `custom_wrappers` configuration, and checks the keyword arguments of each match.
- **Finding Codes**:
-
- `LM2101`: A direct sensitive identifier was found in a logging call. Severity can be `ERROR` or `WARNING`.
-
- `LM2201`: A logging call uses the `extra` parameter, which could hide sensitive data. Severity is `WARNING`.
-
- `LM2202`: `json.dumps()` is used in a logging call. Severity is `WARNING`.
-
- `LM2203`: `.to_dict()` is used in a logging call. Severity is `WARNING`.
-
- `LM2301`: A direct sensitive identifier was found in a `print()` call. Severity can be `ERROR` or `WARNING`.
-
- `LM2302`: `json.dumps()` is used in a `print()` call. Severity is `WARNING`.
-
- `LM2303`: `.to_dict()` is used in a `print()` call. Severity is `WARNING`.
+
- `LM2101`: A direct sensitive identifier was found in a logging call.
+
- `LM2201-2203`: A heuristic pattern (like `extra=...` or `json.dumps`) was found in a logging call.
+
- `LM2301-2303`: A sensitive identifier or heuristic pattern was found in a `print()` call.
+
- `LM2401`: A sensitive argument was passed to a custom logging wrapper defined in the user's configuration.
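
To make the custom wrapper check (`LM2401`) concrete, here is a minimal sketch built on Python's standard `ast` module. It is not the project's actual checker: the function name `check_custom_wrappers`, the plain-dict stand-in for `PrivlogConfig`, and the tuple layout of the results are hypothetical, and it assumes a finding is raised whenever a configured keyword argument is passed to a configured wrapper.

```python
import ast
from typing import Dict, List, Tuple

# Assumed config shape: wrapper function name -> {keyword argument: severity}.
CustomWrappers = Dict[str, Dict[str, str]]


def check_custom_wrappers(source: str, wrappers: CustomWrappers) -> List[Tuple[str, str, int, str]]:
    """Return (code, severity, line, detail) for each configured keyword passed to a wrapper."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Support both `audit(...)` and `obj.audit(...)` call shapes.
        if isinstance(node.func, ast.Name):
            name = node.func.id
        elif isinstance(node.func, ast.Attribute):
            name = node.func.attr
        else:
            continue
        sensitive_kwargs = wrappers.get(name, {})
        for kw in node.keywords:
            # `kw.arg` is None for `**kwargs`, which never matches a configured name.
            if kw.arg in sensitive_kwargs:
                detail = f"{name}({kw.arg}=...)"
                findings.append(("LM2401", sensitive_kwargs[kw.arg], node.lineno, detail))
    # `ast.walk` does not guarantee source order, so sort by line number.
    return sorted(findings, key=lambda f: f[2])
```

Matching on the bare function name keeps the sketch short; a stricter checker might also resolve imports or qualified names before reporting.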
- `privlog/rules/privlog.yml`
-
- **Purpose:** The core Semgrep ruleset, which complements the AST checker by finding broader, less precise patterns.
-
- **Responsibilities:** Defines rules for detecting PII, secrets, and unsafe logging patterns like payload dumping.
+
- **Purpose:** The core Semgrep ruleset, which complements the AST checker.
+
- **Responsibilities:** Defines rules for detecting PII, secrets, and unsafe logging patterns.
README.md
27 additions & 1 deletion
@@ -11,7 +11,7 @@ A privacy-aware linter for Python projects, designed to catch accidental leaks o
- **Built-in Heuristics**: Flags risky patterns like logging entire dictionaries (`extra=...`) or `json.dumps()` output.
- **`print()` Statement Detection**: Catches sensitive data in leftover `print()` statements, a common source of leaks.
- **CI/CD Friendly**: Exits with a non-zero code only on `ERROR` findings, allowing warnings to be reviewed without blocking development.
-
- **Extensible**: Powered by a combination of custom AST checks and a Semgrep rule engine.
+
- **Configurable & Extensible**: Teach `privlog` about your project's custom logging functions via a simple `pyproject.toml` configuration.
## Installation
@@ -42,7 +42,11 @@ Once installed, run the `privlog` command on your project directory.
By default, `privlog` only reports high-confidence `ERROR`s. If any are found, it will exit with a non-zero code, failing your build.
```sh
+
# Scan a specific directory
privlog /path/to/your/project
+
+
# Or, from inside a project, scan the current directory
+
privlog .
```
If only warnings are found, the command will pass and provide a helpful message:
@@ -56,10 +60,32 @@ If only warnings are found, the command will pass and provide a helpful message:
To see both `ERROR`s and `WARNING`s, use the `-w` or `--warnings` flag.
```sh
+
# Scan a specific directory with warnings
privlog -w /path/to/your/project
+
+
# Or, from inside a project, scan the current directory with warnings
+
privlog -w .
```
+
This will display all findings, color-coded by severity, but will still only fail the build if `ERROR`s are present.
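
The behaviour described here, where `-w` changes what is shown but not whether the build fails, can be pictured with a small sketch. It is only an illustration under stated assumptions: the `(severity, message)` tuple, both function names, and the exit code value `1` are hypothetical and not taken from `privlog`'s source.

```python
from typing import List, Tuple

# Simplified stand-in for a finding: (severity, message).
Finding = Tuple[str, str]


def visible_findings(findings: List[Finding], show_warnings: bool) -> List[Finding]:
    """Choose what to print; `show_warnings` mirrors the `-w` / `--warnings` flag."""
    if show_warnings:
        return list(findings)
    return [f for f in findings if f[0] == "ERROR"]


def exit_code(findings: List[Finding]) -> int:
    """Non-zero only when an ERROR exists; the display flag plays no part here."""
    return 1 if any(severity == "ERROR" for severity, _ in findings) else 0
```

Keeping the two decisions in separate functions makes it hard for the display flag to leak into the pass/fail result by accident.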
+
### Configuring Custom Wrappers
+
+
You can teach `privlog` to recognize your own custom logging functions. In your project's `pyproject.toml` file, add a `[tool.privlog.custom_wrappers]` section.
+
+
For each custom function, specify its name and which of its keyword arguments should be treated as sensitive, along with the desired severity (`ERROR` or `WARNING`).
+
+
**Example `pyproject.toml`:**
+
```toml
+
[tool.privlog.custom_wrappers]
+
# For a function call like: audit(actor_id=user.id, event="login")
+
audit = { actor_id = "ERROR" }
+
+
# For a function call like: log_event("payment_failed", details=evt)
+
log_event = { details = "WARNING" }
+
```
+
`privlog` will automatically find and use this configuration when you run it.
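
For readers curious how such a table might be read, the sketch below parses the `[tool.privlog.custom_wrappers]` section with Python's standard `tomllib` parser (Python 3.11+, with the third-party `tomli` backport as a fallback). The helper name `load_custom_wrappers` and its validation rules are assumptions for illustration; `privlog`'s own loader may behave differently.

```python
from typing import Dict

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older interpreters: install the `tomli` backport
    import tomli as tomllib

VALID_SEVERITIES = {"ERROR", "WARNING"}


def load_custom_wrappers(pyproject_text: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: extract {function: {kwarg: severity}} from pyproject.toml text."""
    data = tomllib.loads(pyproject_text)
    wrappers = data.get("tool", {}).get("privlog", {}).get("custom_wrappers", {})
    for func_name, kwargs in wrappers.items():
        if not isinstance(kwargs, dict):
            raise ValueError(f"{func_name}: expected a table of keyword arguments")
        for kwarg, severity in kwargs.items():
            if severity not in VALID_SEVERITIES:
                raise ValueError(f"{func_name}.{kwarg}: unknown severity {severity!r}")
    return wrappers
```

Returning an empty mapping when the section is absent lets projects without any custom wrappers run unchanged.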