Conversation
An expression with a lot of terms in it can cause ExtractLicences memory usage to blow up. This change alters the traversal to efficiently collect the licenses found. Includes a test that ensures the long expression found in the wild works. This test is a little ugly, because it runs the test in a subprocess to ensure it can be reliably killed. Fixes #138
There was a problem hiding this comment.
Pull request overview
This PR addresses excessive memory usage / hangs in spdxexp.ExtractLicenses when evaluating large compound SPDX license expressions by replacing the expansion-based traversal with a direct AST walk that collects unique licenses.
Changes:
- Reworked
ExtractLicensesto traverse the parsed expression tree and collect unique reconstructed license strings without generating expanded combinations. - Added tests for LicenseRef handling/dedup and a “real world” long expression, executed in a subprocess with a timeout to detect hangs/OOM.
Show a summary per file
| File | Description |
|---|---|
spdxexp/extracts.go |
Replaces expansion+flatten logic with a tree traversal that collects unique licenses in a map. |
spdxexp/extracts_test.go |
Adds regression tests for LicenseRef dedup and a long-expression hang/OOM scenario via subprocess + timeout. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 1
| seen := map[string]struct{}{} | ||
| collectExtractedLicenses(node, seen) | ||
| return slices.Collect(maps.Keys(seen)), nil |
There was a problem hiding this comment.
ExtractLicenses now returns slices.Collect(maps.Keys(seen)), which depends on Go map iteration order and can produce nondeterministic output across runs. This is a behavior change from the previous implementation (which produced a deterministic order after expand(true) + dedup) and can lead to flaky downstream assertions or unstable output.
Consider returning a deterministic order (e.g., collect keys into a slice and sort before returning, or maintain insertion order while deduping).
62e3e21 to
e589960
Compare
This updates the Go version we're using and brings us up to speed with the latest golangci-lint. This fixes the lint issues that inevitably came up with the transition.
An expression with a lot of terms in it can cause ExtractLicences
memory usage to blow up. This change alters the traversal to
efficiently collect the licenses found.
Includes a test that ensures the long expression found in the wild
works. This test is a little ugly, because it runs the test in a
subprocess to ensure it can be reliably killed.
Fixes #138