Skip to content

Make ExtractLicenses handle large expressions#139

Open
dangoor wants to merge 5 commits intomainfrom
138-extract-licenses-hang
Open

Make ExtractLicenses handle large expressions#139
dangoor wants to merge 5 commits intomainfrom
138-extract-licenses-hang

Conversation

@dangoor
Copy link
Copy Markdown
Contributor

@dangoor dangoor commented Apr 8, 2026

An expression with a lot of terms in it can cause ExtractLicences
memory usage to blow up. This change alters the traversal to
efficiently collect the licenses found.

Includes a test that ensures the long expression found in the wild
works. This test is a little ugly, because it runs the test in a
subprocess to ensure it can be reliably killed.

Fixes #138

An expression with a lot of terms in it can cause ExtractLicences
memory usage to blow up. This change alters the traversal to
efficiently collect the licenses found.

Includes a test that ensures the long expression found in the wild
works. This test is a little ugly, because it runs the test in a
subprocess to ensure it can be reliably killed.

Fixes #138
@dangoor dangoor requested a review from elrayle as a code owner April 8, 2026 22:08
Copilot AI review requested due to automatic review settings April 8, 2026 22:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses excessive memory usage / hangs in spdxexp.ExtractLicenses when evaluating large compound SPDX license expressions by replacing the expansion-based traversal with a direct AST walk that collects unique licenses.

Changes:

  • Reworked ExtractLicenses to traverse the parsed expression tree and collect unique reconstructed license strings without generating expanded combinations.
  • Added tests for LicenseRef handling/dedup and a “real world” long expression, executed in a subprocess with a timeout to detect hangs/OOM.
Show a summary per file
File Description
spdxexp/extracts.go Replaces expansion+flatten logic with a tree traversal that collects unique licenses in a map.
spdxexp/extracts_test.go Adds regression tests for LicenseRef dedup and a long-expression hang/OOM scenario via subprocess + timeout.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment on lines +16 to +18
seen := map[string]struct{}{}
collectExtractedLicenses(node, seen)
return slices.Collect(maps.Keys(seen)), nil
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExtractLicenses now returns slices.Collect(maps.Keys(seen)), which depends on Go map iteration order and can produce nondeterministic output across runs. This is a behavior change from the previous implementation (which produced a deterministic order after expand(true) + dedup) and can lead to flaky downstream assertions or unstable output.

Consider returning a deterministic order (e.g., collect keys into a slice and sort before returning, or maintain insertion order while deduping).

Copilot uses AI. Check for mistakes.
@dangoor dangoor force-pushed the 138-extract-licenses-hang branch from 62e3e21 to e589960 Compare April 9, 2026 19:17
Copy link
Copy Markdown
Collaborator

@elrayle elrayle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization!

dangoor added 4 commits April 9, 2026 20:01
This updates the Go version we're using and brings us up to speed
with the latest golangci-lint. This fixes the lint issues that
inevitably came up with the transition.
- ignore security error in test that is only run as part of tests and does not execute user input
- remove unused functions in helper file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ExtractLicenses hangs indefinitely on compound expressions with many AND/OR operators and kills processes

3 participants