Change batch aggregation to select only time periods that are being aggregated #2212
Merged
eiffel777 merged 16 commits on May 12, 2026
jpwhite4
approved these changes
May 12, 2026
Currently, when aggregating in batches, the data loaded into the temporary table and aggregated spans everything between the minimum and maximum time periods for the batch, even though not every time period in that range may need to be aggregated. This PR changes batch aggregation to load only the time periods that are being aggregated. It takes the `WHERE` clause in the aggregation and rewrites it as multiple clauses, one for each time period being aggregated.

For example, when aggregating 2 periods with period_start_day_id/period_end_day_id values of 20200101/20200131 and 20200201/20200229, the original `WHERE` clause is rewritten into two clauses, one per period.

This speeds up the batch temp table creation and aggregation when there is a large gap in the time periods being aggregated in a batch.
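The per-period rewrite can be sketched in Python as follows. This is only an illustration, not the code from this PR: the `rewrite_where` helper, the `{start}`/`{end}` placeholder syntax, the `day_id` column name, and the choice to join the per-period clauses with `OR` are all assumptions made for the example.

```python
def rewrite_where(template: str, periods: list[tuple[int, int]]) -> str:
    """Expand one WHERE-clause template into one clause per period.

    `template` is a hypothetical clause containing {start} and {end}
    placeholders; `periods` holds (period_start_day_id, period_end_day_id)
    pairs. Joining with OR is an assumption, not confirmed by the PR.
    """
    if not periods:
        raise ValueError("at least one period is required")
    # Parenthesize each clause so the OR does not change operator precedence.
    clauses = [
        "(" + template.format(start=start, end=end) + ")"
        for start, end in periods
    ]
    return " OR ".join(clauses)


# Hypothetical template; `day_id` is an illustrative column name.
TEMPLATE = "day_id BETWEEN {start} AND {end}"
# The two periods used in the example above.
PERIODS = [(20200101, 20200131), (20200201, 20200229)]

rewritten = rewrite_where(TEMPLATE, PERIODS)
print(rewritten)
```

With only the listed periods in the clause, any days that fall in a gap between them are never selected into the batch temp table, which is where the speedup described above would come from.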
Tests performed
Tested in Docker and on xdmod-dev. New unit tests were also added.