Change batch aggregation to select only time periods that are being aggregated#2212

Merged
eiffel777 merged 16 commits into ubccr:main from eiffel777:optimize-batch-aggreation-loading
May 12, 2026

@eiffel777
Contributor

Currently, when aggregating in batches, the data loaded into the temporary table and aggregated covers everything between the minimum and maximum time periods for the batch, even though not every time period in that range may need to be aggregated. This PR changes batch loading so that only the time periods being aggregated are loaded. It takes the WHERE clause in the aggregation and rewrites it as multiple clauses, one for each time period being aggregated, joined with OR.

For example, if the original WHERE clause is:

task.start_day_id <= :period_end_day_id AND task.end_day_id >= :period_start_day_id AND task.is_deleted = 0

and we are aggregating 2 periods with period_start_day_id/period_end_day_id values of 20200101/20200131 and 20200201/20200229, the WHERE clause will be rewritten to:

(task.start_day_id <= :period_end_day_id_0 AND task.end_day_id >= :period_start_day_id_0 AND task.is_deleted = 0) 
OR 
(task.start_day_id <= :period_end_day_id_1 AND task.end_day_id >= :period_start_day_id_1 AND task.is_deleted = 0)

This speeds up the batch temp table creation and aggregation when there is a large gap in the time periods being aggregated in a batch.
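
As an illustration only, the rewrite described above can be sketched in Python. This is not the code from this PR: the function name `rewrite_where`, the regular expression, and the returned parameter dictionary are all hypothetical, and the sketch assumes the only period placeholders are `:period_start_day_id` and `:period_end_day_id`, as in the example.

```python
import re

# Hypothetical: matches only the two unsuffixed placeholders used in the
# example above. The trailing \b keeps already-suffixed names untouched.
PLACEHOLDER = re.compile(r":(period_(?:start|end)_day_id)\b")


def rewrite_where(where, periods):
    """Build one parenthesized copy of `where` per period, joined with OR.

    `periods` is a list of (period_start_day_id, period_end_day_id) tuples.
    Returns the rewritten clause and a dict of suffixed bind parameters.
    """
    if not periods:
        raise ValueError("at least one period is required")

    clauses = []
    params = {}
    for i, (start_day_id, end_day_id) in enumerate(periods):
        # Append the period index to each placeholder, e.g. _0, _1, ...
        suffixed = PLACEHOLDER.sub(lambda m: f":{m.group(1)}_{i}", where)
        clauses.append(f"({suffixed})")
        params[f"period_start_day_id_{i}"] = start_day_id
        params[f"period_end_day_id_{i}"] = end_day_id

    return " OR ".join(clauses), params
```

Called with the original WHERE clause and the two periods from the example, this sketch produces the two OR-joined clauses shown above, along with one start/end bind value per period.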

Tests performed

Tested in Docker and on xdmod-dev. New unit tests were also added.

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

@eiffel777 eiffel777 added this to the 11.5.0 milestone May 5, 2026
@eiffel777 eiffel777 self-assigned this May 5, 2026
@eiffel777 eiffel777 added the enhancement (Enhancement of the functionality of an existing feature) and Category:ETL (Extract Transform Load) labels May 5, 2026
@eiffel777 eiffel777 merged commit c47f55a into ubccr:main May 12, 2026
5 checks passed
