Skip to content

Atomic Search Param operations#5433

Open
jestradaMS wants to merge 72 commits intomainfrom
users/jestrada/atomicsearchparameteroperations
Open

Atomic Search Param operations#5433
jestradaMS wants to merge 72 commits intomainfrom
users/jestrada/atomicsearchparameteroperations

Conversation

@jestradaMS
Copy link
Copy Markdown
Contributor

@jestradaMS jestradaMS commented Mar 10, 2026

Description

This PR makes SearchParameter CRUD operations fully atomic and more reliable, especially for SQL Server-backed FHIR servers. It removes partial commit risks, reduces lock contention in transaction bundles, and makes the background refresh service the only owner of cache updates to prevent cache inconsistency.

It also improves diagnostics by adding clearer lifecycle logging, adds a dedicated retry policy for reindex operations to better handle transient SQL and Cosmos DB failures, and expands test coverage to verify that status updates no longer mutate the in-memory cache or publish mediator notifications.

Related issues

Addresses AB#183136.

Testing

Describe how this change was tested.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge Build Status
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

…ed procedure

- Added new schema version V104 to SchemaVersion.cs and updated SchemaVersionConstants.cs to reflect the new maximum version.
- Enhanced the MergeSearchParams stored procedure to include additional parameters for resource change capture and transaction handling.
- Updated SqlServerSearchParameterStatusDataStore to handle new parameters based on schema version.
- Modified SqlServerFhirDataStore to conditionally use MergeSearchParams based on pending search parameter updates.
- Updated project file to set LatestSchemaVersion to 104.
- Refactored integration tests to accommodate changes in search parameter handling and ensure proper synchronization.
…ency control and ensure proper transaction management
…n SqlServerFhirDataStore to differentiate between transaction and non-transaction operations
…enhancing transaction handling and cache management
- Added new schema version V108 to SchemaVersion.cs.
- Updated SchemaVersionConstants.cs to set Max schema version to 108.
- Modified SqlServerSearchParameterStatusDataStore.cs to check for schema version 108 for enabling resource change capture.
- Updated project file to reflect the latest schema version as 108.
@jestradaMS jestradaMS requested a review from a team as a code owner March 10, 2026 13:46
@github-actions github-actions bot added the SQL Scripts If SQL scripts are added to the PR label Mar 10, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@3094ef0). Learn more about missing BASE report.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #5433   +/-   ##
=======================================
  Coverage        ?   77.24%           
=======================================
  Files           ?      977           
  Lines           ?    35904           
  Branches        ?     5427           
=======================================
  Hits            ?    27733           
  Misses          ?     6813           
  Partials        ?     1358           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jestradaMS jestradaMS changed the title [DO NOT REVIEW YET] Atomic Search Param operations Atomic Search Param operations Mar 11, 2026
@jestradaMS jestradaMS added this to the FY26\Q3\2Wk\2Wk18 milestone Mar 11, 2026
@jestradaMS jestradaMS added the Bug-Reliability Reliability related bugs. label Mar 11, 2026
Comment on lines +200 to +203
catch (Exception ex)
{
_logger.LogWarning(ex, "Failed to log WaitForAllInstancesCacheConsistencyAsync event.");
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.

Copilot Autofix

AI 16 days ago

In general, to fix overly broad generic catch clauses, restrict the caught types to those you actually expect and can handle meaningfully. For asynchronous operations, a common refinement is to avoid catching OperationCanceledException (and sometimes TaskCanceledException) so that cooperative cancellation still works as intended. For background or logging helpers, it is still acceptable to catch “regular” exceptions while letting fatal exceptions propagate.

For this specific case in TryLogConsistencyWaitEventAsync (lines 203–214 in src/Microsoft.Health.Fhir.Core/Features/Search/Parameters/SearchParameterOperations.cs), the best fix is:

  • Add a dedicated catch block for OperationCanceledException that immediately rethrows, ensuring that cancellation flows correctly.
  • Optionally, add a catch for TaskCanceledException if TryLogEvent can throw it separately.
  • Keep a generic catch (Exception ex) after those specific catches, to preserve the existing behavior of swallowing/logging non-cancellation errors.

This preserves all current functionality (logging failures still only produce a warning and do not affect callers) while no longer intercepting cancellation exceptions. All required types (OperationCanceledException, TaskCanceledException, Exception) are in System, which is already imported, so no new imports are needed. The only code to change is within the body of TryLogConsistencyWaitEventAsync.


Suggested changeset 1
src/Microsoft.Health.Fhir.Core/Features/Search/Parameters/SearchParameterOperations.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/Microsoft.Health.Fhir.Core/Features/Search/Parameters/SearchParameterOperations.cs b/src/Microsoft.Health.Fhir.Core/Features/Search/Parameters/SearchParameterOperations.cs
--- a/src/Microsoft.Health.Fhir.Core/Features/Search/Parameters/SearchParameterOperations.cs
+++ b/src/Microsoft.Health.Fhir.Core/Features/Search/Parameters/SearchParameterOperations.cs
@@ -207,6 +207,11 @@
                 using IScoped<ISearchService> searchService = _searchServiceFactory();
                 await searchService.Value.TryLogEvent(nameof(WaitForAllInstancesCacheConsistencyAsync), status, text, startDate, cancellationToken);
             }
+            catch (OperationCanceledException)
+            {
+                // Preserve cooperative cancellation semantics.
+                throw;
+            }
             catch (Exception ex)
             {
                 _logger.LogWarning(ex, "Failed to log WaitForAllInstancesCacheConsistencyAsync event.");
EOF
@@ -207,6 +207,11 @@
using IScoped<ISearchService> searchService = _searchServiceFactory();
await searchService.Value.TryLogEvent(nameof(WaitForAllInstancesCacheConsistencyAsync), status, text, startDate, cancellationToken);
}
catch (OperationCanceledException)
{
// Preserve cooperative cancellation semantics.
throw;
}
catch (Exception ex)
{
_logger.LogWarning(ex, "Failed to log WaitForAllInstancesCacheConsistencyAsync event.");
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +292 to +299
if (syncEventDate.HasValue && !string.IsNullOrEmpty(eventText))
{
if (!latestSyncByHost.TryGetValue(hostName, out var existingSync)
|| syncEventDate.Value > existingSync.SyncEventDate)
{
latestSyncByHost[hostName] = (syncEventDate.Value, eventText);
}
}

Check notice

Code scanning / CodeQL

Nested 'if' statements can be combined Note

These 'if' statements can be combined.

Copilot Autofix

AI 13 days ago

In general, to fix this kind of issue you replace the pattern:

if (cond1)
{
    if (cond2)
    {
        // body
    }
}

with:

if (cond1 && cond2)
{
    // body
}

provided there are no else branches and no side effects in the conditions.

For this specific case in SqlServerSearchParameterStatusDataStore.cs, the outer if at line 292 checks syncEventDate.HasValue && !string.IsNullOrEmpty(eventText) and contains only an inner if at line 294 that checks whether the dictionary already has a value or the new date is more recent. We can merge these into a single if by logically AND-ing all three conditions:

if (syncEventDate.HasValue
    && !string.IsNullOrEmpty(eventText)
    && (!latestSyncByHost.TryGetValue(hostName, out var existingSync)
        || syncEventDate.Value > existingSync.SyncEventDate))
{
    latestSyncByHost[hostName] = (syncEventDate.Value, eventText);
}

This preserves short-circuit behavior and ensures that syncEventDate.Value is only accessed when syncEventDate.HasValue is true, just as before. No new imports or methods are required; the change is localized to the foreach loop body around lines 288–299.

Suggested changeset 1
src/Microsoft.Health.Fhir.SqlServer/Features/Storage/Registry/SqlServerSearchParameterStatusDataStore.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/Microsoft.Health.Fhir.SqlServer/Features/Storage/Registry/SqlServerSearchParameterStatusDataStore.cs b/src/Microsoft.Health.Fhir.SqlServer/Features/Storage/Registry/SqlServerSearchParameterStatusDataStore.cs
--- a/src/Microsoft.Health.Fhir.SqlServer/Features/Storage/Registry/SqlServerSearchParameterStatusDataStore.cs
+++ b/src/Microsoft.Health.Fhir.SqlServer/Features/Storage/Registry/SqlServerSearchParameterStatusDataStore.cs
@@ -289,13 +289,12 @@
             {
                 activeHosts.Add(hostName);
 
-                if (syncEventDate.HasValue && !string.IsNullOrEmpty(eventText))
+                if (syncEventDate.HasValue
+                    && !string.IsNullOrEmpty(eventText)
+                    && (!latestSyncByHost.TryGetValue(hostName, out var existingSync)
+                        || syncEventDate.Value > existingSync.SyncEventDate))
                 {
-                    if (!latestSyncByHost.TryGetValue(hostName, out var existingSync)
-                        || syncEventDate.Value > existingSync.SyncEventDate)
-                    {
-                        latestSyncByHost[hostName] = (syncEventDate.Value, eventText);
-                    }
+                    latestSyncByHost[hostName] = (syncEventDate.Value, eventText);
                 }
             }
 
EOF
@@ -289,13 +289,12 @@
{
activeHosts.Add(hostName);

if (syncEventDate.HasValue && !string.IsNullOrEmpty(eventText))
if (syncEventDate.HasValue
&& !string.IsNullOrEmpty(eventText)
&& (!latestSyncByHost.TryGetValue(hostName, out var existingSync)
|| syncEventDate.Value > existingSync.SyncEventDate))
{
if (!latestSyncByHost.TryGetValue(hostName, out var existingSync)
|| syncEventDate.Value > existingSync.SyncEventDate)
{
latestSyncByHost[hostName] = (syncEventDate.Value, eventText);
}
latestSyncByHost[hostName] = (syncEventDate.Value, eventText);
}
}

Copilot is powered by AI and may make mistakes. Always verify output.
@jestradaMS
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

// stale search parameter caches and would write resources with wrong hashes.
// Use the same lookback as active host detection so we do not miss qualifying
// refresh events that occurred shortly before this instance entered the wait.
var activeHostsSince = DateTime.UtcNow.AddSeconds(-20 * _searchParameterCacheRefreshIntervalSeconds);

Check failure

Code scanning / CodeQL

Possible loss of precision Error

Possible overflow: result of integer multiplication cast to double.

Copilot Autofix

AI 15 days ago

In general, to avoid integer overflow when using arithmetic results as floating-point values, ensure at least one operand in the arithmetic expression is of a floating-point type (e.g., double). That forces the operation to be done in floating-point, preventing 32-bit integer overflow and matching the semantics of APIs like DateTime.AddSeconds(double).

For this specific issue, we should change the computation of activeHostsSince so that the multiplication is done in double instead of int. The only place we need to modify is the expression passed to AddSeconds on line 213: -20 * _searchParameterCacheRefreshIntervalSeconds. We can do this by making 20 a double literal (20.0) or casting one operand to double (e.g., (double)_searchParameterCacheRefreshIntervalSeconds). This preserves intent (20 * intervalSeconds seconds in the past), avoids integer overflow, and does not change behavior for sane configuration values.

Concretely, in src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs, in the method containing lines 213–215, we should replace:

var activeHostsSince = DateTime.UtcNow.AddSeconds(-20 * _searchParameterCacheRefreshIntervalSeconds);

with:

var activeHostsSince = DateTime.UtcNow.AddSeconds(-20.0 * _searchParameterCacheRefreshIntervalSeconds);

No new methods, imports, or definitions are needed; this is a pure expression-level change.

Suggested changeset 1
src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs b/src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs
--- a/src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs
+++ b/src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs
@@ -210,7 +210,7 @@
                 // stale search parameter caches and would write resources with wrong hashes.
                 // Use the same lookback as active host detection so we do not miss qualifying
                 // refresh events that occurred shortly before this instance entered the wait.
-                var activeHostsSince = DateTime.UtcNow.AddSeconds(-20 * _searchParameterCacheRefreshIntervalSeconds);
+                var activeHostsSince = DateTime.UtcNow.AddSeconds(-20.0 * _searchParameterCacheRefreshIntervalSeconds);
                 var syncStartDate = activeHostsSince;
                 await _searchParameterOperations.WaitForAllInstancesCacheConsistencyAsync(syncStartDate, activeHostsSince, _cancellationToken);
             }
EOF
@@ -210,7 +210,7 @@
// stale search parameter caches and would write resources with wrong hashes.
// Use the same lookback as active host detection so we do not miss qualifying
// refresh events that occurred shortly before this instance entered the wait.
var activeHostsSince = DateTime.UtcNow.AddSeconds(-20 * _searchParameterCacheRefreshIntervalSeconds);
var activeHostsSince = DateTime.UtcNow.AddSeconds(-20.0 * _searchParameterCacheRefreshIntervalSeconds);
var syncStartDate = activeHostsSince;
await _searchParameterOperations.WaitForAllInstancesCacheConsistencyAsync(syncStartDate, activeHostsSince, _cancellationToken);
}
Copilot is powered by AI and may make mistakes. Always verify output.
@jestradaMS
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@jestradaMS
Copy link
Copy Markdown
Contributor Author

/azp run

1 similar comment
@jestradaMS
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@jestradaMS
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@jestradaMS
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

cmd.Parameters.AddWithValue("@IsResourceChangeCaptureEnabled", false);
cmd.Parameters.Add(new SqlParameter("@TransactionId", SqlDbType.BigInt) { Value = DBNull.Value });

new ResourceListTableValuedParameterDefinition("@Resources").AddParameter(cmd.Parameters, Array.Empty<ResourceListRow>());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All entries with empty lists are not needed. Removed in #5485

}
}

public async Task<CacheConsistencyResult> CheckCacheConsistencyAsync(string targetSearchParamLastUpdated, DateTime syncStartDate, DateTime activeHostsSince, CancellationToken cancellationToken)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten in #5485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ADR-Included ADR Included in the PR Azure API for FHIR Label denotes that the issue or PR is relevant to the Azure API for FHIR Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs Bug-Reliability Reliability related bugs. No-PaaS-breaking-change Schema Version backward compatible SQL Scripts If SQL scripts are added to the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants