Skip to content

Fix cascading BundleOrchestratorException log storm on parallel bundle failure#5490

Open
apurvabhaleMS wants to merge 3 commits intomainfrom
personal/abhle/reduce-bundleorchjob-excp
Open

Fix cascading BundleOrchestratorException log storm on parallel bundle failure#5490
apurvabhaleMS wants to merge 3 commits intomainfrom
personal/abhle/reduce-bundleorchjob-excp

Conversation

@apurvabhaleMS
Copy link
Copy Markdown
Contributor

@apurvabhaleMS apurvabhaleMS commented Apr 9, 2026

Description

In parallel bundle processing, if one worker fails (e.g., SQL timeout) the bundle operation transitions to Failed (or Canceled). The remaining parallel workers later call ReleaseResourceAsync().
Those calls could attempt further state transitions and end up throwing BundleOrchestratorException repeatedly—creating noisy cascading logs.

What the PR changes
Core fix: In BundleOrchestratorOperation.ReleaseResourceAsync(...), it now early-returns if the operation is already in a terminal state (Failed or Canceled).
This suppresses additional transitions/exceptions once the operation is already done.

Related issues

Addresses AB188320

Testing

Describe how this change was tested.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge Build Status
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a cascading exception/log storm in parallel bundle processing by making ReleaseResourceAsync a no-op once a bundle operation has already reached a terminal state (Failed/Canceled), preventing repeated invalid status transitions.

Changes:

  • Add an early-return guard in BundleOrchestratorOperation.ReleaseResourceAsync when status is Failed or Canceled.
  • Add unit tests to validate ReleaseResourceAsync does not throw after cancellation/failure.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/Microsoft.Health.Fhir.Core/Features/Persistence/Orchestration/BundleOrchestratorOperation.cs Prevents repeated status transitions/exceptions by returning early from ReleaseResourceAsync when the operation is already terminal.
src/Microsoft.Health.Fhir.Shared.Core.UnitTests/Features/Persistence/Orchestration/BundleOrchestratorOperationTests.cs Adds regression tests covering release-after-cancel and release-after-failure scenarios.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.04%. Comparing base (ea948f8) to head (7b88f35).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5490      +/-   ##
==========================================
+ Coverage   76.91%   77.04%   +0.13%     
==========================================
  Files         978      980       +2     
  Lines       35680    35702      +22     
  Branches     5356     5359       +3     
==========================================
+ Hits        27443    27507      +64     
+ Misses       6903     6862      -41     
+ Partials     1334     1333       -1     

see 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Bug bug bug. No-ADR ADR not needed No-PaaS-breaking-change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants