
Fix thread pool starvation under multi-concurrency #2387

Open
zachvan2 wants to merge 3 commits into aws:dev from zachvan2:fix/mc-thread-pool-starvation

@zachvan2

Summary

  • Set the polling task count to the multi-concurrency (MC) level instead of Math.Max(2, processorCount), so that enough /next calls are pending to avoid Runtime.Unavailable race conditions
  • Pre-size the thread pool (SetMinThreads) to MC + ProcessorCount so that polling-loop continuations aren't starved by handler work

Problem

With the current defaults (2 polling tasks on 1 vCPU), the runtime doesn't call RAPID's /next endpoint fast enough under load. When handlers do blocking work, they occupy the .NET thread pool's threads. Because the pool's MinThreads defaults to ProcessorCount and threads beyond that minimum are added only gradually, the polling loop's await continuation can't get a thread to resume on, causing RAPID to time out with Runtime.Unavailable.
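The polling structure described above can be sketched as follows. This is a minimal illustration, not the runtime's actual LambdaBootstrap code: `getNextAsync` is a hypothetical stand-in for the /next call, `handler` stands in for the user function, and treating a null result as "stop" exists only so the sketch terminates. The comment on the `await` marks the continuation that starves when every pool thread is blocked in a handler.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

#nullable enable

// Sketch only: delegates are hypothetical stand-ins for RAPID's /next call and
// the user handler; this is not the Amazon.Lambda.RuntimeSupport API.
public static class PollingLoopSketch
{
    public static Task RunAsync(
        int pollingTaskCount,
        Func<CancellationToken, Task<string?>> getNextAsync, // null = "no more work" in this sketch
        Action<string> handler,
        CancellationToken cancellationToken = default)
    {
        // One loop per polling task; each loop keeps at most one /next call in flight.
        var loops = Enumerable.Range(0, pollingTaskCount)
            .Select(_ => Task.Run(() => PollAsync(getNextAsync, handler, cancellationToken)))
            .ToArray();
        return Task.WhenAll(loops);
    }

    private static async Task PollAsync(
        Func<CancellationToken, Task<string?>> getNextAsync,
        Action<string> handler,
        CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Everything after this await is a continuation that needs a free
            // thread-pool thread. If all pool threads are blocked inside handler(),
            // the loop cannot come back around to call /next again.
            string? invocation = await getNextAsync(cancellationToken);
            if (invocation is null) return;

            handler(invocation); // blocking work here occupies a pool thread
        }
    }
}
```

With only two such loops, at most two invocations can be fetched at a time, which is why the PR ties the loop count to MC.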

Benchmark Results

Tested with RIE (the Lambda Runtime Interface Emulator) across configurations from MC=8 on 1 vCPU to MC=128 on 4 vCPUs, with both Thread.Sleep and CPU-bound workloads:

  • Without the fix: the success rate drops as soon as load exceeds the polling task count
  • With the fix: 100% success up to MC, with graceful degradation beyond it
  • Success-rate improvements range from +26% to +93%

Test plan

  • Existing LambdaBootstrapMultiConcurrencyTests pass
  • Updated UtilsTest.DetermineProcessingTaskCount cases reflect new defaults
  • New UtilsTest.GetMaxConcurrency tests cover parsing edge cases
  • RIE integration benchmarks at MC=8,16,32,64,128 with 1,2,4 vCPUs

Parse AWS_LAMBDA_MAX_CONCURRENCY as an integer and use it as the default
polling task count instead of Math.Max(2, processorCount). This ensures
enough polling tasks exist to fill all available concurrency slots.
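The parsing rule can be sketched like this. The method names mirror the PR text, but the signatures are assumptions: the real code reads from `_environmentVariables`, while this sketch takes a plain dictionary, and treating zero or negative values as "not set" is an extra assumption beyond the "non-numeric" fallback described here.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the parsing/fallback rule; not the actual Utils.cs implementation.
public static class ConcurrencySketch
{
    public const string MaxConcurrencyVariable = "AWS_LAMBDA_MAX_CONCURRENCY";

    // Returns the parsed MC value, or null when the variable is missing,
    // non-numeric, or (assumption) not positive.
    public static int? GetMaxConcurrency(IReadOnlyDictionary<string, string> environment)
    {
        if (environment.TryGetValue(MaxConcurrencyVariable, out var raw)
            && int.TryParse(raw, out var value)
            && value > 0)
        {
            return value;
        }
        return null;
    }

    // New default: one polling task per concurrency slot; the old default is the fallback.
    public static int DetermineProcessingTaskCount(
        IReadOnlyDictionary<string, string> environment, int processorCount)
    {
        return GetMaxConcurrency(environment) ?? Math.Max(2, processorCount);
    }
}
```

For example, MC=32 on 1 vCPU yields 32 polling tasks, while a non-numeric value on 1 vCPU falls back to the previous default of 2.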

Add AdjustThreadPoolSettings() to pre-size the ThreadPool to
mcCount + processorCount at startup, preventing blocking handlers from
starving polling task continuations of threads needed to cycle back to
RAPID's /next endpoint.
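A minimal sketch of the pre-sizing step, assuming the MC value has already been parsed. The method name follows the PR text, but the body is an illustration, not the merged LambdaBootstrap.cs code. It only ever raises the worker minimum and leaves the I/O completion-port minimum unchanged; both are choices of this sketch.

```csharp
using System;
using System.Threading;

// Sketch of pre-sizing the thread pool at startup; not the actual LambdaBootstrap code.
public static class ThreadPoolSketch
{
    public static bool AdjustThreadPoolSettings(int maxConcurrency, int processorCount)
    {
        // Target from the PR description: MC + ProcessorCount worker threads.
        int targetWorkers = maxConcurrency + processorCount;

        ThreadPool.GetMinThreads(out int currentWorkers, out int currentIo);

        // Only raise the minimum; never shrink a value that is already higher.
        if (targetWorkers <= currentWorkers)
        {
            return true;
        }

        // SetMinThreads returns false, and changes nothing, if the value is rejected
        // (for example, when it exceeds the pool's maximum thread count).
        return ThreadPool.SetMinThreads(targetWorkers, currentIo);
    }
}
```

Raising the minimum lets the pool create threads immediately on demand up to that count, instead of relying on its gradual injection once handlers block.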

Without both changes, blocking handlers (Thread.Sleep, .Result, .Wait())
exhaust the ThreadPool under multi-concurrency, causing Runtime.Unavailable
errors because polling tasks cannot resume their await continuations.

Changes:
- Utils.cs: Add a GetMaxConcurrency() method and use the parsed MC value in
  DetermineProcessingTaskCount (falling back to Math.Max(2, processorCount)
  for non-numeric values)
- LambdaBootstrap.cs: Add AdjustThreadPoolSettings() called after
  AdjustMemorySettings() in RunAsync startup sequence
- TestMultiConcurrencyRuntimeApiClient.cs: Make thread-safe for
  concurrent polling task access (ConcurrentDictionary, lock on Queue)
- Add tests demonstrating that the fix works with blocking handlers at MC=10
  and MC=20
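The thread-safety change to the test client can be sketched as below. The class and member names are hypothetical; only the approach (a lock around the `Queue<T>`, a `ConcurrentDictionary` for recorded results) comes from the change list.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical fake client illustrating the ConcurrentDictionary + lock-on-Queue approach.
public sealed class FakeRuntimeApiClient
{
    private readonly Queue<string> _pending = new();
    private readonly object _pendingLock = new();
    private readonly ConcurrentDictionary<string, string> _responses = new();

    public void Enqueue(string requestId)
    {
        lock (_pendingLock) { _pending.Enqueue(requestId); }
    }

    // Queue<T> is not thread-safe, so every access goes through the same lock.
    public bool TryGetNext(out string requestId)
    {
        lock (_pendingLock)
        {
            if (_pending.Count > 0)
            {
                requestId = _pending.Dequeue();
                return true;
            }
        }
        requestId = string.Empty;
        return false;
    }

    // ConcurrentDictionary tolerates concurrent writers without extra locking.
    public void RecordResponse(string requestId, string body) => _responses[requestId] = body;

    public IReadOnlyDictionary<string, string> Responses => _responses;
}
```

With several polling tasks calling `TryGetNext` at once, each queued request is handed out exactly once.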
zachvan2 requested review from a team as code owners May 21, 2026 04:34
zachvan2 requested review from normj and philasmar May 21, 2026 04:34
{
    try
    {
        var maxConcurrency = Utils.GetMaxConcurrency(_environmentVariables);
Member

I'm thinking we should check whether Constants.ENVIRONMENT_VARIABLE_AWS_LAMBDA_DOTNET_PROCESSING_TASKS is set, or create a separate environment variable to check. If that environment variable is set, use its value over AWS_LAMBDA_MAX_CONCURRENCY. That way, if customers find that the number of threads we create hurts performance for their workload, they still have a knob to adjust. For the most flexibility in this advanced use case, I prefer a separate environment variable, maybe AWS_LAMBDA_DOTNET_MIN_THREADS.
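The precedence the comment proposes could look roughly like this. AWS_LAMBDA_DOTNET_MIN_THREADS is only the name floated in this review, not an existing runtime setting, and the parsing rules (positive integers only, otherwise ignored) are assumptions of the sketch.

```csharp
using System.Collections.Generic;

// Sketch of "explicit override wins over the MC-derived default".
// AWS_LAMBDA_DOTNET_MIN_THREADS is a proposed, hypothetical variable name.
public static class MinThreadsOverrideSketch
{
    public const string OverrideVariable = "AWS_LAMBDA_DOTNET_MIN_THREADS"; // hypothetical
    public const string MaxConcurrencyVariable = "AWS_LAMBDA_MAX_CONCURRENCY";

    // Returns the minimum worker-thread count to apply, or null to keep the pool defaults.
    public static int? ResolveMinWorkerThreads(
        IReadOnlyDictionary<string, string> environment, int processorCount)
    {
        // 1. A valid positive override is used as-is.
        if (TryGetPositiveInt(environment, OverrideVariable, out int overrideValue))
            return overrideValue;

        // 2. Otherwise derive MC + ProcessorCount, as in the PR.
        if (TryGetPositiveInt(environment, MaxConcurrencyVariable, out int mc))
            return mc + processorCount;

        // 3. Neither is set: leave the thread-pool defaults alone.
        return null;
    }

    private static bool TryGetPositiveInt(
        IReadOnlyDictionary<string, string> environment, string name, out int value)
    {
        value = 0;
        return environment.TryGetValue(name, out var raw)
            && int.TryParse(raw, out value)
            && value > 0;
    }
}
```

A separate variable keeps the thread-pool knob independent of the polling task count, which is the flexibility the comment asks for.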

normj changed the base branch from master to dev May 22, 2026 19:00
normj changed the base branch from dev to master May 22, 2026 19:48
normj changed the base branch from master to dev May 22, 2026 19:49

3 participants