fix(infra): AILZ deployment fixes - private endpoints, DNS, authentication, and ingress #46

Open
sihbher wants to merge 31 commits into Azure:main from sihbher:local-test-002

sihbher commented Mar 23, 2026

Summary

ChangeDetails.md

12 fixes resolving end-to-end azd up failures in ailz-integrated mode. All changes are backward-compatible with basic mode.


Provisioning Fixes (azd provision)

| # | Fix | Root Cause | Details |
|---|-----|------------|---------|
| 1 | LogAnalytics CustomerId must be a GUID | ARM resource ID passed instead of workspace GUID | ChangeDetails.md §1 |
| 2 | Typo: privateEndpointsSubnetId → privateEndpointSubnetId | Extra s in property name | ChangeDetails.md §2 |
| 3 | UnmatchedPrincipalType for deployer().objectId | principalType hardcoded as 'User'; fails with managed identity | ChangeDetails.md §3 |
| 4 | ACR configuration simplification | Removed shared-registry logic; each workload now owns its ACR | ChangeDetails.md §4 |
| 5 | Container Apps timeout: DNS zone groups not created | AVM modules silently fail to create privateDnsZoneGroups; added explicit modules | ChangeDetails.md §5 |
| 6 | App Configuration keys fail with 403 in AILZ mode | ARM deployment service cannot reach private App Config; moved key creation to postprovision hook | ChangeDetails.md §6 |
| 7 | STORAGE_ACCOUNT_NAME output returns module name | storageAccount.name returns deployment name; fixed to storageAccount.outputs.name | ChangeDetails.md §7 |
| 8 | Queue storage DNS zone missing in AILZ resource group | AILZ infra only had blob zone; made queue DNS zone group conditional pending admin action | ChangeDetails.md §8 |
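
Fix 1 depends on telling a workspace GUID apart from an ARM resource ID. As a rough illustration only, the check below classifies a value before it is passed on. The function name `classify_workspace_id` and the exact patterns are assumptions for this sketch and are not taken from the PR's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): classify a Log Analytics identifier
# as a workspace GUID, an ARM resource ID, or neither.

classify_workspace_id() {
  local value="$1"
  local lower
  # ARM resource IDs are case-insensitive, so compare in lowercase.
  lower=$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')

  local guid_re='^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
  local arm_re='^/subscriptions/[^/]+/resourcegroups/[^/]+/providers/microsoft\.operationalinsights/workspaces/[^/]+$'

  if printf '%s' "$lower" | grep -Eq "$guid_re"; then
    echo "guid"
  elif printf '%s' "$lower" | grep -Eq "$arm_re"; then
    echo "arm-resource-id"
  else
    echo "unknown"
  fi
}
```

A pre-provision script could call this on the configured value and stop early with a clear message when the result is `arm-resource-id`, rather than letting the deployment fail later.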

Deploy Fixes (azd deploy)

| # | Fix | Root Cause | Details |
|---|-----|------------|---------|
| 9 | remoteBuild: true incompatible with private ACR | ACR Tasks use public IPs, blocked by publicNetworkAccess: Disabled; removed remoteBuild | ChangeDetails.md §9 |
| 10 | ACR pull fails on fresh environment | RBAC propagation delay between provision and deploy; added predeploy hook with 3-phase readiness gate | ChangeDetails.md §10 |
| 11 | Container Apps return 404 from App Gateway / JumpBox | externalIngress: false in internal CAE blocks all non-CAE VNet traffic; set to true | ChangeDetails.md §11 |
| 12 | Queue PE has no DNS A record; worker crashes | existingQueuePrivateDnsZoneId never reached Bicep (missing parameters.json mapping + storage module param); closes the parameter flow chain started in item 8 | ChangeDetails.md §12 |

Files Changed

  • azure.yaml — removed remoteBuild, added predeploy hook
  • infra/bicep/main.bicep — 10+ fixes across validation, storage, DNS zone groups, Container Apps ingress
  • infra/bicep/main.parameters.json — added existingQueuePrivateDnsZoneId mapping
  • infra/bicep/modules/storage.bicep — fixed output name to use outputs.name
  • infra/bicep/modules/storage.bicep, app-config-store.bicep — dynamic principalType detection
  • infra/bicep/modules/private-endpoint-dns-zone-group.bicep (new): reusable DNS zone group module
  • infra/scripts/pre-provision.sh — added Docker daemon reachability check
  • infra/scripts/pre-deploy.sh — rewrote with ACR pull readiness gate (RBAC polling + DNS check + stabilization wait)
  • infra/scripts/post-provision.sh — added App Config key creation in AILZ mode
  • infra/README.md — updated deployment instructions for both modes

Deployment Impact

Before

azd up in ailz-integrated mode failed at multiple stages:

  • Provision: errors on LogAnalytics GUID, property typos, UnmatchedPrincipalType, silent AVM DNS zone group failures, 403 on App Config keys, wrong storage output name, missing queue DNS zone.
  • Deploy: remoteBuild blocked by private ACR, ACR pull failure due to RBAC propagation delay, Container Apps returning 404 from JumpBox / App Gateway.
  • Runtime: worker crashed immediately — queue private endpoint had no DNS A record.

After

azd up completes end-to-end in both basic and ailz-integrated modes:

  • All provisioning steps succeed without manual intervention.
  • ACR image build and pull work reliably on fresh environments.
  • API, Worker, and Web are reachable from App Gateway and JumpBox.
  • Worker resolves the queue storage endpoint and processes messages normally.

Testing

Verified

  • azd up (provision + deploy) from scratch on test006 and test007 in ailz-integrated mode.
  • All 3 Container Apps (API, Worker, Web) reachable from JumpBox VM via internal URL.
  • App Gateway forwarding traffic to Container Apps (HTTP 200).
  • Worker dequeuing and processing messages — no DNS resolution errors in logs.
  • basic mode azd up re-validated — no regressions introduced.

Related Documentation

  • ChangeDetails.md — fix-by-fix breakdown (12 items) with root cause analysis, before/after code, and impact per deployment mode.
  • infra/README.md — updated deployment instructions for both basic and ailz-integrated modes.

Breaking Changes

None for basic mode.

For ailz-integrated mode:

  • existingQueuePrivateDnsZoneId is now a recognized parameter in main.parameters.json. If the AILZ resource group already has a privatelink.queue.core.windows.net DNS zone, set this value before running azd provision to enable the DNS zone group for the queue private endpoint. If left empty, provisioning still succeeds — the DNS zone group is simply skipped until the zone is available.
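
The skip-or-create behaviour described above can be sketched as a small preflight check. This is an illustration under assumptions: the function name `queue_zone_group_action` is hypothetical, and the check assumes the parameter holds a full ARM resource ID of a `Microsoft.Network/privateDnsZones` resource, which the PR does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical preflight (not from the PR): decide what an empty, valid, or
# mismatched existingQueuePrivateDnsZoneId value would mean for provisioning.

queue_zone_group_action() {
  local zone_id="$1"

  # Empty value: provisioning still succeeds, the zone group is skipped.
  if [ -z "$zone_id" ]; then
    echo "skip"
    return 0
  fi

  local lower
  lower=$(printf '%s' "$zone_id" | tr '[:upper:]' '[:lower:]')

  case "$lower" in
    */providers/microsoft.network/privatednszones/privatelink.queue.core.windows.net)
      echo "create"
      ;;
    *)
      # Assumption: anything else (for example the blob zone) is a mistake.
      echo "invalid"
      return 1
      ;;
  esac
}
```

Running a check like this before `azd provision` would flag a blob zone ID pasted into the queue parameter, a mistake that could otherwise go unnoticed until the worker starts.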

Notes

  • Item 8 (queue DNS zone) requires the AILZ administrator to create privatelink.queue.core.windows.net in the AILZ resource group. Item 12 ensures the parameter flows correctly once the zone exists.
  • The predeploy readiness gate (item 10) is configurable: ACR_RBAC_TIMEOUT and ACR_RBAC_STABILIZATION env vars.
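
For readers unfamiliar with the pattern, a three-phase gate of this kind (RBAC polling, DNS check, stabilization wait) might look roughly like the sketch below. It is not the PR's pre-deploy.sh. `ACR_RBAC_TIMEOUT` and `ACR_RBAC_STABILIZATION` are the variables named above, but their units (seconds here), the default values, `POLL_INTERVAL`, and the `check_rbac` / `check_dns` placeholders are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of a three-phase readiness gate. Units, defaults, POLL_INTERVAL and
# the check_* placeholders are assumptions, not values taken from the PR.

ACR_RBAC_TIMEOUT="${ACR_RBAC_TIMEOUT:-300}"            # assumed: seconds
ACR_RBAC_STABILIZATION="${ACR_RBAC_STABILIZATION:-30}" # assumed: seconds
POLL_INTERVAL="${POLL_INTERVAL:-10}"                   # hypothetical knob

# Placeholder probes: replace with real checks that return 0 when ready.
check_rbac() { return 0; }   # e.g. is the pull role assignment visible yet?
check_dns()  { return 0; }   # e.g. does the registry hostname resolve?

wait_for_acr_pull() {
  local elapsed=0

  # Phase 1: poll the RBAC check until it passes or the timeout is reached.
  until check_rbac; do
    if [ "$elapsed" -ge "$ACR_RBAC_TIMEOUT" ]; then
      echo "RBAC not ready after ${elapsed}s" >&2
      return 1
    fi
    sleep "$POLL_INTERVAL"
    elapsed=$((elapsed + POLL_INTERVAL))
  done

  # Phase 2: one DNS check for the registry endpoint.
  if ! check_dns; then
    echo "registry DNS check failed" >&2
    return 2
  fi

  # Phase 3: fixed wait so a freshly visible assignment can settle.
  sleep "$ACR_RBAC_STABILIZATION"
  echo "ready"
}
```

A predeploy hook built this way would call `wait_for_acr_pull || exit 1`, so that `azd deploy` stops with a clear message instead of failing later on the image pull.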

sihbher added 30 commits March 18, 2026 12:57
Prevents compilation error in basic mode where networkConfig.privateDnsZoneIds does not exist
- Remove ACR reuse references (always creates new ACR per workload)
- Add Queue Storage DNS Zone to all relevant sections
- Add troubleshooting section for missing Queue Storage DNS Zone
- Add important note about Queue DNS Zone requirement in AILZ mode
- Update manual configuration steps with Queue DNS Zone ID
- Fix inconsistent ACR descriptions in deployment sections
- Ensure all documentation aligns with changes.md Items 4 and 8
sihbher changed the title from "fix(infra): AILZ deployment fixes - private endpoints, DNS, and authentication" to "fix(infra): AILZ deployment fixes - private endpoints, DNS, authentication, and ingress" on Mar 30, 2026