Build: [AEA-0000] - Add retry logic to the deploy-api stage by wildjames · Pull Request #2986 · NHSDigital/eps-prescription-status-update-api

wildjames · 2026-04-22T10:13:34Z

Summary

🤖 Operational or Infrastructure Change

Details

Occasionally, the deploy_api.sh script fails to get credentials and produces a 401 error (for example, here). Manually re-running the workflow usually fixes this.

This is a stab in the dark, but I've added a retry function to the script that makes up to three attempts at each lambda call (secret put, instance put, and spec publish). The error is transient and I don't know what triggers it (though I suspect the cause is external), so I have no evidence that this will fix it, but it shouldn't hurt.

Copilot

Pull request overview

Adds retry logic around Lambda invocations in the deploy_api.sh GitHub Actions deploy stage to reduce transient deployment failures (e.g., intermittent 401s), while also cleaning up a couple of formatting issues.

Changes:

Introduced invoke_lambda_with_retry() to retry Lambda invocations up to 3 times with a delay.
Replaced direct aws lambda invoke calls (secret put, instance put, spec publish) with the retry wrapper.
Minor whitespace/formatting fixes in the script.
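The retry-with-delay pattern described above can be sketched generically as follows. This is a minimal sketch, not the PR's invoke_lambda_with_retry(): retry_with_delay and RETRY_DELAY_SECONDS are hypothetical names, and the fixed delay between attempts is an assumption.

```bash
#!/usr/bin/env bash
# Minimal retry-with-delay sketch. retry_with_delay and RETRY_DELAY_SECONDS are
# hypothetical names; this is not the PR's invoke_lambda_with_retry().
RETRY_DELAY_SECONDS="${RETRY_DELAY_SECONDS:-5}"

# Usage: retry_with_delay <max_attempts> <command> [args...]
# Runs the command until it exits 0 or max_attempts is reached.
# Progress messages go to stderr so they never mix with the command's stdout.
retry_with_delay() {
    local max_attempts="$1"
    shift
    local attempt=1
    while [ "${attempt}" -le "${max_attempts}" ]; do
        echo "Attempt ${attempt} of ${max_attempts}: $*" >&2
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt will follow.
        if [ "${attempt}" -lt "${max_attempts}" ]; then
            echo "Attempt ${attempt} failed; retrying in ${RETRY_DELAY_SECONDS}s" >&2
            sleep "${RETRY_DELAY_SECONDS}"
        fi
        attempt=$((attempt + 1))
    done
    echo "All ${max_attempts} attempts failed: $*" >&2
    return 1
}
```

A generic wrapper like this only sees the command's exit status. aws lambda invoke exits 0 even when the function itself raises an error (the failure is reported in the FunctionError field of the response), so for Lambda calls the retried command also has to inspect that field, as invoke_lambda_with_retry() does.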

Copilot · 2026-04-22T10:17:36Z

+
+    for attempt in $(seq 1 "$max_attempts"); do
+        echo "Invoking ${function_name} (attempt ${attempt} of ${max_attempts})"
+        aws lambda invoke --function-name "${function_name}" --cli-binary-format raw-in-base64-out --payload "file://${payload_file}" out.txt > response.json
+
+        if ! eval "cat response.json | jq -e '.FunctionError' >/dev/null"; then
+            return 0
+        fi
+
+        echo "Error calling lambda (attempt ${attempt} of ${max_attempts})"
+        cat out.txt


out.txt and response.json are overwritten on every attempt and on every lambda call, which makes post-failure debugging harder (especially with retries). Consider incorporating the function name/attempt into the filenames, or at least printing both the lambda payload output and the invoke metadata for each failed attempt.

Suggested change

-    for attempt in $(seq 1 "$max_attempts"); do
-        echo "Invoking ${function_name} (attempt ${attempt} of ${max_attempts})"
-        aws lambda invoke --function-name "${function_name}" --cli-binary-format raw-in-base64-out --payload "file://${payload_file}" out.txt > response.json
-
-        if ! eval "cat response.json | jq -e '.FunctionError' >/dev/null"; then
-            return 0
-        fi
-
-        echo "Error calling lambda (attempt ${attempt} of ${max_attempts})"
-        cat out.txt
+    local safe_function_name
+    safe_function_name=$(printf '%s' "${function_name}" | tr -c '[:alnum:]._-' '_')
+
+    for attempt in $(seq 1 "$max_attempts"); do
+        local payload_output_file="${safe_function_name}.attempt_${attempt}.out.txt"
+        local invoke_metadata_file="${safe_function_name}.attempt_${attempt}.response.json"
+        echo "Invoking ${function_name} (attempt ${attempt} of ${max_attempts})"
+        aws lambda invoke --function-name "${function_name}" --cli-binary-format raw-in-base64-out --payload "file://${payload_file}" "${payload_output_file}" > "${invoke_metadata_file}"
+
+        if ! jq -e '.FunctionError' "${invoke_metadata_file}" >/dev/null; then
+            return 0
+        fi
+
+        echo "Error calling lambda (attempt ${attempt} of ${max_attempts})"
+        echo "Lambda payload output (${payload_output_file}):"
+        cat "${payload_output_file}"
+        echo "Lambda invoke metadata (${invoke_metadata_file}):"
+        cat "${invoke_metadata_file}"

Copilot · 2026-04-22T10:17:37Z

@@ -138,12 +165,7 @@ if [[ "${ENABLE_MUTUAL_TLS}" == "true" ]]; then
            --arg proxygenSecretName "${proxygen_private_key_arn}" \
            '{apiName: $apiName, environment: $environment, secretName: $secretName, secretKey: $secretKey, secretCert: $secretCert, kid, $kid, proxygenSecretName: $proxygenSecretName}' > payload.json


The jq payload object contains kid, $kid, which is not valid key/value construction and will either set kid to null or cause a jq syntax error. This should be a proper field assignment (e.g. kid: $kid) so the lambda receives the expected KID value.

Suggested change

-            '{apiName: $apiName, environment: $environment, secretName: $secretName, secretKey: $secretKey, secretCert: $secretCert, kid, $kid, proxygenSecretName: $proxygenSecretName}' > payload.json
+            '{apiName: $apiName, environment: $environment, secretName: $secretName, secretKey: $secretKey, secretCert: $secretCert, kid: $kid, proxygenSecretName: $proxygenSecretName}' > payload.json

god damn, that might be the problem. Why does this work most of the time, then?
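A possible answer, assuming the payload is built with jq -n (the start of the command is not visible in the hunk): in jq object construction, a bare kid is shorthand for kid: .kid, which is null under -n, and a bare $kid is shorthand for kid: $kid. When a key appears twice, the later value wins, so {..., kid, $kid, ...} should still produce the right KID, and the typo is probably not the source of the intermittent 401. The sketch below compares the two spellings on a cut-down payload; apiName "demo" and example-kid are placeholder values.

```bash
#!/usr/bin/env bash
# Compare the original "kid, $kid" spelling with the suggested "kid: $kid".
# Assumption: the real script calls jq with -n (null input); the hunk does not show it.
# The apiName value and kid_value below are placeholders.
kid_value="example-kid"

# Bare "kid" means kid: .kid (null under -n); bare "$kid" means kid: $kid.
# jq keeps the last value for a repeated key, so $kid should win.
original=$(jq -n -c --arg kid "${kid_value}" '{apiName: "demo", kid, $kid}')

# Copilot's suggested spelling.
suggested=$(jq -n -c --arg kid "${kid_value}" '{apiName: "demo", kid: $kid}')

echo "original:  ${original}"
echo "suggested: ${suggested}"
```

Both lines should print the same object; kid: $kid is still the clearer spelling either way.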

sonarqubecloud · 2026-04-22T10:18:46Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Add retry logic to the deploy-api stage

bf90232

Copilot AI review requested due to automatic review settings April 22, 2026 10:13

Copilot started reviewing on behalf of wildjames April 22, 2026 10:14

Copilot AI reviewed Apr 22, 2026

wildjames temporarily deployed to dev April 22, 2026 10:21 with GitHub Actions

wildjames closed this Apr 22, 2026

wildjames wants to merge 1 commit into main from aea-0000-retry-deploy-api


2 participants
