Build: [AEA-0000] - Add retry logic to the deploy-api stage #2986

Closed
wildjames wants to merge 1 commit into main from aea-0000-retry-deploy-api

Conversation

@wildjames

Summary

  • 🤖 Operational or Infrastructure Change

Details

Occasionally, the deploy_api.sh script fails to get credentials and produces a 401 error. For example, here. Typically, manually re-running the workflow fixes this.

This is a stab in the dark, but I've added a retry function to the script that makes up to three attempts at each Lambda call (secret put, instance put, and spec publish). The error is transient and I don't know what triggers it (though I think it's most likely external), so I don't have evidence that this will work, but it shouldn't hurt.
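
The retry idea described above can be sketched generically. This is only an illustration under assumptions, not the PR's actual invoke_lambda_with_retry: the function name `retry`, its argument order, and the fixed delay between attempts are all hypothetical.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or the attempts run out.
retry() {
    local max_attempts="$1"
    local delay_seconds="$2"
    shift 2

    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "Attempt ${attempt} of ${max_attempts}: $*" >&2
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt is still to come.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay_seconds"
        fi
        attempt=$((attempt + 1))
    done

    echo "Giving up after ${max_attempts} attempts: $*" >&2
    return 1
}
```

A call such as `retry 3 5 some_command --flag` would run `some_command` up to three times, pausing five seconds between failures. A wrapper like this only retries on a non-zero exit status, so a Lambda call that exits 0 but reports a FunctionError in its response would need its own check, as the script in this PR does.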

Copilot AI review requested due to automatic review settings April 22, 2026 10:13

Copilot AI left a comment

Pull request overview

Adds retry logic around Lambda invocations in the deploy_api.sh GitHub Actions deploy stage to reduce transient deployment failures (e.g., intermittent 401s), while also cleaning up a couple of formatting issues.

Changes:

  • Introduced invoke_lambda_with_retry() to retry Lambda invocations up to 3 times with a delay.
  • Replaced direct aws lambda invoke calls (secret put, instance put, spec publish) with the retry wrapper.
  • Minor whitespace/formatting fixes in the script.

Comment on lines +9 to +19

    for attempt in $(seq 1 "$max_attempts"); do
        echo "Invoking ${function_name} (attempt ${attempt} of ${max_attempts})"
        aws lambda invoke --function-name "${function_name}" --cli-binary-format raw-in-base64-out --payload "file://${payload_file}" out.txt > response.json

        if ! eval "cat response.json | jq -e '.FunctionError' >/dev/null"; then
            return 0
        fi

        echo "Error calling lambda (attempt ${attempt} of ${max_attempts})"
        cat out.txt

Copilot AI Apr 22, 2026

out.txt and response.json are overwritten on every attempt and on every lambda call, which makes post-failure debugging harder (especially with retries). Consider incorporating the function name/attempt into the filenames, or at least printing both the lambda payload output and the invoke metadata for each failed attempt.

Suggested change
-    for attempt in $(seq 1 "$max_attempts"); do
-        echo "Invoking ${function_name} (attempt ${attempt} of ${max_attempts})"
-        aws lambda invoke --function-name "${function_name}" --cli-binary-format raw-in-base64-out --payload "file://${payload_file}" out.txt > response.json
-        if ! eval "cat response.json | jq -e '.FunctionError' >/dev/null"; then
-            return 0
-        fi
-        echo "Error calling lambda (attempt ${attempt} of ${max_attempts})"
-        cat out.txt
+    local safe_function_name
+    safe_function_name=$(printf '%s' "${function_name}" | tr -c '[:alnum:]._-' '_')
+    for attempt in $(seq 1 "$max_attempts"); do
+        local payload_output_file="${safe_function_name}.attempt_${attempt}.out.txt"
+        local invoke_metadata_file="${safe_function_name}.attempt_${attempt}.response.json"
+        echo "Invoking ${function_name} (attempt ${attempt} of ${max_attempts})"
+        aws lambda invoke --function-name "${function_name}" --cli-binary-format raw-in-base64-out --payload "file://${payload_file}" "${payload_output_file}" > "${invoke_metadata_file}"
+        if ! jq -e '.FunctionError' "${invoke_metadata_file}" >/dev/null; then
+            return 0
+        fi
+        echo "Error calling lambda (attempt ${attempt} of ${max_attempts})"
+        echo "Lambda payload output (${payload_output_file}):"
+        cat "${payload_output_file}"
+        echo "Lambda invoke metadata (${invoke_metadata_file}):"
+        cat "${invoke_metadata_file}"

Comment thread .github/scripts/deploy_api.sh
@@ -138,12 +165,7 @@ if [[ "${ENABLE_MUTUAL_TLS}" == "true" ]]; then
--arg proxygenSecretName "${proxygen_private_key_arn}" \
'{apiName: $apiName, environment: $environment, secretName: $secretName, secretKey: $secretKey, secretCert: $secretCert, kid, $kid, proxygenSecretName: $proxygenSecretName}' > payload.json

Copilot AI Apr 22, 2026

The jq payload object contains kid, $kid, which is not valid key/value construction and will either set kid to null or cause a jq syntax error. This should be a proper field assignment (e.g. kid: $kid) so the lambda receives the expected KID value.

Suggested change
- '{apiName: $apiName, environment: $environment, secretName: $secretName, secretKey: $secretKey, secretCert: $secretCert, kid, $kid, proxygenSecretName: $proxygenSecretName}' > payload.json
+ '{apiName: $apiName, environment: $environment, secretName: $secretName, secretKey: $secretKey, secretCert: $secretCert, kid: $kid, proxygenSecretName: $proxygenSecretName}' > payload.json

@wildjames (Contributor, Author)

god damn, that might be the problem. Why does this work most of the time, then?
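
One possible answer to the question above, offered as a sketch rather than a diagnosis: jq treats both `kid` and `$kid` as object-construction shorthands. A bare `kid` expands to `kid: .kid`, and a bare `$kid` expands to `kid: $kid`. When both appear, the later entry sets the final value, so the payload would still carry the intended KID. This assumes the script runs jq with `-n` (null input), which the visible hunk does not show; with a non-object input, `.kid` could raise an error instead. The helper names below are hypothetical and exist only for the comparison.

```shell
# Hypothetical helpers for illustration; they need jq on PATH.
# Assumption: null input (-n), since the real jq invocation is not fully shown.

# Shorthand form as written in the script: `kid` means `kid: .kid`
# (null for null input) and `$kid` means `kid: $kid`, which is applied last.
payload_shorthand() {
    jq -n -c --arg kid "$1" '{kid, $kid}'
}

# Explicit form from the suggested change.
payload_explicit() {
    jq -n -c --arg kid "$1" '{kid: $kid}'
}
```

Running both helpers with the same argument and comparing their output is a quick way to check whether the two forms differ in a given jq version. If they match, the explicit `kid: $kid` is still the clearer spelling, but the intermittent 401s would probably have a different cause.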
