IRSA broken on v2.7.x: AWS_WEB_IDENTITY_TOKEN_FILE no longer read by S3 client #1646

@erikcw

Summary

After upgrading from v2.6.6 to v2.7.2 on EKS with IRSA, Parseable can no longer assume the service account's IAM role. The S3 client falls through to the EC2 instance role (Karpenter node role in our case), which lacks bucket permissions, and every S3 call returns 403 AccessDenied.

The same Helm values and same ServiceAccount/IRSA setup work on v2.6.6.

Environment

  • Parseable: chart 2.7.2, image parseable/parseable:v2.7.2
  • Mode: Distributed (querier + ingestor StatefulSets), P_MODE=query / P_MODE=ingest
  • Store: s3-store, AWS S3 in us-east-1
  • EKS, IRSA via OIDC. ServiceAccount has eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/<role>.
  • Pod has projected SA token at /var/run/secrets/eks.amazonaws.com/serviceaccount/token, audience sts.amazonaws.com.
  • Pod env includes AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE, AWS_STS_REGIONAL_ENDPOINTS=regional, AWS_REGION=us-east-1.
  • P_S3_ACCESS_KEY / P_S3_SECRET_KEY are intentionally NOT set; we rely on IRSA.

Symptoms

The querier pod crash-loops at startup with:

Error: MetastoreError: MetastoreErrorDetail {
  operation: "ObjectStorageError",
  message: "Unhandled Error: The operation lacked the necessary privileges to complete
  for path .parseable/.parseable.json: Error performing GET
  https://s3.us-east-1.amazonaws.com/<bucket>/.parseable/.parseable.json in 53ms
  - Server returned non-2xx status code: 403 Forbidden:
  <Error><Code>AccessDenied</Code><Message>User:
  arn:aws:sts::ACCOUNT:assumed-role/KarpenterNodeRole-<cluster>/<instance-id>
  is not authorized to perform: s3:GetObject ... because no identity-based policy
  allows the s3:GetObject action</Message>...
}

Key tells:

  • The User is the EC2 node role, not the IRSA role.
  • 53 ms response time: too fast for an sts:AssumeRoleWithWebIdentity round trip, so the SDK most likely went straight to IMDS.
  • Rolling back to v2.6.6 with identical pod/SA/role config restores IRSA immediately.
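To separate a pod-side misconfiguration from the regression described here, the IRSA inputs can be checked from inside the container. The following is a minimal sketch, not part of Parseable: `IrsaInputs` and `check_irsa_inputs` are hypothetical names, and the env lookup and file check are passed in as closures so the logic can be tested without touching the real process environment.

```rust
/// Outcome of the pre-flight check (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum IrsaInputs {
    /// Both env vars are set and the token file is present.
    Ready,
    /// The named env var is not set.
    MissingEnv(&'static str),
    /// The env var points at a path that does not exist.
    TokenFileMissing(String),
}

/// Checks the two env vars IRSA injects and the projected token file.
/// `lookup` reads an env var; `file_exists` reports whether a path is a file.
fn check_irsa_inputs(
    lookup: impl Fn(&str) -> Option<String>,
    file_exists: impl Fn(&str) -> bool,
) -> IrsaInputs {
    let token_path = match lookup("AWS_WEB_IDENTITY_TOKEN_FILE") {
        Some(p) => p,
        None => return IrsaInputs::MissingEnv("AWS_WEB_IDENTITY_TOKEN_FILE"),
    };
    if lookup("AWS_ROLE_ARN").is_none() {
        return IrsaInputs::MissingEnv("AWS_ROLE_ARN");
    }
    if !file_exists(&token_path) {
        return IrsaInputs::TokenFileMissing(token_path);
    }
    IrsaInputs::Ready
}
```

In a real pod the closures would be `|k| std::env::var(k).ok()` and `|p| std::path::Path::new(p).is_file()`. If this reports `Ready` and the 403 still names the node role, the pod setup is fine and the problem is in how the S3 client is built.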

Root cause

The object_store crate was bumped from 0.12.4 to 0.13.1 between v2.6.6 and v2.7.0 (chore: update datafusion and related crates, #1635). With that bump, AmazonS3Builder::build() changed how it detects IRSA credentials.

object_store v0.12.4 AmazonS3Builder::build():

} else if let (Ok(token_path), Ok(role_arn)) = (
    std::env::var("AWS_WEB_IDENTITY_TOKEN_FILE"),
    std::env::var("AWS_ROLE_ARN"),
) {
    debug!("Using WebIdentity credential provider");
    ...
}

Reads env vars directly. IRSA works without any consumer action.

object_store v0.13.0/0.13.1 AmazonS3Builder::build():

} else if let (Some(token_path), Some(role_arn)) =
    (self.web_identity_token_file, self.role_arn)
{
    debug!("Using WebIdentity credential provider");
    ...
}

Now checks builder struct fields. These fields are populated only by AmazonS3Builder::from_env() or by explicit with_config(AmazonS3ConfigKey::WebIdentityTokenFile, ...) and with_config(AmazonS3ConfigKey::RoleArn, ...) calls.

Parseable's src/storage/s3.rs::S3Config::get_default_builder (v2.7.2) uses:

let mut builder = AmazonS3Builder::new()
    .with_region(&self.region)
    .with_endpoint(&self.endpoint_url)
    ...

It never calls from_env() and never sets WebIdentityTokenFile / RoleArn via with_config. The only env-derived config it does set is AWS_CONTAINER_CREDENTIALS_RELATIVE_URI (ECS task role, not relevant on EKS).

Result: in v2.7.x the WebIdentity branch is unreachable; the chain falls through to InstanceCredentialProvider, which is the EC2 node role.
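The fall-through can be illustrated with a simplified model of the provider selection. This is a sketch under stated assumptions, not the object_store source: `BuilderFields`, `Provider`, and `resolve` are hypothetical names, and the position of the container-credentials branch relative to the others is assumed. Only the WebIdentity check and the final fallback to the instance provider are taken from the snippets above.

```rust
/// Hypothetical stand-in for the builder fields relevant to this issue.
#[derive(Default)]
struct BuilderFields {
    access_key_id: Option<String>,
    secret_access_key: Option<String>,
    web_identity_token_file: Option<String>,
    role_arn: Option<String>,
    container_credentials_relative_uri: Option<String>,
}

/// Which credential source the model ends up with.
#[derive(Debug, PartialEq)]
enum Provider {
    Static,
    WebIdentity,
    TaskRole,
    Instance,
}

/// Models the 0.13-style check: only builder fields are consulted,
/// never the process environment.
fn resolve(b: &BuilderFields) -> Provider {
    if b.access_key_id.is_some() && b.secret_access_key.is_some() {
        Provider::Static
    } else if b.web_identity_token_file.is_some() && b.role_arn.is_some() {
        Provider::WebIdentity
    } else if b.container_credentials_relative_uri.is_some() {
        // Assumed ordering; the ECS task-role path is not relevant on EKS.
        Provider::TaskRole
    } else {
        // Nothing else matched: IMDS, i.e. the EC2 node role.
        Provider::Instance
    }
}
```

A builder created with `new()` and never given the two WebIdentity fields corresponds to `BuilderFields::default()` here, which resolves to `Provider::Instance` regardless of what the pod's environment contains.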

Suggested fix

In src/storage/s3.rs::S3Config::get_default_builder, when access/secret keys are absent, populate WebIdentityTokenFile and RoleArn from env. For example:

if self.access_key_id.is_none() && self.secret_key.is_none() {
    if let Ok(token_file) = std::env::var("AWS_WEB_IDENTITY_TOKEN_FILE") {
        builder = builder.with_config(
            AmazonS3ConfigKey::WebIdentityTokenFile, token_file,
        );
    }
    if let Ok(role_arn) = std::env::var("AWS_ROLE_ARN") {
        builder = builder.with_config(AmazonS3ConfigKey::RoleArn, role_arn);
    }
    if let Ok(session) = std::env::var("AWS_ROLE_SESSION_NAME") {
        builder = builder.with_config(AmazonS3ConfigKey::RoleSessionName, session);
    }
}

AWS_ROLE_SESSION_NAME is optional (defaults to WebIdentitySession). AWS_ENDPOINT_URL_STS is also optional. Symmetric handling for the EKS Pod Identity env pair (AWS_CONTAINER_CREDENTIALS_FULL_URI + AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE) would also be nice but is a separate enhancement.
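To keep the env handling unit-testable, the lookup in the fix above could be factored into a small pure helper. The following is a sketch only: `IrsaKey` is a hypothetical local mirror of the three `AmazonS3ConfigKey` variants named above, `irsa_overrides` is a hypothetical function, and skipping empty values is an added assumption rather than something the proposed fix does.

```rust
/// Hypothetical local mirror of the three builder keys the fix sets.
#[derive(Debug, PartialEq, Clone, Copy)]
enum IrsaKey {
    WebIdentityTokenFile,
    RoleArn,
    RoleSessionName,
}

/// Collects IRSA settings via `lookup` (in production: `|k| std::env::var(k).ok()`).
/// Returns an empty list when static keys are configured, so explicit
/// credentials always take precedence.
fn irsa_overrides<F>(has_static_keys: bool, lookup: F) -> Vec<(IrsaKey, String)>
where
    F: Fn(&str) -> Option<String>,
{
    if has_static_keys {
        return Vec::new();
    }
    const VARS: [(&str, IrsaKey); 3] = [
        ("AWS_WEB_IDENTITY_TOKEN_FILE", IrsaKey::WebIdentityTokenFile),
        ("AWS_ROLE_ARN", IrsaKey::RoleArn),
        ("AWS_ROLE_SESSION_NAME", IrsaKey::RoleSessionName),
    ];
    VARS.iter()
        .filter_map(|&(name, key)| {
            // Assumption: treat an empty value the same as an unset variable.
            lookup(name).filter(|v| !v.is_empty()).map(|v| (key, v))
        })
        .collect()
}
```

The caller would then map each `IrsaKey` to the matching `AmazonS3ConfigKey` and apply it with `with_config`, which keeps the set of env vars Parseable honours explicit and narrow.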

An alternative would be calling AmazonS3Builder::from_env() as the seed and then layering explicit with_*() calls on top, but that picks up every AWS_* env var the deployer happens to have set, which is probably broader behavior than you want.

Workaround

Set P_S3_ACCESS_KEY / P_S3_SECRET_KEY to a static IAM user's keys. This forfeits IRSA's no-static-secret guarantee but unblocks v2.7.x deployments today.

Affected versions

v2.7.0, v2.7.1, v2.7.2 (all releases after the object_store 0.13 bump). v2.6.6 and earlier unaffected.
