### Describe the bug
We are observing sustained PostgreSQL lock contention (“blocked connections / lock waits”) caused by concurrent writes from the Unleash server to the `client_applications` table. During peak events, multiple sessions wait on `wait_event_type='Lock'` / `wait_event='transactionid'` for tens of seconds.

Using `pg_blocking_pids()`, the top blockers consistently show a batch `INSERT` into `client_applications (app_name, seen_at, updated_at)` (likely an upsert, given the PK on `app_name`). The problem becomes pronounced when 20+ pods register in parallel, for example during rollouts/restarts or autoscaling events.

Per the Unleash docs, SDKs call `POST /api/client/register` on startup to register their existence (appName, strategies, SDK version, etc.).
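For illustration, the blocking pattern is consistent with a batched upsert of roughly this shape (a sketch only; the exact statement Unleash emits may differ). When two concurrent transactions target the same `app_name` row, the second waits on the first transaction's ID until it commits, which matches the `wait_event='transactionid'` signature described above:

```sql
-- Hypothetical shape of the registration write (the real Unleash SQL may differ).
-- Two sessions upserting the same app_name conflict on the PK; the loser of the
-- race takes a transactionid lock wait until the winner commits or rolls back.
INSERT INTO client_applications (app_name, seen_at, updated_at)
VALUES
    ('service-a', now(), now()),
    ('service-b', now(), now())
ON CONFLICT (app_name) DO UPDATE
SET seen_at    = EXCLUDED.seen_at,
    updated_at = EXCLUDED.updated_at;
```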
### Steps to reproduce the bug
- Deploy Unleash OSS 7.4.1 pointing to PostgreSQL.
- Run many client services/SDKs in Kubernetes with the same `appName` (or multiple `appName`s) and high replica counts (e.g., 10–20+ pods per app).
- Trigger a high-concurrency event (e.g., rollout restart, mass redeploy, or HPA scale-up) so that 20+ pods register in parallel and hit `POST /api/client/register` around the same time.
- Observe in PostgreSQL:
  - increased sessions with `wait_event_type='Lock'` / `wait_event='transactionid'`
  - blockers with query `insert into "client_applications"...`
- (Optional) Confirm in the Unleash UI under Project → Applications that some applications report high “instances” counts.
### Expected behavior
Under high instance count (many pods/SDKs), client registration should not cause sustained DB lock contention or long lock waits (tens of seconds). Ideally, the system should debounce/batch these writes and/or minimize row-level conflicts when many instances register concurrently.
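One way to reduce row-level conflict pressure at the SQL level (purely an illustration of a possible approach, not Unleash's current code) is to make the upsert conditional, so registrations arriving within a short window become no-ops instead of rewriting the same row:

```sql
-- Illustrative variant: only rewrite the row if it hasn't been touched recently.
-- The 30-second window is an arbitrary example value. Concurrent upserts on the
-- same app_name can still wait on each other briefly, but update churn drops.
INSERT INTO client_applications AS ca (app_name, seen_at, updated_at)
VALUES ('service-a', now(), now())
ON CONFLICT (app_name) DO UPDATE
SET seen_at    = EXCLUDED.seen_at,
    updated_at = EXCLUDED.updated_at
WHERE ca.seen_at < now() - interval '30 seconds';
```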
### Logs, error output, etc.
### PostgreSQL: constraint (PK) on `client_applications`
```sql
SELECT conname, contype, pg_get_constraintdef(oid) AS def
FROM pg_constraint
WHERE conrelid = 'client_applications'::regclass;
```
Expected output:
- `client_applications_pkey` PRIMARY KEY (`app_name`)
### PostgreSQL: lock summary (snapshot)
```sql
SELECT
  now() AS ts,
  count(*) AS waiting_sessions,
  max(now() - query_start) AS max_wait,
  avg(extract(epoch FROM (now() - query_start)))::int AS avg_wait_seconds
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```
Example observed during incidents:
- `waiting_sessions`: ~15–17
- `max_wait`: ~25–47s
- `avg_wait_seconds`: ~8–12s
### PostgreSQL: top blockers (confirms blocking query)
```sql
SELECT
  now() AS ts,
  b.pid AS blocker_pid,
  b.client_addr,
  b.application_name,
  now() - b.xact_start AS blocker_xact_age,
  now() - b.query_start AS blocker_query_age,
  b.wait_event_type,
  b.wait_event,
  left(b.query, 200) AS blocker_query,
  count(*) AS blocked_sessions
FROM pg_stat_activity a
JOIN pg_stat_activity b
  ON b.pid = ANY (pg_blocking_pids(a.pid))
WHERE a.wait_event_type = 'Lock'
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
ORDER BY blocked_sessions DESC, blocker_xact_age DESC;
```
Example (truncated):
- `insert into "client_applications" ("app_name","seen_at","updated_at") values (...),(...),...`
- `wait_event='transactionid'`
- one blocker can block ~10–12 sessions
### Note on `relation = n/a` in lock breakdowns
When analyzing locks via `pg_locks`, `relation` may appear as `NULL` (“n/a”) for these waits because `transactionid` locks are taken on a transaction ID rather than on a table, so they carry no relation OID.
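The individual lock rows can be confirmed directly in `pg_locks`, where `transactionid` entries carry a transaction ID instead of a relation OID (hence the `NULL`):

```sql
-- transactionid locks have no relation OID, so pg_locks.relation is NULL here.
-- Ungranted rows are the waiters; granted rows are the transactions they wait on.
SELECT locktype, transactionid, pid, mode, granted
FROM pg_locks
WHERE locktype = 'transactionid'
ORDER BY granted, pid;
```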
### Screenshots
No response
### Additional context
- Unleash version: 7.4.1 (Open Source)
- Hosting: self-hosted on Kubernetes
- Database: PostgreSQL (managed)
- Traffic pattern: multiple client services/SDKs; 20+ pods registering in parallel during rollouts/restarts/autoscaling.
- UI correlation: in Project → Applications, some apps show high “instances” counts (10–20+), correlating with increased registration concurrency.
### Questions for maintainers
- Is this pattern (`transactionid` lock waits caused by concurrent `client_applications` inserts/upserts) a known issue at scale?
- Are there recommended settings/patterns for high-scale Kubernetes deployments to avoid sustained contention?
  - e.g., server-side debouncing/batching for registration writes, a different upsert strategy, async persistence, etc.
- Any recommended configuration for:
  - DB pool sizing (`DATABASE_POOL_MIN`/`DATABASE_POOL_MAX`)
  - rate limiting for `/api/client/register`
  - reducing registration write pressure without losing critical visibility?
### What we can provide if needed
- Full (untruncated) `blocker_query` text
- Time series of `waiting_sessions` / `max_wait`
- Unleash replica count and DB pool configuration
- Ingress counts for `POST /api/client/register` during incident windows
### Unleash version

7.4.1 (Open Source)

### Subscription type

Open source

### Hosting type

Self-hosted

### SDK information (language and version)

Node.js `unleash-client` ^6.9.4