Skip to content

sync engine#890

Open
frostbyte73 wants to merge 17 commits intomainfrom
sync-engine
Open

sync engine#890
frostbyte73 wants to merge 17 commits intomainfrom
sync-engine

Conversation

@frostbyte73
Copy link
Copy Markdown
Member

New Synchronizer based on Chrome's RtpToNtpEstimator, StreamSynchronization, and RtpStreamsSynchronizer

  • NtpEstimator: Per-track linear regression (OLS over last 20 RTCP Sender Reports) mapping RTP timestamps to NTP wall-clock time.
  • ParticipantSync — Compares NTP-mapped times across one participant's audio and video tracks to compute A/V lip sync adjustments, with time-based slew rate limiting (5ms/s) and a 5ms deadband.
  • SessionTimeline — Maps each participant's independent NTP clock domain onto a shared recording timeline using one-way delay estimation to normalize out clock offsets between participants.
  • SyncEngine — Orchestrator that wires the three layers together, handles wall-clock fallback before SRs arrive, manages the transition to NTP-grounded PTS, and enforces monotonicity and drain. Implements the Sync/TrackSync interfaces so AppWriter is agnostic to which engine is active.

Copy link
Copy Markdown
Contributor

@boks1971 boks1971 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not able to read it all, looks good from what I could read

@milos-lk
Copy link
Copy Markdown
Contributor

started looking into this today - I need to set aside couple of hours tomorrow to review it in depth.

@milos-lk milos-lk self-assigned this Apr 24, 2026
Comment thread pkg/synchronizer/syncengine.go Outdated
Comment thread pkg/synchronizer/syncengine.go
@milos-lk
Copy link
Copy Markdown
Contributor

The old sync drops packets older than oldPacketThreshold. The new sync has no equivalent. Very old packets coming in a burst could fill in gst queues and slow down burst recovery. Is there any reason not to have it inside the new sync?

@milos-lk
Copy link
Copy Markdown
Contributor

Similar to the note above - new sync doesn't integrate pipeline time feedback (existing normalizePTSToMediaPipelineTimeline) so that workaround we built looks like not used in the new solution? media running time is set though..

@milos-lk
Copy link
Copy Markdown
Contributor

Synchronization related bugs were the hardest class of bugs I have seen for egress in the previous period. I think we need to have good logging around this also in the new logic. Adjustments performed, stats etc. The existing syncrhonizer code has decent amount of logs which came to be through the pain of debugging sync issues.

Comment thread pkg/synchronizer/syncengine.go Outdated
Comment thread pkg/synchronizer/syncengine.go Outdated
@milos-lk
Copy link
Copy Markdown
Contributor

Maybe it's worth calling out that some of the options being in the current sync logic aren't supported - like disabling PTS adjustments - I guess we can use existing synchronizer for cases where this is needed

@milos-lk
Copy link
Copy Markdown
Contributor

Pretty neat idea of using both NtpEsitmator and OWD estimator to map everything through session timeline - I like that 👍

@frostbyte73
Copy link
Copy Markdown
Member Author

Maybe it's worth calling out that some of the options being in the current sync logic aren't supported - like disabling PTS adjustments - I guess we can use existing synchronizer for cases where this is needed

My hope was that if we just ripped all the logic out of chrome, we wouldn't need those options. Doing this in parallel with the sdk compositing so we have a good way to test/measure other than just deploy to canary and hope

@frostbyte73 frostbyte73 requested a review from boks1971 April 24, 2026 16:20
@frostbyte73 frostbyte73 marked this pull request as draft April 24, 2026 16:20
@milos-lk
Copy link
Copy Markdown
Contributor

Maybe it's worth calling out that some of the options being in the current sync logic aren't supported - like disabling PTS adjustments - I guess we can use existing synchronizer for cases where this is needed

My hope was that if we just ripped all the logic out of chrome, we wouldn't need those options. Doing this in parallel with the sdk compositing so we have a good way to test/measure other than just deploy to canary and hope

👍 definitively nice for side by side comparison. I am afraid though we might need it, but let's see 🤞 . A lot of straightening for chrome is happening on DSP level, inside libwebrtc with NetEq which is what e.g fixes audio timeline with changing tempo and not touching PTS (more complex and probably robust logic than what I built with pitch audio tempo control.).

@frostbyte73 frostbyte73 marked this pull request as ready for review April 24, 2026 20:58
@frostbyte73
Copy link
Copy Markdown
Member Author

Integration tests all passing livekit/egress#1198

@boks1971
Copy link
Copy Markdown
Contributor

Sorry @frostbyte73 . Backlogged on this. Will try to read through this today.

Comment thread pkg/synchronizer/ntpestimator.go
Copy link
Copy Markdown
Contributor

@boks1971 boks1971 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff @frostbyte73 ! Left some comments.

I am not able to check on every bit as there are a lot of scars from older battles. I think @milos-lk caught some of those and you have included it all. Guess, this will require good chunk of testing + good amount of soak in staging.

Comment thread pkg/synchronizer/participantsync.go Outdated
Comment thread pkg/synchronizer/participantsync.go Outdated
Comment thread pkg/synchronizer/ntpestimator.go Outdated
Comment thread pkg/synchronizer/ntpestimator.go Outdated
Comment thread pkg/synchronizer/ntpestimator.go
Comment thread pkg/synchronizer/ntpestimator_test.go
Comment thread pkg/synchronizer/sessiontimeline.go Outdated
Comment thread pkg/synchronizer/sessiontimeline.go Outdated
Comment thread pkg/synchronizer/sessiontimeline.go
Comment thread pkg/synchronizer/syncengine.go
Copy link
Copy Markdown
Contributor

@milos-lk milos-lk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to update drift value for the case when we don't touch audio PTS as we discussed yesterday - but other than that 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants