Skip to content

AMQP Context Tracking connection error handling#2106

Open
sanikache wants to merge 5 commits into
AcademySoftwareFoundation:mainfrom
dreamworksanimation:issue_1968
Open

AMQP Context Tracking connection error handling#2106
sanikache wants to merge 5 commits into
AcademySoftwareFoundation:mainfrom
dreamworksanimation:issue_1968

Conversation

@sanikache
Copy link
Copy Markdown
Contributor

@JeanChristopheMorinPerso and I were discussing this over #1968

Pika's vendor library was emitting ERROR-level log messages whenever a broker connection failed, even when context tracking was not configured. This produced spurious output in normal rez usage.

Changes:

  • init.py: Set rez.vendor.pika logger to CRITICAL at startup, silencing all pika output by default.

  • amqp.py:

    • Only call set_pika_log_level() when debug_context_tracking is enabled, rather than unconditionally on every publish attempt.
    • Also catch AMQPConnectionError (in addition to socket.error) on connection failure pika can raise either depending on the failure mode.
    • Downgrade connection failure message from ERROR to WARNING when context_tracking_host is explicitly set (unexpected failure), or to DEBUG when it is not (normal/expected when tracking is not configured).
    • Remove the dead else branch in set_pika_log_level() that reset to WARNING and the CRITICAL default is now set centrally in init.py.
    • rezconfig.py: Document the warning/debug behaviour on broker unreachability, and note that debug_context_tracking also enables pika DEBUG logging.

Here's the new behavior with this change after pointing context_tracking_host to a fake host:

rez-env dev_build
13:11:39 WARNING  Cannot connect to the message broker: [Errno -2] Name or service not known

To enabling context_tracking to see the full stacktrace:

setenv REZ_DEBUG_CONTEXT_TRACKING 1
or
set debug_context_tracking to True in rezconfig. 
Exception in thread Thread-1 (_publish_messages_async):
Traceback (most recent call last):
  File "/mmfs1/sasha/pipeline/rez_packages/python/3.11.12.a1/platform-linux/arch-x86_64/os-Rocky-8.7/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/mmfs1/sasha/pipeline/rez_packages/python/3.11.12.a1/platform-linux/arch-x86_64/os-Rocky-8.7/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/mmfs1/sasha/home/ben.hawkyard/rez/lib/python3.11/site-packages/rez/utils/amqp.py", line 133, in _publish_messages_async
    _publish_message(**kwargs)
  File "/mmfs1/sasha/home/ben.hawkyard/rez/lib/python3.11/site-packages/rez/utils/amqp.py", line 103, in _publish_message
    conn = BlockingConnection(params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mmfs1/sasha/home/ben.hawkyard/rez/lib/python3.11/site-packages/rez/vendor/pika/adapters/blocking_connection.py", line 361, in __init__
    self._impl = self._create_connection(parameters, _impl_class)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mmfs1/sasha/home/ben.hawkyard/rez/lib/python3.11/site-packages/rez/vendor/pika/adapters/blocking_connection.py", line 452, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
rez.vendor.pika.exceptions.AMQPConnectionError

Signed-off-by: Ibrahim Sani <Ibrahim.Sani@dreamworks.com>
@sanikache sanikache requested a review from a team as a code owner May 14, 2026 21:41
@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.86%. Comparing base (d415b96) to head (d17ae83).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2106      +/-   ##
==========================================
+ Coverage   60.65%   60.86%   +0.21%     
==========================================
  Files         164      164              
  Lines       20584    20590       +6     
  Branches     3579     3582       +3     
==========================================
+ Hits        12485    12532      +47     
+ Misses       7224     7184      -40     
+ Partials      875      874       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@maxnbk maxnbk added the devdays26 ASWF Dev Days 2026 label May 14, 2026
@maxnbk
Copy link
Copy Markdown
Contributor

maxnbk commented May 14, 2026

I like the change, it seems solid overall, but I will just want to wait for @JeanChristopheMorinPerso to comment.

@sanikache : Since the cov-bot is basically mostly complaining that nothing hits inside the logging-conf block, I'm wondering if a mocked test-module could hit this codepath without needing an actual AMQP server to hit?

@sanikache
Copy link
Copy Markdown
Contributor Author

I somehow missed your comment yesterday. That makes sense. I can add a new src/rez/tests/test_amqp.py since we don't have one for AMQP.

Signed-off-by: Ibrahim Sani <Ibrahim.Sani@dreamworks.com>
@sanikache
Copy link
Copy Markdown
Contributor Author

While running the AMQP tests, I got the following failure which required that additional change you see in src/rez/utils/amqp.py

rez-selftest --amqp
Creating python package in '/pic1/workspace/__tests_pkg_repo'
12:29:12 WARNING  Cannot connect to the message broker: broker refused
12:29:12 ERROR    Failed to publish message: publish error
12:29:12 WARNING  Cannot connect to the message broker: connection refused
Published to REZ.CONTEXT: {'action': 'created'}
======================================================================
FAIL: test_host_and_port (rez.tests.test_amqp.TestParseHostAndPort.test_host_and_port)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/rez/3.3.0/platform-linux/arch-x86_64/lib/python3.11/site-packages/rez/tests/test_amqp.py", line 30, in test_host_and_port
    self.assertEqual(host, "localhost")
AssertionError: None != 'localhost'

----------------------------------------------------------------------
Ran 19 tests in 0.021s

FAILED (failures=1)

@maxnbk
Copy link
Copy Markdown
Contributor

maxnbk commented May 15, 2026

That was more of a test than I expected, but I like it.
I will mark 👍 but still wait for @JeanChristopheMorinPerso review since he was involved in discussions.

Comment thread src/rez/__init__.py Outdated
# their logging config, which will leave level != NOTSET here.
pika_logger = logging.getLogger("rez.vendor.pika")
if pika_logger.level == logging.NOTSET:
pika_logger.setLevel(logging.CRITICAL)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should do that. I feel like if someone provides a logger config file, they are responsible to configure the whole tree. Opinions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me.

Signed-off-by: Ibrahim Sani <Ibrahim.Sani@dreamworks.com>
@sanikache
Copy link
Copy Markdown
Contributor Author

Sorry for the late response. I realized I hadn’t pushed the last requested change as I thought, but I’ve just done it now.

Copy link
Copy Markdown
Member

@JeanChristopheMorinPerso JeanChristopheMorinPerso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are almost there. I think some tests modify the global state, which will impact other tests negatively. This should be addressed before we can merge this PR.

Comment thread src/rez/__init__.py Outdated
def _init_logging() -> None:
logging_conf = os.getenv("REZ_LOGGING_CONF")
if logging_conf:
import logging
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import logging


def test_sets_debug_when_context_tracking_debug_on(self) -> None:
self.update_settings({"debug_context_tracking": True})
set_pika_log_level()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will persist/leak state across other tests. Maybe we could mock the getLogger function to avoid changing the global state?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! yes, that is correct.

Comment thread src/rez/tests/test_amqp.py Outdated
pika_logger.setLevel(logging.NOTSET)

with patch.dict(os.environ, {"REZ_LOGGING_CONF": conf_path}):
_init_logging()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this changes the global state? If so, we should make sure that other tests don't get affected by it.

sanikache added 2 commits May 20, 2026 16:11
Signed-off-by: Ibrahim Sani <Ibrahim.Sani@dreamworks.com>
Signed-off-by: Ibrahim Sani <Ibrahim.Sani@dreamworks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devdays26 ASWF Dev Days 2026

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants