Skip to content

Fix Decouple 'dimensions' parameter in OpenAIEmbedder for third-party compatibility#7396

Open
panchaldhruv27223 wants to merge 9 commits intoagno-agi:mainfrom
panchaldhruv27223:fix-embedding-dimensions
Open

Fix Decouple 'dimensions' parameter in OpenAIEmbedder for third-party compatibility#7396
panchaldhruv27223 wants to merge 9 commits intoagno-agi:mainfrom
panchaldhruv27223:fix-embedding-dimensions

Conversation

@panchaldhruv27223
Copy link
Copy Markdown

Summary

This PR resolves an issue where OpenAIEmbedder and its subclasses (e.g., TogetherEmbedder, OpenAILikeEmbedder) would fail with HTTP 400 errors when interacting with OpenAI-compatible endpoints that do not support the dimensions parameter in their /v1/embeddings payload (e.g., Together AI, Groq, ...).

Key Changes & Architectural Improvements:

  • Added a safe toggle: Introduced send_dimensions: Optional[bool] = None to OpenAIEmbedder to act as a clean feature toggle.
  • Smart Auto-Detection: Updated __post_init__ to safely auto-detect this requirement. It defaults to True only if the model `startswith("text-embedding-3"). This preserves perfect backward compatibility for standard OpenAI V3 users while safely stripping the parameter by default for custom/third-party providers.
  • Removed Hardcoded Assumptions: Replaced the hardcoded if self.base_url is not None: checks across response, async_get_embedding, async_get_embedding_and_usage, and async_get_embeddings_batch_and_usage with the new safe toggle.
  • Fixed Together AI Default: Updated the default model in TogetherEmbedder from the deprecated togethercomputer/m2-bert-80M-32k-retrieval to nomic-ai/nomic-embed-text-v1.5 so the integration works out-of-the-box again.

Resolves #7395

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Improvement
  • Model update
  • Other:

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
  • Tested in clean environment
  • Tests added/updated (if applicable)
  • I have tested these changes against the affected edge cases (Together AI, standard OpenAI V3, and custom proxies).
  • My changes do not break any existing functionality.

Duplicate and AI-Generated PR Check

  • I have searched existing open pull requests and confirmed that no other PR already addresses this issue
  • If a similar PR exists, I have explained below why this PR is a better approach
  • Check if this PR was entirely AI-generated (by Copilot, Claude Code, Cursor, etc.)

Additional Notes

Testing Details

  • Verified standard text-embedding-3-small still correctly sends the dimensions parameter to the native OpenAI endpoint.
  • Verified custom base_url with non-OpenAI models safely omits dimensions from the payload.
  • Verified TogetherEmbedder successfully connects and embeds without throwing HTTP 400 Bad Request errors.

DhruvPanchal and others added 2 commits April 7, 2026 17:04
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 7, 2026

PR Triage

Missing tests: This PR modifies source code but does not include any test changes. Please add or update tests to cover your changes.

Copy link
Copy Markdown
Member

@ysolanky ysolanky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The core idea is right — adding send_dimensions as a toggle is a good approach. However, as written this PR is a breaking change for several existing subclasses. Here's an alternative that fixes the Together AI bug without changing behavior for anyone else.

Problem with the current auto-detection

The PR changes the auto-detect from:

# current (main)
self.id.startswith("text-embedding-3") or self.base_url is not None

to:

# this PR
self.id.startswith("text-embedding-3")

This silently changes behavior for FireworksEmbedder, NebiusEmbedder, and LangDBEmbedder — all of which have base_url set and currently do send dimensions. Whether those providers accept it or not, changing the default is a breaking change.

Suggested approach

Keep the existing auto-detection logic unchanged, and have TogetherEmbedder (and OpenAILikeEmbedder) explicitly opt out:

openai.py — preserve existing behavior as the default:

send_dimensions: Optional[bool] = None

def __post_init__(self):
    if self.dimensions is None:
        self.dimensions = 3072 if self.id == "text-embedding-3-large" else 1536
    if self.send_dimensions is None:
        self.send_dimensions = self.id.startswith("text-embedding-3") or self.base_url is not None

together.py — fix the actual bug:

send_dimensions: Optional[bool] = False

openai_like.py — most OpenAI-like providers don't support it:

send_dimensions: Optional[bool] = False

This way:

  • OpenAIEmbedder with text-embedding-3: sends dimensions (unchanged)
  • OpenAIEmbedder with custom base_url: sends dimensions (unchanged)
  • FireworksEmbedder, NebiusEmbedder, LangDBEmbedder: sends dimensions (unchanged)
  • TogetherEmbedder: does NOT send dimensions (bug fixed)
  • Users can always override with send_dimensions=True/False

Other notes

  1. The default model change for TogetherEmbedder (togethercomputer/m2-bert-80M-32k-retrievalintfloat/multilingual-e5-large-instruct) is a separate breaking change. The PR description mentions nomic-ai/nomic-embed-text-v1.5 but the code uses a different model. This should be split into its own PR.
  2. LangDBEmbedder overrides __post_init__ without calling super(), so with the current PR send_dimensions stays None (falsy) — silently breaking it. The suggested approach above avoids this since send_dimensions=None is only resolved in OpenAIEmbedder.__post_init__.
  3. Missing tests — at minimum, unit tests for the auto-detection logic would be valuable.

@panchaldhruv27223 panchaldhruv27223 force-pushed the fix-embedding-dimensions branch from 23bbd84 to 550952f Compare April 8, 2026 06:23
@panchaldhruv27223
Copy link
Copy Markdown
Author

@ysolanky Thank you for the incredibly thorough review! You were spot on about the downstream breaking changes - I really appreciate you catching the impact on Fireworks, Nebius, and that LangDBEmbedder OOP quirk.

I've pushed a new commit that implements your suggested approach exactly. Here is a breakdown of the updates:

  1. Preserved Backward Compatibility (openai.py): Reverted the post_init auto-detect logic back to self.id.startswith("text-embedding-3") or self.base_url is not None. FireworksEmbedder, NebiusEmbedder, and LangDBEmbedder will continue to send dimensions exactly as they always have.

  2. Explicit Opt-Outs: Added send_dimensions: Optional[bool] = False directly to TogetherEmbedder and OpenAILikeEmbedder. This cleanly fixes the HTTP 400 bug for these providers without altering the parent class behavior.

  3. LangDB Fix (langdb.py): Added the missing super().post_init() call so it correctly inherits the parent's auto-detection logic.

  4. Narrowed PR Scope (together.py): Reverted the default model change in TogetherEmbedder back to m2-bert-80M-32k-retrieval to keep this PR strictly focused on the dimensions parameter bug. (I can open a separate quick PR for the deprecated model later).

Comprehensive Unit Tests Added:

I've added a robust test suite (test_send_dimensions.py) that explicitly verifies:

  1. Default auto-detection for standard OpenAI and custom base_urls.
  2. Explicit user overrides (send_dimensions=True/False).
  3. Subclass behaviors (proving TogetherEmbedder strips the parameter while FireworksEmbedder and LangDBEmbedder retain it).
  4. Mocked client tests to ensure the final payload sent to the API is accurately formatted.

@panchaldhruv27223
Copy link
Copy Markdown
Author

Any updates on this !?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] OpenAIEmbedder forces 'dimensions' parameter with custom base_urls, breaking third-party providers

2 participants