Technical reference for the ArbiterAI library.
- Architecture
- Public API
- Data Structures
- Provider System
- Utility Components
- Usage Patterns
- Configuration
ArbiterAI follows a layered architecture:
```
┌──────────────────────────────────────────────────┐
│                 Application Code                 │
└───────────────┬──────────────────────────────────┘
                │ creates
┌───────────────▼──────────────────────────────────┐
│ ArbiterAI (singleton / factory)                  │
│ - initialize(), createChatClient()               │
│ - Stateless convenience: completion(), etc.      │
└───────┬────────────────────┬─────────────────────┘
        │ creates            │ owns
┌───────▼────────┐  ┌────────▼─────────────────────┐
│ ChatClient     │  │ ModelManager (singleton)     │
│ (per session)  │  │ - Config loading & schema    │
│ - History      │  │ - Model lookup               │
│ - Tools        │  │ - ConfigDownloader           │
│ - Stats        │  └──────────────────────────────┘
│ - Cache        │
└───────┬────────┘
        │ delegates to
┌───────▼──────────────────────────────────────────┐
│ BaseProvider (strategy pattern)                  │
│ ├─ OpenAI      ├─ DeepSeek    ├─ Mock            │
│ ├─ Anthropic   ├─ OpenRouter  └─ Llama (local)   │
└──────────────────────────────────────────────────┘
```
- `ArbiterAI`: Singleton that acts as a factory and lifecycle manager. Initializes providers, creates `ChatClient` instances, and provides stateless convenience methods.
- `ChatClient`: Stateful, per-session interface managing conversation history, tool definitions, caching, and usage statistics. Created via `ArbiterAI::createChatClient()`.
- `BaseProvider`: Abstract interface for LLM backends. Concrete implementations handle provider-specific API formatting, authentication, and response parsing.
- `ModelManager`: Singleton that loads and manages model configurations from JSON files with schema validation.
- Utility Components: Cross-cutting functionality including caching (`CacheManager`), cost tracking (`CostManager`), model downloading (`ModelDownloader`), and file verification (`FileVerifier`).
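The singleton-plus-factory relationship between `ArbiterAI` and `ChatClient` can be illustrated in isolation. The following is a minimal, self-contained sketch of the pattern, not the library's code: `Facade`, `Session`, `createSession()`, and `createdCount()` are hypothetical stand-ins, and the real classes carry much more state.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for ChatClient: per-session state lives here
class Session
{
public:
    explicit Session(std::string model) : model_(std::move(model)) {}
    const std::string &model() const { return model_; }

private:
    std::string model_;
};

// Hypothetical stand-in for ArbiterAI: one process-wide instance acting as a factory
class Facade
{
public:
    static Facade &instance()
    {
        static Facade inst; // function-local static: initialized once, thread-safe since C++11
        return inst;
    }

    Facade(const Facade &) = delete;
    Facade &operator=(const Facade &) = delete;

    // Every call returns a new, independent session object
    std::shared_ptr<Session> createSession(const std::string &model)
    {
        ++created_;
        return std::make_shared<Session>(model);
    }

    int createdCount() const { return created_; }

private:
    Facade() = default;
    int created_ = 0;
};
```

Because each call returns a fresh `shared_ptr`, sessions stay independent while shared resources (such as initialized providers) can live in the single instance.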
See the Local Model Management Task for upcoming additions:
- `HardwareDetector`: GPU/RAM/CPU detection (NVML + Vulkan)
- `ModelRuntime`: Multi-model loading, swap queueing, LRU eviction (refactor of `LlamaInterface`)
- `TelemetryCollector`: Inference stats and system snapshots
- Standalone Server: Separate `arbiterAI-server` application providing an OpenAI-compatible API, model management endpoints, and a live stats dashboard
ArbiterAI (defined in `arbiterAI.h`)

| Method | Description |
|---|---|
| `static ArbiterAI &instance()` | Get the singleton instance |
| `ErrorCode initialize(const std::vector<path> &configPaths)` | Initialize the library with config directories |
| `std::shared_ptr<ChatClient> createChatClient(const ChatConfig &config)` | Create a stateful chat session |
| `std::shared_ptr<ChatClient> createChatClient(const std::string &model)` | Create a chat session with default config |
| `bool doesModelNeedApiKey(const std::string &model)` | Check whether a model requires an API key |
| `bool supportModelDownload(const std::string &provider)` | Check whether a provider supports downloads |
| `ErrorCode getModelInfo(const std::string &modelName, ModelInfo &info)` | Get model information |
| `ErrorCode getAvailableModels(std::vector<std::string> &models)` | List available models |
| `ErrorCode completion(const CompletionRequest &request, CompletionResponse &response)` | Stateless completion (convenience) |
| `ErrorCode streamingCompletion(const CompletionRequest &request, callback)` | Stateless streaming completion |
| `std::vector<CompletionResponse> batchCompletion(const std::vector<CompletionRequest> &requests)` | Batch completion |
| `ErrorCode getEmbeddings(const EmbeddingRequest &request, EmbeddingResponse &response)` | Generate embeddings |
| `ErrorCode getDownloadStatus(const std::string &modelName, std::string &error)` | Get model download status |
| `ErrorCode shutdown()` | Clean up resources |
ChatClient (defined in `chatClient.h`)

Completion:

| Method | Description |
|---|---|
| `ErrorCode completion(const CompletionRequest &request, CompletionResponse &response)` | Blocking completion with session context |
| `ErrorCode streamingCompletion(const CompletionRequest &request, StreamCallback callback)` | Streaming completion |

Conversation Management:

| Method | Description |
|---|---|
| `ErrorCode addMessage(const Message &message)` | Add a message to history |
| `std::vector<Message> getHistory() const` | Get conversation history |
| `ErrorCode clearHistory()` | Clear history (re-adds system prompt) |
| `size_t getHistorySize() const` | Get message count |

Tool/Function Calling:

| Method | Description |
|---|---|
| `ErrorCode setTools(const std::vector<ToolDefinition> &tools)` | Set available tools |
| `std::vector<ToolDefinition> getTools() const` | Get configured tools |
| `ErrorCode clearTools()` | Clear all tools |
| `ErrorCode addToolResult(const std::string &toolCallId, const std::string &result)` | Add tool result to conversation |

Configuration:

| Method | Description |
|---|---|
| `ErrorCode setTemperature(double temperature)` | Set temperature (0.0-2.0) |
| `double getTemperature() const` | Get current temperature |
| `ErrorCode setMaxTokens(int maxTokens)` | Set max tokens |
| `int getMaxTokens() const` | Get max tokens |
| `std::string getModel() const` | Get model name |
| `std::string getProvider() const` | Get provider name |

Status & Statistics:

| Method | Description |
|---|---|
| `ErrorCode getDownloadStatus(DownloadProgress &progress)` | Get download progress (local models) |
| `ErrorCode getUsageStats(UsageStats &stats) const` | Get accumulated usage statistics |
| `int getCachedResponseCount() const` | Get cache hit count |
| `ErrorCode resetStats()` | Reset session statistics |
| `std::string getSessionId() const` | Get unique session ID |
Typical `ChatClient` workflow:
- Initialize `ArbiterAI` with configuration paths.
- Create a `ChatClient` via `createChatClient()`.
- Optionally set tools, temperature, etc.
- Call `completion()` or `streamingCompletion()`; messages are automatically added to history.
- Query stats, history, or download status as needed.
- Starting a new chat requires creating a new `ChatClient` instance.
All data structures are defined in `arbiterAI.h` unless noted.

ErrorCode:

```cpp
enum class ErrorCode {
    Success, ApiKeyNotFound, UnknownModel, UnsupportedProvider,
    NetworkError, InvalidResponse, InvalidRequest, NotImplemented,
    GenerationError, ModelNotFound, ModelNotLoaded, ModelLoadError,
    ModelDownloading, ModelDownloadFailed
};
```

Message:

```cpp
struct Message {
    std::string role;    // "system", "user", "assistant", "tool"
    std::string content;
};
```

ChatConfig (defined in `chatClient.h`):
| Field | Type | Description |
|---|---|---|
| `model` | `std::string` | Model identifier (required) |
| `temperature` | `std::optional<double>` | Sampling temperature |
| `maxTokens` | `std::optional<int>` | Max tokens per completion |
| `systemPrompt` | `std::optional<std::string>` | System message |
| `apiKey` | `std::optional<std::string>` | API key override |
| `enableCache` | `bool` | Enable session-level caching (default: false) |
| `cacheTTL` | `std::chrono::seconds` | Cache time-to-live (default: 3600) |
| `topP` | `std::optional<double>` | Top-p sampling parameter |
| `presencePenalty` | `std::optional<double>` | Presence penalty |
| `frequencyPenalty` | `std::optional<double>` | Frequency penalty |
CompletionRequest:

| Field | Type | Description |
|---|---|---|
| `model` | `std::string` | Model identifier |
| `messages` | `std::vector<Message>` | Conversation messages |
| `temperature` | `std::optional<double>` | Sampling temperature |
| `max_tokens` | `std::optional<int>` | Maximum tokens |
| `api_key` | `std::optional<std::string>` | API key override |
| `provider` | `std::optional<std::string>` | Provider override |
| `top_p` | `std::optional<double>` | Top-p sampling |
| `presence_penalty` | `std::optional<double>` | Presence penalty |
| `frequency_penalty` | `std::optional<double>` | Frequency penalty |
| `stop` | `std::optional<std::vector<std::string>>` | Stop sequences |
| `tools` | `std::optional<std::vector<ToolDefinition>>` | Available tools |
| `tool_choice` | `std::optional<std::string>` | Tool selection mode |
CompletionResponse:

| Field | Type | Description |
|---|---|---|
| `text` | `std::string` | Generated text |
| `model` | `std::string` | Model used |
| `usage` | `Usage` | Token usage statistics |
| `provider` | `std::string` | Provider used |
| `cost` | `double` | Estimated cost |
| `toolCalls` | `std::vector<ToolCall>` | Tool calls from model |
| `finishReason` | `std::string` | Reason completion finished |
| `fromCache` | `bool` | Whether served from cache |
Usage:

```cpp
struct Usage {
    int prompt_tokens;
    int completion_tokens;
    int total_tokens;
};
```

UsageStats:

| Field | Type | Description |
|---|---|---|
| `promptTokens` | `int` | Total prompt tokens |
| `completionTokens` | `int` | Total completion tokens |
| `totalTokens` | `int` | Combined count |
| `estimatedCost` | `double` | Session cost estimate |
| `cachedResponses` | `int` | Cache hits |
| `completionCount` | `int` | Number of completions |
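One plausible way a session folds each completion's `Usage` into its `UsageStats` is a running sum. The sketch below re-declares simplified copies of both structs so it compiles on its own; the `accumulate` helper and its cache-hit handling are assumptions for illustration, not the library's code.

```cpp
// Simplified copies of the documented structs (field names as in the tables above)
struct Usage
{
    int prompt_tokens = 0;
    int completion_tokens = 0;
    int total_tokens = 0;
};

struct UsageStats
{
    int promptTokens = 0;
    int completionTokens = 0;
    int totalTokens = 0;
    double estimatedCost = 0.0;
    int cachedResponses = 0;
    int completionCount = 0;
};

// Hypothetical helper: fold one completion's usage and cost into the session totals.
// Assumption: a cached response counts as a completion and a cache hit,
// but adds no tokens or cost because no provider call was made.
void accumulate(UsageStats &stats, const Usage &usage, double cost, bool fromCache)
{
    ++stats.completionCount;
    if (fromCache)
    {
        ++stats.cachedResponses;
        return;
    }
    stats.promptTokens += usage.prompt_tokens;
    stats.completionTokens += usage.completion_tokens;
    stats.totalTokens += usage.total_tokens;
    stats.estimatedCost += cost;
}
```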
ToolParameter:

| Field | Type | Description |
|---|---|---|
| `name` | `std::string` | Parameter name |
| `type` | `std::string` | Type (string, number, boolean, object, array) |
| `description` | `std::string` | Description for the LLM |
| `required` | `bool` | Whether required |
| `schema` | `nlohmann::json` | Full JSON schema for complex types |

ToolDefinition:

| Field | Type | Description |
|---|---|---|
| `name` | `std::string` | Tool name |
| `description` | `std::string` | Description for the LLM |
| `parameters` | `std::vector<ToolParameter>` | Parameter definitions |
| `parametersSchema` | `nlohmann::json` | Full JSON schema |

ToolCall:

| Field | Type | Description |
|---|---|---|
| `id` | `std::string` | Unique call identifier |
| `name` | `std::string` | Tool/function name |
| `arguments` | `nlohmann::json` | Arguments passed |
DownloadStatus:

```cpp
enum class DownloadStatus {
    NotApplicable, NotStarted, Pending, InProgress, Completed, Failed
};
```

DownloadProgress:

| Field | Type | Description |
|---|---|---|
| `status` | `DownloadStatus` | Current status |
| `bytesDownloaded` | `int64_t` | Bytes downloaded |
| `totalBytes` | `int64_t` | Total file size |
| `percentComplete` | `float` | Percentage (0-100) |
| `errorMessage` | `std::string` | Error details if failed |
| `modelName` | `std::string` | Model being downloaded |
EmbeddingRequest:

| Field | Type | Description |
|---|---|---|
| `model` | `std::string` | Model identifier |
| `input` | `std::variant<std::string, std::vector<std::string>>` | Text to embed |

EmbeddingResponse:

| Field | Type | Description |
|---|---|---|
| `model` | `std::string` | Model used |
| `data` | `std::vector<Embedding>` | Embedding vectors with indices |
| `usage` | `Usage` | Token usage |
Model Configuration (defined in `modelManager.h`)

ModelInfo:

| Field | Type | Default | Description |
|---|---|---|---|
| `model` | `std::string` | | Model identifier |
| `provider` | `std::string` | | Provider type |
| `mode` | `std::string` | `"chat"` | Operation mode |
| `configVersion` | `std::string` | `"1.1.0"` | Schema version |
| `minSchemaVersion` | `std::string` | `"1.0.0"` | Minimum compatible version |
| `ranking` | `int` | `50` | Priority ranking (0-100) |
| `apiBase` | `std::optional<std::string>` | | Custom API endpoint |
| `filePath` | `std::optional<std::string>` | | Local model file path |
| `apiKey` | `std::optional<std::string>` | | API key |
| `download` | `std::optional<DownloadMetadata>` | | Download URL + SHA256 |
| `contextWindow` | `int` | `4096` | Context window size |
| `maxTokens` | `int` | `2048` | Max tokens |
| `maxInputTokens` | `int` | `3072` | Max input tokens |
| `maxOutputTokens` | `int` | `1024` | Max output tokens |
| `pricing` | `Pricing` | | Token costs |
DownloadMetadata:

| Field | Type | Description |
|---|---|---|
| `url` | `std::string` | Download URL |
| `sha256` | `std::string` | File hash for verification |
| `cachePath` | `std::string` | Local cache path |

Pricing:

| Field | Type | Description |
|---|---|---|
| `prompt_token_cost` | `double` | Cost per prompt token |
| `completion_token_cost` | `double` | Cost per completion token |
The provider system abstracts LLM backends using a strategy pattern.
BaseProvider (defined in `baseProvider.h`)

| Method | Description |
|---|---|
| `virtual ErrorCode completion(request, model, response) = 0` | Text completion |
| `virtual ErrorCode streamingCompletion(request, callback) = 0` | Streaming completion |
| `virtual std::vector<CompletionResponse> batchCompletion(requests)` | Batch completion |
| `virtual ErrorCode getEmbeddings(request, response) = 0` | Generate embeddings |
| `virtual DownloadStatus getDownloadStatus(modelName, error)` | Legacy download status |
| `virtual ErrorCode getDownloadProgress(modelName, progress)` | Detailed download progress |
| `virtual ErrorCode getAvailableModels(models)` | List provider models |
| `std::string getProviderName() const` | Get provider name |
| `virtual void setApiUrl(const std::string &url)` | Set API endpoint |
| `virtual void setApiKey(const std::string &key)` | Set API key |
OpenAI, Anthropic, DeepSeek, and OpenRouter implement BaseProvider by making HTTP requests to their respective APIs. They handle:
- Authentication via API keys (environment variables or config)
- Request/response format translation to/from unified structures
- Streaming via Server-Sent Events (SSE)
- Tool/function calling support
`Llama` delegates to `LlamaInterface` for local model inference via llama.cpp. It is currently disabled in the build. See the Local Model Management Task for the planned refactor into `ModelRuntime` with multi-model and multi-GPU support.

`Mock` provides deterministic responses via `<echo>` tags for testing. See the Testing Guide for details.
To add a new provider:
- Create `src/arbiterAI/providers/newProvider.h/cpp`.
- Inherit from `BaseProvider` and implement the pure virtual methods.
- Register the provider name in `ArbiterAI::createProvider()`.
- Add model configurations to your config files.
- Add the provider to the schema's provider enum in `schemas/model_config.schema.json`.
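The steps above amount to the strategy pattern: callers hold a base-class reference and the concrete subclass supplies the backend-specific behavior. The self-contained sketch below uses simplified stand-in types with hypothetical signatures (`Request`, `Response`, a reduced `Provider` base, a `LoopbackProvider` subclass, and a `createProvider` free function playing the registration role). The real `BaseProvider` declares more methods and its exact parameter types may differ, so treat this as an outline only.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, reduced stand-ins for the library's types
enum class ErrorCode { Success, InvalidRequest };

struct Request
{
    std::string model;
    std::string prompt;
};

struct Response
{
    std::string text;
    std::string provider;
};

// Reduced stand-in for BaseProvider: one pure virtual completion method
class Provider
{
public:
    explicit Provider(std::string name) : name_(std::move(name)) {}
    virtual ~Provider() = default;
    virtual ErrorCode completion(const Request &request, Response &response) = 0;
    const std::string &getProviderName() const { return name_; }

private:
    std::string name_;
};

// Example concrete strategy: returns the prompt unchanged, no network access
class LoopbackProvider : public Provider
{
public:
    LoopbackProvider() : Provider("loopback") {}

    ErrorCode completion(const Request &request, Response &response) override
    {
        if (request.prompt.empty())
            return ErrorCode::InvalidRequest;
        response.text = request.prompt;
        response.provider = getProviderName();
        return ErrorCode::Success;
    }
};

// Hypothetical factory, analogous to registering a name in ArbiterAI::createProvider()
std::unique_ptr<Provider> createProvider(const std::string &name)
{
    if (name == "loopback")
        return std::make_unique<LoopbackProvider>();
    return nullptr; // unknown provider name
}
```

Adding a backend then means one new subclass plus one new branch in the factory; calling code that works through `Provider &` does not change.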
CacheManager (cacheManager.h)
TTL-based response caching:
- Session-scope caching (per `ChatClient` when `enableCache` is set)
- Global-scope caching (via `ArbiterAI`)
- Configurable time-to-live
- Cache key generation from request content
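A TTL cache of this kind can be sketched as a map from a key to a stored value plus its insertion time. The example is illustrative only: `TtlCache`, its methods, and the key scheme (model name plus each message's role and content, joined with control-character separators) are assumptions, not `CacheManager`'s actual interface. The current time is passed in explicitly so expiry is easy to test.

```cpp
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical cache key: model name plus each message's role and content,
// separated by control characters that are unlikely to appear in normal text
std::string makeCacheKey(const std::string &model,
                         const std::vector<std::pair<std::string, std::string>> &messages)
{
    std::string key = model;
    for (const auto &[role, content] : messages)
    {
        key += '\x1f';
        key += role;
        key += '\x1e';
        key += content;
    }
    return key;
}

class TtlCache
{
public:
    explicit TtlCache(std::chrono::seconds ttl) : ttl_(ttl) {}

    void put(const std::string &key, const std::string &value, Clock::time_point now)
    {
        entries_[key] = Entry{value, now};
    }

    // Returns the cached value if present and younger than the TTL; evicts expired entries
    std::optional<std::string> get(const std::string &key, Clock::time_point now)
    {
        auto it = entries_.find(key);
        if (it == entries_.end())
            return std::nullopt;
        if (now - it->second.storedAt >= ttl_)
        {
            entries_.erase(it);
            return std::nullopt;
        }
        return it->second.value;
    }

private:
    struct Entry
    {
        std::string value;
        Clock::time_point storedAt;
    };

    std::chrono::seconds ttl_;
    std::unordered_map<std::string, Entry> entries_;
};
```

Evicting lazily on lookup keeps the sketch short; a production cache would also need a bound on size and, if shared between threads, a lock.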
CostManager (costManager.h)
Spending tracking and limits:
- Per-session and global spending limits
- Callback when limits are reached
- Cost state persistence across restarts
- Per-model cost calculation based on `Pricing`
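Given the per-token prices in `Pricing`, the cost of one completion is presumably `prompt_tokens * prompt_token_cost + completion_tokens * completion_token_cost`. The sketch below pairs that calculation with a spending limit that fires a callback the first time it is reached. `completionCost`, `SpendTracker`, and the fire-once behavior are hypothetical and not taken from `CostManager`.

```cpp
#include <functional>
#include <utility>

// Simplified copies of the documented Pricing and Usage fields
struct Pricing
{
    double prompt_token_cost = 0.0;
    double completion_token_cost = 0.0;
};

struct Usage
{
    int prompt_tokens = 0;
    int completion_tokens = 0;
};

double completionCost(const Usage &usage, const Pricing &pricing)
{
    return usage.prompt_tokens * pricing.prompt_token_cost +
           usage.completion_tokens * pricing.completion_token_cost;
}

// Hypothetical tracker: accumulates spend and invokes onLimit once, when the limit is first reached
class SpendTracker
{
public:
    SpendTracker(double limit, std::function<void(double)> onLimit)
        : limit_(limit), onLimit_(std::move(onLimit)) {}

    void record(double cost)
    {
        spent_ += cost;
        if (!notified_ && spent_ >= limit_)
        {
            notified_ = true;
            if (onLimit_)
                onLimit_(spent_);
        }
    }

    double spent() const { return spent_; }
    bool limitReached() const { return spent_ >= limit_; }

private:
    double limit_;
    std::function<void(double)> onLimit_;
    double spent_ = 0.0;
    bool notified_ = false;
};
```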
ModelDownloader (modelDownloader.h)
Async model downloading:
- Progress callback support
- Resume interrupted downloads
- SHA256 file verification via `FileVerifier`
- GitHub API integration for config downloads
- Asynchronous downloading via `std::future`
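The combination of a progress callback, resumable transfer, and `std::future` can be illustrated without any networking. In this sketch the "download" is a chunked copy from an in-memory buffer, and resuming means starting from the number of bytes already written; `downloadAsync`, `ProgressCallback`, and the chunking are hypothetical. A real downloader would typically resume over HTTP with a `Range` request header, which is outside this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical callback: (bytesDownloaded, totalBytes)
using ProgressCallback = std::function<void(int64_t, int64_t)>;

// Copies `source` into `dest` on a background thread, in chunks of `chunkSize`.
// Bytes already in `dest` are treated as a partial earlier download and kept.
// The caller must keep `source` and `dest` alive until the returned future is ready.
std::future<bool> downloadAsync(const std::vector<char> &source, std::vector<char> &dest,
                                int64_t chunkSize, ProgressCallback onProgress)
{
    return std::async(std::launch::async, [&source, &dest, chunkSize, onProgress]() {
        const int64_t total = static_cast<int64_t>(source.size());
        int64_t done = static_cast<int64_t>(dest.size()); // resume point
        if (done > total || chunkSize <= 0)
            return false; // partial data larger than source, or invalid chunk size
        while (done < total)
        {
            const int64_t n = std::min(chunkSize, total - done);
            dest.insert(dest.end(), source.begin() + done, source.begin() + done + n);
            done += n;
            if (onProgress)
                onProgress(done, total);
        }
        return true;
    });
}
```

A caller would compute `percentComplete` for `DownloadProgress` inside the callback as `100.0f * bytes / total`.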
ConfigDownloader (configDownloader.h)
Remote configuration fetching (skeleton — being fleshed out for the config repo integration):
- Git-based clone/pull via libgit2
- Version/tag pinning
- Fallback to local cache
FileVerifier (fileVerifier.h)
SHA256 file verification:
- Interface `IFileVerifier` for testability
- `FileVerifier` implementation using PicoSHA2
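An interface in the style of `IFileVerifier` lets code that depends on hash checking accept a test double instead of hashing real files. The sketch shows that dependency-injection shape; `IVerifier`, its `verify` method, `FakeVerifier`, and `acceptDownload` are hypothetical names. A real implementation would compute the SHA256 digest (the library uses PicoSHA2) where the fake simply looks up a preset value.

```cpp
#include <map>
#include <string>

// Hypothetical interface shape: true if the file at `path` hashes to `expectedSha256`
class IVerifier
{
public:
    virtual ~IVerifier() = default;
    virtual bool verify(const std::string &path, const std::string &expectedSha256) = 0;
};

// Test double: reports whatever hash each path was registered with, no file I/O
class FakeVerifier : public IVerifier
{
public:
    void setHash(const std::string &path, const std::string &sha256) { hashes_[path] = sha256; }

    bool verify(const std::string &path, const std::string &expectedSha256) override
    {
        auto it = hashes_.find(path);
        return it != hashes_.end() && it->second == expectedSha256;
    }

private:
    std::map<std::string, std::string> hashes_;
};

// Code under test depends only on the interface, so tests can inject FakeVerifier
bool acceptDownload(IVerifier &verifier, const std::string &path, const std::string &expectedSha256)
{
    if (expectedSha256.empty())
        return false; // hypothetical policy: refuse files without a published hash
    return verifier.verify(path, expectedSha256);
}
```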
ModelManager (modelManager.h)
Model configuration management:
- JSON config loading with schema validation
- Layered configuration (remote, local, override)
- Model lookup by name or provider
- Ranking-based model ordering
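Layered configuration and ranking-based ordering can be pictured as follows: load each layer in increasing priority (assumed here to be remote, then local, then override), let later layers replace earlier entries with the same model name, and sort the result by `ranking`. The sketch assumes whole-entry replacement and a higher ranking meaning higher priority, with ties broken by name. `ModelEntry`, `mergeLayers`, and `rankedModels` are hypothetical; the real `ModelManager` may merge individual fields instead.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced model entry (a few ModelInfo fields)
struct ModelEntry
{
    std::string model;
    std::string provider;
    int ranking = 50; // default ranking per the ModelInfo table
};

using Layer = std::vector<ModelEntry>;

// Later layers win: an entry replaces any earlier entry with the same model name
std::map<std::string, ModelEntry> mergeLayers(const std::vector<Layer> &layersLowToHigh)
{
    std::map<std::string, ModelEntry> merged;
    for (const Layer &layer : layersLowToHigh)
        for (const ModelEntry &entry : layer)
            merged[entry.model] = entry;
    return merged;
}

// Highest ranking first; ties broken by model name for a predictable order
std::vector<ModelEntry> rankedModels(const std::map<std::string, ModelEntry> &merged)
{
    std::vector<ModelEntry> out;
    for (const auto &kv : merged)
        out.push_back(kv.second);
    std::sort(out.begin(), out.end(), [](const ModelEntry &a, const ModelEntry &b) {
        if (a.ranking != b.ranking)
            return a.ranking > b.ranking;
        return a.model < b.model;
    });
    return out;
}
```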
Basic chat session:

```cpp
#include <iostream>

#include "arbiterAI/arbiterAI.h"
#include "arbiterAI/chatClient.h"

auto &ai = arbiterAI::ArbiterAI::instance();
ai.initialize({"config/"});

arbiterAI::ChatConfig config;
config.model = "gpt-4";
config.temperature = 0.7;
auto client = ai.createChatClient(config);

arbiterAI::CompletionRequest request;
request.messages = {{"user", "Hello!"}};

arbiterAI::CompletionResponse response;
if (client->completion(request, response) == arbiterAI::ErrorCode::Success)
{
    // response.text contains the reply; the exchange is automatically added to history
    std::cout << response.text << std::endl;
}
```

Streaming:

```cpp
// Print each chunk as it arrives; `done` is true on the final call
auto callback = [](const std::string &chunk, bool done) {
    std::cout << chunk;
    if (done) std::cout << std::endl;
};
client->streamingCompletion(request, callback);
```

Tool calling:

```cpp
arbiterAI::ToolDefinition tool;
tool.name = "get_weather";
tool.description = "Get current weather";
// ToolParameter fields: name, type, description, required, schema
tool.parameters = {{"location", "string", "City name", true, {}}};
client->setTools({tool});

arbiterAI::CompletionRequest weatherRequest;
weatherRequest.messages = {{"user", "What's the weather in Paris?"}};
client->completion(weatherRequest, response);

// After completion, check for tool calls
if (!response.toolCalls.empty())
{
    for (const auto &call : response.toolCalls)
    {
        // executeMyTool is your application's own dispatcher, not part of ArbiterAI
        std::string result = executeMyTool(call.name, call.arguments);
        client->addToolResult(call.id, result);
    }
    // Continue the conversation; the tool results are already in the session history
    arbiterAI::CompletionRequest followUp;
    client->completion(followUp, response);
}
```

Stateless completion:

```cpp
// No ChatClient or history; the model is named on the request itself
arbiterAI::CompletionRequest request;
request.model = "gpt-4";
request.messages = {{"user", "Quick question"}};

arbiterAI::CompletionResponse response;
ai.completion(request, response);
```

Models are defined in JSON files validated against `schemas/model_config.schema.json`. See `examples/model_config_v2.json` for an example.
Environment variables:

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `DEEPSEEK_API_KEY` | DeepSeek API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
Configuration can be supplied at four levels:
- Request-level: API key or provider override in `CompletionRequest`
- Session-level: settings in `ChatConfig`
- File-level: model config JSON files
- Environment: environment variables for API keys
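Reading the list above as highest precedence first (an assumption, not stated explicitly), resolving an API key amounts to taking the first level that supplies a value. `resolveApiKey` and its parameters are hypothetical; only the environment variable names come from the table above.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical resolver: request-level, then session-level, then file-level, then environment
std::optional<std::string> resolveApiKey(const std::optional<std::string> &requestKey,
                                         const std::optional<std::string> &sessionKey,
                                         const std::optional<std::string> &fileKey,
                                         const char *envVarName)
{
    if (requestKey)
        return requestKey;
    if (sessionKey)
        return sessionKey;
    if (fileKey)
        return fileKey;
    if (const char *env = std::getenv(envVarName)) // nullptr when the variable is unset
        return std::string(env);
    return std::nullopt;
}
```

For example, `resolveApiKey(std::nullopt, config.apiKey, std::nullopt, "OPENAI_API_KEY")` would prefer a session-level key and otherwise fall back to the environment.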
- Project Overview — Goals, features, and third-party libraries
- Testing Guide — Mock provider and testing strategies
- Development Process — Build instructions and project structure
- Local Model Management — Planned llama.cpp expansion and standalone server