Developer Guide

Technical reference for the ArbiterAI library.

Architecture
Public API
Data Structures
Provider System
Utility Components
Usage Patterns
Configuration

1. Architecture

ArbiterAI follows a layered architecture:

┌──────────────────────────────────────────────────┐
│              Application Code                    │
└───────────────┬──────────────────────────────────┘
                │ creates
┌───────────────▼──────────────────────────────────┐
│  ArbiterAI (singleton / factory)                 │
│  - initialize(), createChatClient()              │
│  - Stateless convenience: completion(), etc.     │
└───────┬────────────────────┬─────────────────────┘
        │ creates             │ owns
┌───────▼────────┐  ┌────────▼─────────────────────┐
│  ChatClient    │  │  ModelManager (singleton)     │
│  (per session) │  │  - Config loading & schema    │
│  - History     │  │  - Model lookup               │
│  - Tools       │  │  - ConfigDownloader           │
│  - Stats       │  └──────────────────────────────┘
│  - Cache       │
└───────┬────────┘
        │ delegates to
┌───────▼────────────────────────────────────────┐
│  BaseProvider (strategy pattern)               │
│  ├─ OpenAI     ├─ DeepSeek    ├─ Mock          │
│  ├─ Anthropic  ├─ OpenRouter  └─ Llama (local) │
└────────────────────────────────────────────────┘

Core Components

ArbiterAI — Singleton that acts as a factory and lifecycle manager. Initializes providers, creates ChatClient instances, and provides stateless convenience methods.
ChatClient — Stateful, per-session interface managing conversation history, tool definitions, caching, and usage statistics. Created via ArbiterAI::createChatClient().
BaseProvider — Abstract interface for LLM backends. Concrete implementations handle provider-specific API formatting, authentication, and response parsing.
ModelManager — Singleton that loads and manages model configurations from JSON files with schema validation.
Utility Components — Cross-cutting functionality including caching (CacheManager), cost tracking (CostManager), model downloading (ModelDownloader), and file verification (FileVerifier).

Planned Components

See Local Model Management Task for upcoming additions:

HardwareDetector — GPU/RAM/CPU detection (NVML + Vulkan)
ModelRuntime — Multi-model loading, swap queueing, LRU eviction (refactor of LlamaInterface)
TelemetryCollector — Inference stats and system snapshots
Standalone Server — Separate arbiterAI-server application providing an OpenAI-compatible API, model management endpoints, and a live stats dashboard

2. Public API

`ArbiterAI` (defined in `arbiterAI.h`)

Method	Description
`static ArbiterAI &instance()`	Get the singleton instance
`ErrorCode initialize(const std::vector<path> &configPaths)`	Initialize the library with config directories
`std::shared_ptr<ChatClient> createChatClient(const ChatConfig &config)`	Create a stateful chat session
`std::shared_ptr<ChatClient> createChatClient(const std::string &model)`	Create a chat session with default config
`bool doesModelNeedApiKey(const std::string &model)`	Check if model requires API key
`bool supportModelDownload(const std::string &provider)`	Check if provider supports downloads
`ErrorCode getModelInfo(const std::string &modelName, ModelInfo &info)`	Get model information
`ErrorCode getAvailableModels(std::vector<std::string> &models)`	List available models
`ErrorCode completion(const CompletionRequest &request, CompletionResponse &response)`	Stateless completion (convenience)
`ErrorCode streamingCompletion(const CompletionRequest &request, callback)`	Stateless streaming completion
`std::vector<CompletionResponse> batchCompletion(const std::vector<CompletionRequest> &requests)`	Batch completion
`ErrorCode getEmbeddings(const EmbeddingRequest &request, EmbeddingResponse &response)`	Generate embeddings
`ErrorCode getDownloadStatus(const std::string &modelName, std::string &error)`	Get model download status
`ErrorCode shutdown()`	Clean up resources

`ChatClient` (defined in `chatClient.h`)

Completion:

Method	Description
`ErrorCode completion(const CompletionRequest &request, CompletionResponse &response)`	Blocking completion with session context
`ErrorCode streamingCompletion(const CompletionRequest &request, StreamCallback callback)`	Streaming completion

Conversation Management:

Method	Description
`ErrorCode addMessage(const Message &message)`	Add a message to history
`std::vector<Message> getHistory() const`	Get conversation history
`ErrorCode clearHistory()`	Clear history (re-adds system prompt)
`size_t getHistorySize() const`	Get message count

Tool/Function Calling:

Method	Description
`ErrorCode setTools(const std::vector<ToolDefinition> &tools)`	Set available tools
`std::vector<ToolDefinition> getTools() const`	Get configured tools
`ErrorCode clearTools()`	Clear all tools
`ErrorCode addToolResult(const std::string &toolCallId, const std::string &result)`	Add tool result to conversation

Configuration:

Method	Description
`ErrorCode setTemperature(double temperature)`	Set temperature (0.0-2.0)
`double getTemperature() const`	Get current temperature
`ErrorCode setMaxTokens(int maxTokens)`	Set max tokens
`int getMaxTokens() const`	Get max tokens
`std::string getModel() const`	Get model name
`std::string getProvider() const`	Get provider name

Status & Statistics:

Method	Description
`ErrorCode getDownloadStatus(DownloadProgress &progress)`	Get download progress (local models)
`ErrorCode getUsageStats(UsageStats &stats) const`	Get accumulated usage statistics
`int getCachedResponseCount() const`	Get cache hit count
`ErrorCode resetStats()`	Reset session statistics
`std::string getSessionId() const`	Get unique session ID

Session Lifecycle

1. Initialize ArbiterAI with configuration paths
2. Create a ChatClient via createChatClient()
3. Optionally set tools, temperature, etc.
4. Call completion() or streamingCompletion(); messages are automatically added to history
5. Query stats, history, or download status as needed
6. To start a new chat, create a new ChatClient instance

3. Data Structures

All data structures are defined in arbiterAI.h unless noted.

`ErrorCode`

enum class ErrorCode {
    Success, ApiKeyNotFound, UnknownModel, UnsupportedProvider,
    NetworkError, InvalidResponse, InvalidRequest, NotImplemented,
    GenerationError, ModelNotFound, ModelNotLoaded, ModelLoadError,
    ModelDownloading, ModelDownloadFailed
};
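
Because most API calls report failure through an ErrorCode rather than an exception, a small helper for turning codes into readable labels can make logging easier. The sketch below is only an illustration: toString is a hypothetical helper that is not part of the library, and the enum is copied from above so the block compiles on its own.

```cpp
#include <cassert>
#include <string_view>

// Stand-in copy of the documented enum so the sketch compiles on its own.
enum class ErrorCode {
    Success, ApiKeyNotFound, UnknownModel, UnsupportedProvider,
    NetworkError, InvalidResponse, InvalidRequest, NotImplemented,
    GenerationError, ModelNotFound, ModelNotLoaded, ModelLoadError,
    ModelDownloading, ModelDownloadFailed
};

// Hypothetical helper (not part of ArbiterAI): map a code to a log-friendly label.
constexpr std::string_view toString(ErrorCode code) {
    switch (code) {
        case ErrorCode::Success:             return "Success";
        case ErrorCode::ApiKeyNotFound:      return "ApiKeyNotFound";
        case ErrorCode::UnknownModel:        return "UnknownModel";
        case ErrorCode::UnsupportedProvider: return "UnsupportedProvider";
        case ErrorCode::NetworkError:        return "NetworkError";
        case ErrorCode::InvalidResponse:     return "InvalidResponse";
        case ErrorCode::InvalidRequest:      return "InvalidRequest";
        case ErrorCode::NotImplemented:      return "NotImplemented";
        case ErrorCode::GenerationError:     return "GenerationError";
        case ErrorCode::ModelNotFound:       return "ModelNotFound";
        case ErrorCode::ModelNotLoaded:      return "ModelNotLoaded";
        case ErrorCode::ModelLoadError:      return "ModelLoadError";
        case ErrorCode::ModelDownloading:    return "ModelDownloading";
        case ErrorCode::ModelDownloadFailed: return "ModelDownloadFailed";
    }
    return "Unknown"; // value outside the enumerators listed above
}
```

Omitting a default branch in the switch lets most compilers warn when a new enumerator is added without a matching label.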

`Message`

struct Message {
    std::string role;       // "system", "user", "assistant", "tool"
    std::string content;
};

`ChatConfig` (defined in `chatClient.h`)

Field	Type	Description
`model`	`std::string`	Model identifier (required)
`temperature`	`std::optional<double>`	Sampling temperature
`maxTokens`	`std::optional<int>`	Max tokens per completion
`systemPrompt`	`std::optional<std::string>`	System message
`apiKey`	`std::optional<std::string>`	API key override
`enableCache`	`bool`	Enable session-level caching (default: `false`)
`cacheTTL`	`std::chrono::seconds`	Cache time-to-live (default: 3600)
`topP`	`std::optional<double>`	Top-p sampling parameter
`presencePenalty`	`std::optional<double>`	Presence penalty
`frequencyPenalty`	`std::optional<double>`	Frequency penalty

`CompletionRequest`

Field	Type	Description
`model`	`std::string`	Model identifier
`messages`	`std::vector<Message>`	Conversation messages
`temperature`	`std::optional<double>`	Sampling temperature
`max_tokens`	`std::optional<int>`	Maximum tokens
`api_key`	`std::optional<std::string>`	API key override
`provider`	`std::optional<std::string>`	Provider override
`top_p`	`std::optional<double>`	Top-p sampling
`presence_penalty`	`std::optional<double>`	Presence penalty
`frequency_penalty`	`std::optional<double>`	Frequency penalty
`stop`	`std::optional<std::vector<std::string>>`	Stop sequences
`tools`	`std::optional<std::vector<ToolDefinition>>`	Available tools
`tool_choice`	`std::optional<std::string>`	Tool selection mode

`CompletionResponse`

Field	Type	Description
`text`	`std::string`	Generated text
`model`	`std::string`	Model used
`usage`	`Usage`	Token usage statistics
`provider`	`std::string`	Provider used
`cost`	`double`	Estimated cost
`toolCalls`	`std::vector<ToolCall>`	Tool calls from model
`finishReason`	`std::string`	Reason completion finished
`fromCache`	`bool`	Whether served from cache

`Usage`

struct Usage {
    int prompt_tokens;
    int completion_tokens;
    int total_tokens;
};

`UsageStats`

Field	Type	Description
`promptTokens`	`int`	Total prompt tokens
`completionTokens`	`int`	Total completion tokens
`totalTokens`	`int`	Combined count
`estimatedCost`	`double`	Session cost estimate
`cachedResponses`	`int`	Cache hits
`completionCount`	`int`	Number of completions
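
To make the relationship between the per-response Usage and the per-session UsageStats concrete, here is a minimal sketch of how one response might be folded into the running totals. The accumulate function is hypothetical, and whether the library counts tokens for cached responses is an assumption of this example, not something this guide specifies.

```cpp
#include <cassert>

// Stand-ins mirroring the fields documented above.
struct Usage {
    int prompt_tokens = 0;
    int completion_tokens = 0;
    int total_tokens = 0;
};

struct UsageStats {
    int promptTokens = 0;
    int completionTokens = 0;
    int totalTokens = 0;
    double estimatedCost = 0.0;
    int cachedResponses = 0;
    int completionCount = 0;
};

// Hypothetical helper: fold one completion's usage into the session totals.
// Assumption: cached responses still contribute their token counts.
void accumulate(UsageStats &stats, const Usage &usage, double cost, bool fromCache) {
    stats.promptTokens += usage.prompt_tokens;
    stats.completionTokens += usage.completion_tokens;
    stats.totalTokens += usage.total_tokens;
    stats.estimatedCost += cost;
    stats.completionCount += 1;
    if (fromCache) {
        stats.cachedResponses += 1;
    }
}
```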

Tool Structures

ToolParameter:

Field	Type	Description
`name`	`std::string`	Parameter name
`type`	`std::string`	Type (string, number, boolean, object, array)
`description`	`std::string`	Description for the LLM
`required`	`bool`	Whether required
`schema`	`nlohmann::json`	Full JSON schema for complex types

ToolDefinition:

Field	Type	Description
`name`	`std::string`	Tool name
`description`	`std::string`	Description for the LLM
`parameters`	`std::vector<ToolParameter>`	Parameter definitions
`parametersSchema`	`nlohmann::json`	Full JSON schema

ToolCall:

Field	Type	Description
`id`	`std::string`	Unique call identifier
`name`	`std::string`	Tool/function name
`arguments`	`nlohmann::json`	Arguments passed

Download Structures

DownloadStatus:

enum class DownloadStatus {
    NotApplicable, NotStarted, Pending, InProgress, Completed, Failed
};

DownloadProgress:

Field	Type	Description
`status`	`DownloadStatus`	Current status
`bytesDownloaded`	`int64_t`	Bytes downloaded
`totalBytes`	`int64_t`	Total file size
`percentComplete`	`float`	Percentage (0-100)
`errorMessage`	`std::string`	Error details if failed
`modelName`	`std::string`	Model being downloaded

Embedding Structures

EmbeddingRequest:

Field	Type	Description
`model`	`std::string`	Model identifier
`input`	`std::variant<std::string, std::vector<std::string>>`	Text to embed

EmbeddingResponse:

Field	Type	Description
`model`	`std::string`	Model used
`data`	`std::vector<Embedding>`	Embedding vectors with indices
`usage`	`Usage`	Token usage
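
Since EmbeddingRequest::input is a std::variant that holds either one string or a list of strings, code that consumes it usually has to normalize both cases. The following sketch shows one way to do that with std::get_if; the EmbeddingInput alias and the toBatch helper are illustrative names, not library API.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Same variant type as the documented `input` field.
using EmbeddingInput = std::variant<std::string, std::vector<std::string>>;

// Hypothetical helper: always return a list, wrapping a single string if needed.
std::vector<std::string> toBatch(const EmbeddingInput &input) {
    if (const auto *single = std::get_if<std::string>(&input)) {
        return {*single};
    }
    return std::get<std::vector<std::string>>(input);
}
```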

Model Configuration (defined in `modelManager.h`)

ModelInfo:

Field	Type	Default	Description
`model`	`std::string`		Model identifier
`provider`	`std::string`		Provider type
`mode`	`std::string`	`"chat"`	Operation mode
`configVersion`	`std::string`	`"1.1.0"`	Schema version
`minSchemaVersion`	`std::string`	`"1.0.0"`	Minimum compatible version
`ranking`	`int`	`50`	Priority ranking (0-100)
`apiBase`	`std::optional<std::string>`		Custom API endpoint
`filePath`	`std::optional<std::string>`		Local model file path
`apiKey`	`std::optional<std::string>`		API key
`download`	`std::optional<DownloadMetadata>`		Download URL + SHA256
`contextWindow`	`int`	`4096`	Context window size
`maxTokens`	`int`	`2048`	Max tokens
`maxInputTokens`	`int`	`3072`	Max input tokens
`maxOutputTokens`	`int`	`1024`	Max output tokens
`pricing`	`Pricing`		Token costs

DownloadMetadata:

Field	Type	Description
`url`	`std::string`	Download URL
`sha256`	`std::string`	File hash for verification
`cachePath`	`std::string`	Local cache path

Pricing:

Field	Type	Description
`prompt_token_cost`	`double`	Cost per prompt token
`completion_token_cost`	`double`	Cost per completion token
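
The two Pricing fields are per-token rates, so a cost estimate is a weighted sum of the token counts in Usage. The sketch below spells out that arithmetic; estimateCost is a hypothetical helper, and the library's own calculation (see CostManager) may differ in detail.

```cpp
#include <cassert>

// Stand-ins mirroring the documented structures.
struct Usage {
    int prompt_tokens = 0;
    int completion_tokens = 0;
    int total_tokens = 0;
};

struct Pricing {
    double prompt_token_cost = 0.0;     // cost per prompt token
    double completion_token_cost = 0.0; // cost per completion token
};

// Hypothetical helper: cost = prompt tokens * prompt rate + completion tokens * completion rate.
double estimateCost(const Usage &usage, const Pricing &pricing) {
    return usage.prompt_tokens * pricing.prompt_token_cost
         + usage.completion_tokens * pricing.completion_token_cost;
}
```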

4. Provider System

The provider system abstracts LLM backends using a strategy pattern.

`BaseProvider` (defined in `baseProvider.h`)

Method	Description
`virtual ErrorCode completion(request, model, response) = 0`	Text completion
`virtual ErrorCode streamingCompletion(request, callback) = 0`	Streaming completion
`virtual std::vector<CompletionResponse> batchCompletion(requests)`	Batch completion
`virtual ErrorCode getEmbeddings(request, response) = 0`	Generate embeddings
`virtual DownloadStatus getDownloadStatus(modelName, error)`	Legacy download status
`virtual ErrorCode getDownloadProgress(modelName, progress)`	Detailed download progress
`virtual ErrorCode getAvailableModels(models)`	List provider models
`std::string getProviderName() const`	Get provider name
`virtual void setApiUrl(const std::string &url)`	Set API endpoint
`virtual void setApiKey(const std::string &key)`	Set API key

Cloud Providers

OpenAI, Anthropic, DeepSeek, and OpenRouter implement BaseProvider by making HTTP requests to their respective APIs. They handle:

Authentication via API keys (environment variables or config)
Request/response format translation to/from unified structures
Streaming via Server-Sent Events (SSE)
Tool/function calling support
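
As a rough illustration of the SSE handling mentioned above, the sketch below pulls the payload out of each "data:" line of a response body. It is not the library's parser: extractSseData is a hypothetical function, it works on a complete buffer rather than incremental network chunks, and treating a "[DONE]" payload as end-of-stream is a convention of OpenAI-style APIs that other providers may not follow.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the payload of every "data:" line in an SSE body.
std::vector<std::string> extractSseData(const std::string &body) {
    std::vector<std::string> payloads;
    std::istringstream stream(body);
    std::string line;
    while (std::getline(stream, line)) {
        // Tolerate CRLF line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        // Skip blank separators, comments (": ..."), and other fields.
        if (line.rfind("data:", 0) != 0) {
            continue;
        }
        std::string payload = line.substr(5);
        // A single space after the colon is not part of the value.
        if (!payload.empty() && payload.front() == ' ') {
            payload.erase(0, 1);
        }
        // Assumed end-of-stream sentinel (OpenAI-style).
        if (payload == "[DONE]") {
            break;
        }
        payloads.push_back(payload);
    }
    return payloads;
}
```

A production parser would also need to join multi-line data fields within one event and cope with chunks that end mid-line.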

Local Provider

Llama delegates to LlamaInterface for local model inference via llama.cpp. It is currently disabled in the build. See the Local Model Management Task for the planned refactor into ModelRuntime with multi-model and multi-GPU support.

Testing Provider

Mock provides deterministic responses via <echo> tags for testing. See the Testing Guide for details.

Adding a New Provider

1. Create src/arbiterAI/providers/newProvider.h/cpp
2. Inherit from BaseProvider and implement pure virtual methods
3. Register the provider name in ArbiterAI::createProvider()
4. Add model configurations to your config files
5. Add the provider to the schema's provider enum in schemas/model_config.schema.json
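
The inheritance step above can be sketched roughly as follows. Everything here is a reduced stand-in so that the block compiles without the library: the real BaseProvider has more methods (for example getEmbeddings), the exact parameter types are assumptions based on the tables in this guide, and NewProvider with its "echo" behaviour is purely hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for library types, reduced to what this sketch needs.
enum class ErrorCode { Success, InvalidRequest };
struct Message { std::string role; std::string content; };
struct ModelInfo { std::string model; };
struct CompletionRequest { std::vector<Message> messages; };
struct CompletionResponse { std::string text; std::string model; std::string provider; };
using StreamCallback = std::function<void(const std::string &chunk, bool done)>;

// Reduced stand-in for the abstract interface (embeddings and downloads omitted).
class BaseProvider {
public:
    explicit BaseProvider(std::string name) : m_name(std::move(name)) {}
    virtual ~BaseProvider() = default;
    virtual ErrorCode completion(const CompletionRequest &request, const ModelInfo &model,
                                 CompletionResponse &response) = 0;
    virtual ErrorCode streamingCompletion(const CompletionRequest &request,
                                          StreamCallback callback) = 0;
    std::string getProviderName() const { return m_name; }
private:
    std::string m_name;
};

// Hypothetical provider: replies with the last message, prefixed with "echo: ".
class NewProvider : public BaseProvider {
public:
    NewProvider() : BaseProvider("new_provider") {}

    ErrorCode completion(const CompletionRequest &request, const ModelInfo &model,
                         CompletionResponse &response) override {
        if (request.messages.empty()) {
            return ErrorCode::InvalidRequest;
        }
        response.text = "echo: " + request.messages.back().content;
        response.model = model.model;
        response.provider = getProviderName();
        return ErrorCode::Success;
    }

    ErrorCode streamingCompletion(const CompletionRequest &request,
                                  StreamCallback callback) override {
        // Simplest possible streaming: deliver the whole reply as one final chunk.
        CompletionResponse response;
        ErrorCode result = completion(request, ModelInfo{}, response);
        if (result == ErrorCode::Success) {
            callback(response.text, true);
        }
        return result;
    }
};
```

A real provider would replace the echo logic with request translation, an HTTP call, and response parsing, and would still need to be registered and added to the schema as listed above.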

5. Utility Components

`CacheManager` (`cacheManager.h`)

TTL-based response caching:

Session-scope caching (per ChatClient when enableCache is set)
Global-scope caching (via ArbiterAI)
Configurable time-to-live
Cache key generation from request content
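
To illustrate the idea of TTL-based caching keyed on request content, here is a minimal standalone sketch. It is not the CacheManager implementation: TtlCache and makeCacheKey are hypothetical names, the key is a plain concatenation rather than a hash, and the optional time argument exists only so that expiry can be exercised without waiting.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical TTL cache: entries expire a fixed duration after insertion.
class TtlCache {
public:
    using Clock = std::chrono::steady_clock;

    explicit TtlCache(std::chrono::seconds ttl) : m_ttl(ttl) {}

    void put(const std::string &key, const std::string &value,
             Clock::time_point now = Clock::now()) {
        m_entries[key] = Entry{value, now + m_ttl};
    }

    // Returns the cached value, or std::nullopt if missing or expired.
    std::optional<std::string> get(const std::string &key,
                                   Clock::time_point now = Clock::now()) {
        auto it = m_entries.find(key);
        if (it == m_entries.end()) {
            return std::nullopt;
        }
        if (now >= it->second.expiresAt) {
            m_entries.erase(it); // drop stale entries lazily on lookup
            return std::nullopt;
        }
        return it->second.value;
    }

private:
    struct Entry {
        std::string value;
        Clock::time_point expiresAt;
    };

    std::chrono::seconds m_ttl;
    std::unordered_map<std::string, Entry> m_entries;
};

// Hypothetical key builder: identical model and prompt map to the same key.
std::string makeCacheKey(const std::string &model, const std::string &prompt) {
    return model + '\n' + prompt;
}
```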

`CostManager` (`costManager.h`)

Spending tracking and limits:

Per-session and global spending limits
Callback when limits are reached
Cost state persistence across restarts
Per-model cost calculation based on Pricing
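
The limit-plus-callback behaviour described above could look something like the sketch below. SpendingLimiter is a hypothetical class written for this guide, not CostManager's interface; in particular, rejecting a charge that would exceed the limit (rather than recording it and then notifying) is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical spending tracker with a limit and a notification callback.
class SpendingLimiter {
public:
    using LimitCallback = std::function<void(double spent, double limit)>;

    SpendingLimiter(double limit, LimitCallback onLimit)
        : m_limit(limit), m_onLimit(std::move(onLimit)) {}

    // Records the cost and returns true, or invokes the callback and
    // returns false if the charge would push spending past the limit.
    bool tryRecord(double cost) {
        if (m_spent + cost > m_limit) {
            if (m_onLimit) {
                m_onLimit(m_spent, m_limit);
            }
            return false;
        }
        m_spent += cost;
        return true;
    }

    double spent() const { return m_spent; }

private:
    double m_limit;
    double m_spent = 0.0;
    LimitCallback m_onLimit;
};
```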

`ModelDownloader` (`modelDownloader.h`)

Async model downloading:

Progress callback support
Resume interrupted downloads
SHA256 file verification via FileVerifier
GitHub API integration for config downloads
Asynchronous downloading via std::future

`ConfigDownloader` (`configDownloader.h`)

Remote configuration fetching (skeleton — being fleshed out for the config repo integration):

Git-based clone/pull via libgit2
Version/tag pinning
Fallback to local cache

`FileVerifier` (`fileVerifier.h`)

SHA256 file verification:

Interface IFileVerifier for testability
FileVerifier implementation using PicoSHA2

`ModelManager` (`modelManager.h`)

Model configuration management:

JSON config loading with schema validation
Layered configuration (remote, local, override)
Model lookup by name or provider
Ranking-based model ordering
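
Lookup by provider combined with ranking-based ordering can be pictured with the following sketch. The ModelInfo here is a trimmed stand-in, modelsForProvider is a hypothetical function, and the assumption that a higher ranking value means a more preferred model is not confirmed by this guide.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Trimmed stand-in for the documented ModelInfo.
struct ModelInfo {
    std::string model;
    std::string provider;
    int ranking = 50; // documented default; range 0-100
};

// Hypothetical helper: keep one provider's models, highest ranking first.
// Assumption: a larger ranking value means higher priority.
std::vector<ModelInfo> modelsForProvider(std::vector<ModelInfo> models,
                                         const std::string &provider) {
    models.erase(std::remove_if(models.begin(), models.end(),
                                [&](const ModelInfo &m) { return m.provider != provider; }),
                 models.end());
    // stable_sort keeps config-file order for models with equal ranking.
    std::stable_sort(models.begin(), models.end(),
                     [](const ModelInfo &a, const ModelInfo &b) {
                         return a.ranking > b.ranking;
                     });
    return models;
}
```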

6. Usage Patterns

Basic Completion (via ChatClient)

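#include "arbiterAI/arbiterAI.h"
#include "arbiterAI/chatClient.h"

auto &ai = arbiterAI::ArbiterAI::instance();

// initialize() returns an ErrorCode; check it before creating clients
if (ai.initialize({"config/"}) != arbiterAI::ErrorCode::Success)
{
    // Configuration could not be loaded; handle the error before continuing
}

arbiterAI::ChatConfig config;
config.model = "gpt-4";      // required
config.temperature = 0.7;    // optional

auto client = ai.createChatClient(config);

arbiterAI::CompletionRequest request;
request.messages = {{"user", "Hello!"}};

arbiterAI::CompletionResponse response;
if (client->completion(request, response) == arbiterAI::ErrorCode::Success)
{
    // response.text contains the reply
    // Messages are automatically added to the session history
}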

Streaming

#include <iostream>

// Invoked once per chunk; done is true on the final call
auto callback = [](const std::string &chunk, bool done) {
    std::cout << chunk << std::flush;
    if (done) std::cout << std::endl;
};

client->streamingCompletion(request, callback);

Tool Calling

arbiterAI::ToolDefinition tool;
tool.name = "get_weather";
tool.description = "Get current weather";
// ToolParameter fields in order: name, type, description, required, schema
tool.parameters = {{"location", "string", "City name", true, {}}};

client->setTools({tool});

// After a completion() call on this client, check the response for tool calls:
if (!response.toolCalls.empty())
{
    for (const auto &call : response.toolCalls)
    {
        // executeMyTool is an application-defined function, not part of the library
        std::string result = executeMyTool(call.name, call.arguments);
        client->addToolResult(call.id, result);
    }
    // Continue the conversation; the tool results are already in the session history
    arbiterAI::CompletionRequest followUp;
    client->completion(followUp, response);
}
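
The executeMyTool call in the snippet above is left to the application. One possible shape for it, shown here only as a sketch, is a registry that maps tool names to handlers. ToolRegistry and ToolHandler are hypothetical names, and arguments are passed as a JSON string to keep the block free of dependencies; with the real ToolCall you would likely pass call.arguments.dump() or the nlohmann::json object itself.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handler type: takes the arguments as JSON text, returns the result text.
using ToolHandler = std::function<std::string(const std::string &argumentsJson)>;

// Hypothetical registry that dispatches a tool call by name.
class ToolRegistry {
public:
    void add(const std::string &name, ToolHandler handler) {
        m_handlers[name] = std::move(handler);
    }

    // Runs the matching handler, or returns a JSON error the model can read.
    std::string execute(const std::string &name, const std::string &argumentsJson) const {
        auto it = m_handlers.find(name);
        if (it == m_handlers.end()) {
            return "{\"error\":\"unknown tool: " + name + "\"}";
        }
        return it->second(argumentsJson);
    }

private:
    std::unordered_map<std::string, ToolHandler> m_handlers;
};
```

Returning an error string for unknown tools, instead of throwing, lets the result be passed straight to addToolResult() so the model can recover.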

Stateless Convenience

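// No ChatClient is involved, so the model must be set on the request itself
arbiterAI::CompletionRequest request;
request.model = "gpt-4";
request.messages = {{"user", "Quick question"}};

arbiterAI::CompletionResponse response;
ai.completion(request, response);
// No history is kept between stateless calls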

7. Configuration

Model Configuration Files

Models are defined in JSON files validated against schemas/model_config.schema.json. See examples/model_config_v2.json for an example.

Environment Variables

Variable	Description
`OPENAI_API_KEY`	OpenAI API key
`ANTHROPIC_API_KEY`	Anthropic API key
`DEEPSEEK_API_KEY`	DeepSeek API key
`OPENROUTER_API_KEY`	OpenRouter API key

Configuration Precedence

1. Request-level: API key or provider override in CompletionRequest
2. Session-level: Settings in ChatConfig
3. File-level: Model config JSON files
4. Environment: Environment variables for API keys