Merged
11 changes: 6 additions & 5 deletions .github/copilot-instructions.md
Expand Up @@ -21,11 +21,12 @@ A C++17 library providing a unified interface for multiple LLM providers.

1. **All commands** must go through `./runDocker.sh ...`.
2. **All development** (building, testing, running) must be done inside the Docker container. The host environment is not guaranteed to have the correct tools or dependencies.
3. **Do not** use `python`, `pip`, `pytest` — the host may not have the correct Python version or dependencies.
4. **Do not** create or use a virtualenv on the host. The container is the virtualenv.
5. The project source is **bind-mounted** at `/app` inside the container. Edits to files on the host are immediately visible inside the container.
6. If you change the `Dockerfile`, run `./runDocker.sh --rebuild`.
7. Don't launch the server; ask the user to launch it so that it's not running in the agent's terminal.
3. **Do not** run commands in the terminal with `2>&1`, as the user then cannot verify whether the command is running.
4. **Do not** use `python`, `pip`, `pytest` — the host may not have the correct Python version or dependencies.
5. **Do not** create or use a virtualenv on the host. The container is the virtualenv.
6. The project source is **bind-mounted** at `/app` inside the container. Edits to files on the host are immediately visible inside the container.
7. If you change the `Dockerfile`, run `./runDocker.sh --rebuild`.
8. Don't launch the server; ask the user to launch it so that it's not running in the agent's terminal.

## Active Tasks

Expand Down
3 changes: 0 additions & 3 deletions .gitignore
Expand Up @@ -35,9 +35,6 @@
build/
vcpkg_installed/

# Cloned config repository
arbiterAI_config/

# Generated at build time by CMake
**/generated/

Expand Down
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "arbiterAI_config"]
path = arbiterAI_config
url = https://github.com/caseymcc/arbiterAI_config.git
1 change: 1 addition & 0 deletions arbiterAI_config
Submodule arbiterAI_config added at cffe40
24 changes: 21 additions & 3 deletions docker/Dockerfile
@@ -1,11 +1,12 @@
# syntax=docker/dockerfile:1
ARG DOCKER_VERSION=1.2.0
ARG DOCKER_VERSION=1.2.1
FROM ubuntu:24.04

# Install basic build tools, Python 3, and GPU libraries.
# Vulkan headers + glslc are needed at build time for llama.cpp's Vulkan backend.
# At runtime, GPU inference requires a Vulkan ICD on the host; without one
# llama.cpp falls back to CPU-only.
# ROCm HIP SDK is needed at build time for llama.cpp's HIP/ROCm backend.
# At runtime, GPU inference requires a Vulkan ICD or ROCm driver on the host;
# without one llama.cpp falls back to CPU-only.
RUN apt-get update && apt-get install -y \
build-essential \
cmake \
Expand Down Expand Up @@ -34,8 +35,25 @@ RUN apt-get update && apt-get install -y \
mesa-vulkan-drivers \
glslc \
glslang-tools \
wget \
&& rm -rf /var/lib/apt/lists/*

# Install ROCm HIP SDK (build-time only — no kernel driver needed in container)
# Use hiplibsdk usecase to get HIP development libraries and CMake configs
# (the plain "hip" usecase only installs runtime, missing hip-lang-config.cmake)
RUN wget -q https://repo.radeon.com/amdgpu-install/7.2.1/ubuntu/noble/amdgpu-install_7.2.1.70201-1_all.deb && \
apt-get update && \
apt-get install -y ./amdgpu-install_7.2.1.70201-1_all.deb && \
apt-get update && \
amdgpu-install -y --usecase=hiplibsdk --no-dkms && \
rm -f amdgpu-install_7.2.1.70201-1_all.deb && \
rm -rf /var/lib/apt/lists/*

ENV ROCM_PATH=/opt/rocm
ENV HIP_PATH=/opt/rocm
ENV PATH="${ROCM_PATH}/bin:${PATH}"
ENV CMAKE_PREFIX_PATH="${ROCM_PATH}:${CMAKE_PREFIX_PATH}"

RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
apt-get install -y nodejs && \
rm -rf /var/lib/apt/lists/*
Expand Down
4 changes: 3 additions & 1 deletion docs/server.md
Expand Up @@ -77,7 +77,8 @@ All server settings are defined in a JSON configuration file. See [`examples/ser
"hardware": {
"vram_overrides": {
"0": 32000
}
},
"default_backend_priority": ["vulkan"]
},
"logging": {
"level": "info",
Expand Down Expand Up @@ -116,6 +117,7 @@ All server settings are defined in a JSON configuration file. See [`examples/ser
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `vram_overrides` | `object` | `{}` | GPU index → VRAM MB overrides (e.g., `{"0": 32000}`) |
| `default_backend_priority` | `string[]` | `[]` | Default GPU backend preference for models without their own `backend_priority` (e.g., `["vulkan"]`). Empty = all backends. |

**`logging` object:**

Expand Down
3 changes: 2 additions & 1 deletion examples/server_config.json
Expand Up @@ -24,7 +24,8 @@
},

"hardware": {
"vram_overrides": {}
"vram_overrides": {},
"default_backend_priority": []
},

"logging": {
Expand Down
52 changes: 52 additions & 0 deletions schemas/model_config.schema.json
Expand Up @@ -247,6 +247,58 @@
}
}
}
},
"runtime_options": {
"type": "object",
"description": "Runtime options for llama.cpp model loading and inference. Applied as defaults; can be overridden at load time via the API.",
"properties": {
"flash_attn": {
"type": "boolean",
"description": "Enable or disable flash attention (-fa)"
},
"kv_cache_type_k": {
"type": "string",
"description": "KV cache data type for keys (-ctk)",
"enum": ["f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "q5_0", "q5_1"]
},
"kv_cache_type_v": {
"type": "string",
"description": "KV cache data type for values (-ctv)",
"enum": ["f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "q5_0", "q5_1"]
},
"no_mmap": {
"type": "boolean",
"description": "Disable memory-mapped file I/O (--no-mmap)"
},
"reasoning_budget": {
"type": "integer",
"description": "Reasoning token budget (--reasoning-budget). 0 disables reasoning tokens.",
"minimum": 0
},
"swa_full": {
"type": "boolean",
"description": "Use full-size sliding window attention cache (--swa-full)"
},
"n_gpu_layers": {
"type": "integer",
"description": "Number of layers to offload to GPU (-ngl). 99 offloads all.",
"minimum": 0
},
"override_tensor": {
"type": "string",
"description": "Tensor override pattern (-ot) for routing tensors to CPU/GPU"
}
},
"additionalProperties": false
},
"backend_priority": {
"type": "array",
"description": "Ordered preference for GPU backends. First available backend is used.",
"items": {
"type": "string",
"enum": ["vulkan", "rocm", "cuda"]
},
"uniqueItems": true
}
}
}
Expand Down
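The `backend_priority` schema entry and the server-level `default_backend_priority` describe an ordered preference where the first available backend is used. The following is a minimal sketch of that rule, not the project's implementation: `pickBackend` and `effectivePriority` are hypothetical names, and returning no backend when nothing in the list is available (leaving the caller to fall back to CPU) is an assumption.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: a model's own list wins; an empty model list falls
// back to the server-wide default list.
const std::vector<std::string> &effectivePriority(
    const std::vector<std::string> &modelPriority,
    const std::vector<std::string> &serverDefault)
{
    return modelPriority.empty()?serverDefault:modelPriority;
}

// Hypothetical helper: return the first entry of `priority` that is also in
// `available`. An empty priority list means "no preference", so the first
// available backend is returned. Assumption: no match yields std::nullopt.
std::optional<std::string> pickBackend(
    const std::vector<std::string> &priority,
    const std::vector<std::string> &available)
{
    if(priority.empty())
    {
        if(available.empty())
            return std::nullopt;
        return available.front();
    }

    for(const std::string &name:priority)
    {
        if(std::find(available.begin(), available.end(), name)!=available.end())
            return name;
    }
    return std::nullopt;
}
```

With a model list of `{"rocm", "vulkan"}` and only `vulkan` available, this sketch selects `vulkan`; with both lists empty it selects whichever backend is listed first as available.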
10 changes: 8 additions & 2 deletions src/arbiterAI/arbiterAI.cpp
Expand Up @@ -532,9 +532,15 @@ ErrorCode ArbiterAI::getAvailableModels(std::vector<std::string>& models)

// ========== Local Model Management ==========

ErrorCode ArbiterAI::loadModel(const std::string &model, const std::string &variant, int contextSize)
ErrorCode ArbiterAI::loadModel(const std::string &model, const std::string &variant, int contextSize,
const RuntimeOptions *optionsOverride)
{
return ModelRuntime::instance().loadModel(model, variant, contextSize);
RuntimeOptions opts;
if(optionsOverride)
{
opts=*optionsOverride;
}
return ModelRuntime::instance().loadModel(model, variant, contextSize, opts);
}

ErrorCode ArbiterAI::downloadModel(const std::string &model, const std::string &variant)
Expand Down
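The new `loadModel` overload forwards an optional set of runtime options, which the header comment describes as merged on top of the model config defaults. The diff does not show the `RuntimeOptions` definition or the merge itself, so the sketch below is only one way such a merge could look: `RuntimeOptionsSketch` and `mergeOptions` are hypothetical, and modelling each field as `std::optional` is an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for RuntimeOptions (the real struct is not shown in
// this diff). An unset field means "no opinion".
struct RuntimeOptionsSketch
{
    std::optional<bool> flashAttn;
    std::optional<std::string> kvCacheTypeK;
    std::optional<int> nGpuLayers;
};

// Start from the config defaults and replace only the fields that the
// override actually sets.
RuntimeOptionsSketch mergeOptions(RuntimeOptionsSketch defaults,
    const RuntimeOptionsSketch &overrides)
{
    if(overrides.flashAttn)
        defaults.flashAttn=overrides.flashAttn;
    if(overrides.kvCacheTypeK)
        defaults.kvCacheTypeK=overrides.kvCacheTypeK;
    if(overrides.nGpuLayers)
        defaults.nGpuLayers=overrides.nGpuLayers;
    return defaults;
}
```

Under this assumption, a caller that passes `nullptr` for `optionsOverride` corresponds to an override with every field unset, so the config defaults pass through unchanged.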
43 changes: 41 additions & 2 deletions src/arbiterAI/arbiterAI.h
Expand Up @@ -34,6 +34,7 @@ struct ModelFit;
struct LoadedModel;
struct SystemSnapshot;
struct InferenceStats;
struct RuntimeOptions;

/**
* @struct VersionInfo
Expand Down Expand Up @@ -241,6 +242,42 @@ inline void from_json(const nlohmann::json &j, ToolCall &t)
* - "assistant": may include tool_calls when the model invokes tools
* - "tool": includes tool_call_id linking the result back to a specific tool call
*/

/// Extract text from an OpenAI `content` field.
/// The spec allows content as either a plain string or an array of content
/// parts (e.g. [{"type":"text","text":"..."},{"type":"image_url",...}]).
/// This helper concatenates all "text" parts and ignores non-text entries.
inline std::string contentToString(const nlohmann::json &contentJson)
{
if(contentJson.is_string())
return contentJson.get<std::string>();

if(contentJson.is_array())
{
std::string result;
for(const nlohmann::json &part:contentJson)
{
if(part.is_string())
{
if(!result.empty()) result+=' ';
result+=part.get<std::string>();
}
else if(part.is_object()
&& part.contains("type")
&& part.at("type").get<std::string>()=="text"
&& part.contains("text"))
{
if(!result.empty()) result+=' ';
result+=part.at("text").get<std::string>();
}
// Skip non-text parts (image_url, etc.)
}
return result;
}

return {};
}

struct Message
{
std::string role;
Expand All @@ -262,7 +299,7 @@ inline void from_json(const nlohmann::json &j, Message &m)
{
j.at("role").get_to(m.role);
if(j.contains("content") && !j.at("content").is_null())
j.at("content").get_to(m.content);
m.content=contentToString(j.at("content"));
if(j.contains("tool_call_id"))
m.toolCallId=j.at("tool_call_id").get<std::string>();
if(j.contains("tool_calls"))
Expand Down Expand Up @@ -605,9 +642,11 @@ class ArbiterAI
* @param model Model name
* @param variant Quantization variant (empty = auto-select)
* @param contextSize Context size (0 = model default)
* @param optionsOverride Optional runtime options to merge on top of model config defaults (nullptr = use config defaults)
* @return ErrorCode indicating success, ModelDownloading, or failure
*/
ErrorCode loadModel(const std::string &model, const std::string &variant="", int contextSize=0);
ErrorCode loadModel(const std::string &model, const std::string &variant="", int contextSize=0,
const RuntimeOptions *optionsOverride=nullptr);

/**
* @brief Download model files without loading into VRAM
Expand Down
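The `contentToString` helper added in this header concatenates the `"text"` parts of an array-valued `content` field, separated by single spaces, and skips everything else. To make that rule easy to check in isolation, here is a simplified model that avoids the JSON library: `ContentPart` and `joinTextParts` are hypothetical names, and reducing each part to just a `type` and a `text` field is an assumption made for the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified content part: only the two fields the
// concatenation rule looks at.
struct ContentPart
{
    std::string type;
    std::string text;
};

// Join every "text" part with a single space; parts of any other type
// (for example "image_url") are ignored.
std::string joinTextParts(const std::vector<ContentPart> &parts)
{
    std::string result;
    for(const ContentPart &part:parts)
    {
        if(part.type!="text")
            continue;
        if(!result.empty())
            result+=' ';
        result+=part.text;
    }
    return result;
}
```

For the parts `text: "Describe"`, `image_url`, `text: "this image"`, the sketch produces `Describe this image`.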
16 changes: 10 additions & 6 deletions src/arbiterAI/configDownloader.cpp
Expand Up @@ -195,20 +195,24 @@ ConfigDownloadStatus ConfigDownloader::checkoutVersion()

spdlog::info("Checking out version: {}", m_version);

// Try to resolve the version as a direct ref, remote branch, or tag
// Try to resolve the version — prefer remote branch refs first so that
// a fetch+checkout always picks up the latest remote commit rather than
// a stale local branch ref that was never fast-forwarded.
git_object *obj=nullptr;
error=git_revparse_single(&obj, repo, m_version.c_str());

// 1. Try as a remote-tracking branch (most common path after fetch)
std::string remoteBranch="refs/remotes/origin/"+m_version;
error=git_revparse_single(&obj, repo, remoteBranch.c_str());

if(error!=0)
{
// Try as a remote branch
std::string remoteBranch="refs/remotes/origin/"+m_version;
error=git_revparse_single(&obj, repo, remoteBranch.c_str());
// 2. Try as a direct ref / local branch / SHA
error=git_revparse_single(&obj, repo, m_version.c_str());
}

if(error!=0)
{
// Try as a tag
// 3. Try as a tag
std::string tag="refs/tags/"+m_version;
error=git_revparse_single(&obj, repo, tag.c_str());
}
Expand Down
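The reordered lookup in `checkoutVersion` tries the remote-tracking branch first, then the name as given, then a tag. The sketch below restates that order without libgit2 so it can be run on its own: `candidateRefs` and `resolveVersion` are hypothetical names, and the `exists` callback is a stand-in for a lookup such as `git_revparse_single` succeeding.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Candidate ref names in the order the new code tries them.
std::vector<std::string> candidateRefs(const std::string &version)
{
    return {
        "refs/remotes/origin/"+version, // 1. remote-tracking branch
        version,                        // 2. direct ref / local branch / SHA
        "refs/tags/"+version            // 3. tag
    };
}

// Return the first candidate for which `exists` reports success.
// `exists` is a hypothetical stand-in for the repository lookup.
std::optional<std::string> resolveVersion(const std::string &version,
    const std::function<bool(const std::string &)> &exists)
{
    for(const std::string &ref:candidateRefs(version))
    {
        if(exists(ref))
            return ref;
    }
    return std::nullopt;
}
```

When both a local `main` and `refs/remotes/origin/main` exist, this order picks the remote-tracking ref, which is the behavior the new comment describes as avoiding a stale local branch.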